
Lab Manual

Final Year Semester-V


Information Technology Engineering
Subject: OLAP Laboratory

Odd Semester

Institutional Vision, Mission and Quality Policy

Our Vision
To foster and permeate higher and quality education with value-added engineering and technology programs, providing all facilities in terms of technology and platforms for all-round development with societal awareness, and to nurture the youth with international competencies and an exemplary level of employability even in a highly competitive environment, so that they are innovative, adaptable and capable of handling the problems faced by our country and the world at large.

RAIT’s firm belief in a new form of engineering education that lays equal stress on academics and leadership-building extracurricular skills has been a major contributor to the success of RAIT as one of the most reputed institutions of higher learning. The challenges faced by our country and the world in the 21st century need a whole new range of thought and action leaders, which conventional educational systems in engineering disciplines are ill-equipped to produce. Our reputation for providing good engineering education with additional life skills ensures that high-grade and highly motivated students join us. Our laboratories and practical sessions reflect the latest practices followed in industry. The project work and summer projects make our students adept at handling real-life problems and industry-ready. Our students are well placed in industry, and their performance makes reputed companies visit us with renewed demand and vigour.

Our Mission
The Institution is committed to mobilizing resources and equipping itself with men and materials of excellence, thereby ensuring that the Institution becomes a pivotal centre of service to industry, academia and society with the latest technology. RAIT engages different platforms such as technology-enhancing Student Technical Societies, cultural platforms, sports excellence centres, the Entrepreneurial Development Center and the Societal Interaction Cell. It aims to develop the college into an autonomous institution and deemed university at the earliest, with facilities for advanced research and development programs on par with international standards, and to invite reputed international and national institutions and universities to collaborate with our institution on issues of common interest in teaching and learning.

RAIT’s Mission is to produce engineering and technology professionals who are innovative and
inspiring thought leaders, adept at solving problems faced by our nation and world by providing
quality education.

The Institute works closely with all stakeholders, such as industry and academia, to foster knowledge generation, acquisition and dissemination using the best available resources to address the great challenges being faced by our country and the world. RAIT is fully dedicated to providing its students with skills that make them leaders and solution providers, and industry-ready when they graduate from the Institution.

We at RAIT assure our main stakeholders, the students, of 100% quality for the programmes we deliver. This quality assurance stems from the teaching and learning processes we have at work on our campus and from the teachers, who are handpicked from reputed institutions such as IIT/NIT/MU and who inspire the students to be innovative in thinking and practical in approach. We have installed internal procedures to improve the skill sets of instructors by sending them to training courses, workshops, seminars and conferences. We also have a full-fledged course curriculum and deliveries planned in advance for a structured, semester-long programme. We have a well-developed feedback system, drawing on employers, alumni, students and parents, to fine-tune the learning and teaching processes. These tools help us ensure the same quality of teaching independent of any individual instructor. Each classroom is equipped with Internet access and other digital learning resources.

The effective learning process on the campus comprises a clean and stimulating classroom environment and the availability of lecture notes and digital resources prepared by the instructor, accessible from the comfort of home. In addition, students are provided with a good number of assignments that trigger their thinking process. The testing process involves an objective test paper that gauges the students' understanding of concepts. The quality assurance process also ensures that the learning process is effective. Summer internships and project-work-based training ensure that the learning process includes practical and industry-relevant aspects. Various technical events, seminars and conferences make the students' learning complete.

Our Quality Policy

It is our earnest endeavour to produce high-quality engineering professionals who are innovative and inspiring thought and action leaders, competent to solve problems faced by society, the nation and the world at large, by striving towards very high standards in learning, teaching and training methodologies.

Our Motto: If it is not of quality, it is NOT RAIT!
President, RAES

Departmental Vision, Mission

Mission
 The mission of the IT department is to prepare students for overall development, including employability, entrepreneurship and the ability to apply technology to real-life problems, by educating them in the fundamental concepts, technical/programming skills, depth of knowledge and development of understanding in the field of Information Technology.

 To develop entrepreneurs, leaders and researchers with an exemplary level of employability even under highly competitive environments, with high ethical, social and moral values.

Vision
 To pervade higher and quality education with value-added engineering and technology programs, to equip IT graduates with the knowledge, skills, tools and competencies necessary to understand and apply technical knowledge, and to become competent to practice engineering professionally and ethically in tomorrow’s global environment.

 To contribute to the overall development by imparting moral, social and ethical values.

Index

Sr. No. Contents Page No.


1. List of Experiments 6
2. Experiment Plan and Course Outcomes 7
3. Study and Evaluation Scheme 8
4. Experiment No. 1 9
5. Experiment No. 2 13
6. Experiment No. 3 15
7. Experiment No. 4 19
8. Experiment No. 5 22
9. Experiment No. 6 29
10. Experiment No. 7 34
11. Experiment No. 8 45
12. Experiment No. 9 73
13. Experiment No. 10 77

List of Experiments-OLAP lab


Sr. No.  Experiment Name

1   Implementation of Query Optimizer: Simulation of a query optimizer with a query tool (e.g., EverSQL, SQL Server)

2   Query Evaluation and path expressions
    a. Translating SQL queries to Relational Algebra and Query Tree
    b. Query Evaluation

3   Implementation of Two-Phase Locking protocol

4   Implementation of Timestamp-based protocol

5   Implementation of Log-based Recovery mechanism

6   Case Study on a distributed database for a real-life application and simulation of recovery methods

7   Advanced Database Models: case-study-based assignments for Temporal, Mobile or Spatial databases

8   Construction of Star schema and Snowflake schema for a company database

9   OLAP Exercise: a) Construction of Cubes b) OLAP Operations, OLAP Queries

10  Case Study on issues and usage of modern tools and technologies in advanced databases

Experiment Plan & Course Outcome


Lab Outcomes:
LO1 Implement simple query optimizers and design alternate efficient paths for query
execution.

LO2 Simulate the working of concurrency protocols and recovery mechanisms in a database.

LO3 Design applications using advanced models like mobile, spatial databases.

LO4 Implement a distributed database and understand its query processing and
transaction processing mechanisms

LO5 Build a data warehouse

LO6 Analyze data using OLAP operations so as to take strategic decisions.

Module No.  Week No.  Experiment Name  Course Outcome

1   W1   Implementation of Query Optimizer: Simulation of a query optimizer with a query tool (e.g., EverSQL, SQL Server)   LO1

2   W2   Query Evaluation and path expressions: a) Translating SQL queries to Relational Algebra and Query Tree b) Query Evaluation   LO1

3   W3   Implementation of Two-Phase Locking protocol.   LO2

4   W4   Implementation of Timestamp-based protocol.   LO2

5   W5   Implementation of Log-based Recovery mechanism.   LO2

6   W6   Case Study on a distributed database for a real-life application and simulation of recovery methods.   LO4

7   W7   Advanced Database Models: case-study-based assignments for Temporal, Mobile or Spatial databases.   LO3

8   W8   Construction of Star schema and Snowflake schema for a company database.   LO6

9   W9   OLAP Exercise: a) Construction of Cubes b) OLAP Operations, OLAP Queries.   LO5

10  W10  Case Study on issues and usage of modern tools and technologies in advanced databases.   LO3
Study and Evaluation Scheme

Course Code: ITL503
Course Name: OLAP Lab

Examination Scheme:
Theory Marks (Internal Assessment: Test 1, Test 2, Avg. of Test 1 and Test 2): --
End Sem. Exam: --
Term Work: 25
Oral: 25
Total: 50

Term Work:

1. Term work assessment must be based on the overall performance of the student with
every experiment graded from time to time. The grades should be converted into
marks as per the Credit and Grading System manual and should be added and
averaged.
2. The final certification and acceptance of term work ensures satisfactory performance
of laboratory work and minimum passing marks in term work.

Practical & Oral:

1. Oral exam will be based on the entire syllabus of the OLAP Laboratory (ITL503).

OLAP Laboratory

Experiment No. : 1

Implementation of query optimizer:


Simulation of Query optimizer tool

Experiment No. 1
Aim: Implementation of Query Optimizer

Simulation of Query Optimizer with a query tool (e.g., EverSQL, SQL Server)

What will you learn by performing this experiment?

Students will be able to understand how query optimization is done. This will help them to
write efficient queries.

Hardware & Software Required: PC Desktop, Query Optimizer Tool (EverSQL)

Theory:

A. How the Query Optimizer Works

At the core of the SQL Server Database Engine are two major components: the Storage
Engine and the Query Processor, also called the Relational Engine. The Storage Engine is
responsible for reading data between the disk and memory in a manner that optimizes
concurrency while maintaining data integrity. The Query Processor, as the name suggests,
accepts all queries submitted to SQL Server, devises a plan for their optimal execution, and
then executes the plan and delivers the required results.

Queries are submitted to SQL Server using the SQL language (or T-SQL, the Microsoft SQL
Server extension to SQL). Since SQL is a high-level declarative language, it only defines
what data to get from the database, not the steps required to retrieve that data, or any of the
algorithms for processing the request. Thus, for each query it receives, the first job of the
query processor is to devise a plan, as quickly as possible, which describes the best possible
way to execute said query (or, at the very least, an efficient way). Its second job is to execute
the query according to that plan.

Each of these tasks is delegated to a separate component within the query processor;
the Query Optimizer devises the plan and then passes it along to the Execution Engine,
which will actually execute the plan and get the results from the database.

B. Understanding Execution Plans

Now that you’ve found some statements that are slow, it’s time for the fun to begin.
1) EXPLAIN

The EXPLAIN command is by far the most essential tool when it comes to tuning queries. It tells you
what is really going on. To use it, simply prepend your statement with EXPLAIN and run it.
PostgreSQL will show you the execution plan it would use.

When using EXPLAIN for tuning, I recommend always using the ANALYZE option (EXPLAIN
ANALYZE) as it gives you more accurate results. The ANALYZE option actually executes the
statement (rather than just estimating it) and then explains it.

Example (Oracle’s EXPLAIN PLAN syntax, which stores the plan in a plan table):

EXPLAIN PLAN FOR

SELECT last_name FROM employees;

EXPLAIN PLAN

SET STATEMENT_ID = 'st1' FOR

SELECT last_name FROM employees;

To check the result of EXPLAIN PLAN:

SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());
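As an optional, hedged illustration (the lab's prescribed tools are EverSQL / SQL Server / Oracle; the snippet below instead uses SQLite through Python, and its table, index and column names are assumptions), an execution plan can also be inspected programmatically:

import sqlite3

# Minimal sketch: ask the engine for its query plan without fully executing the query.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, last_name TEXT, dept_id INTEGER)")
cur.execute("CREATE INDEX idx_emp_dept ON employees(dept_id)")

# Each returned row describes one step of the chosen plan,
# e.g. a SEARCH of employees using idx_emp_dept.
for row in cur.execute("EXPLAIN QUERY PLAN SELECT last_name FROM employees WHERE dept_id = 10"):
    print(row)

conn.close()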

Conclusion and Discussion:

Thus we have learnt about query optimization tools.

QUIZ / Viva Questions:

 What are the different steps in query optimization?


 What is query plan?
 What is execution engine?

References:

1. Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition, Pearson Education.
2. Korth, Silberschatz, Sudarshan, “Database System Concepts”, 6th Edition, McGraw-Hill.
3. C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, 8th Edition, Pearson Education.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.

OLAP Laboratory

Experiment No. : 2
Query Evaluation and Path Expressions:

a) Translation of SQL queries into relational algebra and query trees

b) Query Evaluation

Experiment No: 2
Aim: Query Evaluation and path expressions

a. Translating SQL queries to Relational Algebra and Query Tree

b. Determine Optimal Query Expression

What will you learn by performing this experiment?

Students will be able to understand how query optimization is done. This will help them
to write efficient queries.

Hardware & Software Required: PC Desktop, Query Optimizer Tool (EverSQL)

Theory:

Translating an arbitrary SQL query into a logical query plan (i.e., a relational algebra expression) is a complex task.
Consider a general SELECT-FROM-WHERE statement of the form

SELECT Select-list
FROM R1, ..., R2 T2, ...
WHERE Where-condition

When the statement does not use subqueries in its WHERE-condition, we can easily translate it into relational algebra as follows:

1. The translation is the projection π_Select-list applied to the selection σ_Where-condition of the Cartesian product of the relations in the FROM clause; an alias R2 T2 in the FROM clause corresponds to a renaming ρ_T2(R2).
2. It is possible that there is no WHERE clause. In that case, it is of course unnecessary to include the selection σ in the relational algebra expression.
3. If we omit the projection π, we obtain the translation of the following special case:

SELECT *
FROM R1, ..., R2 T2, ...
WHERE Where-condition

Query Tree:

A query tree is a tree data structure representing a relational algebra expression. The tables of the query are represented as leaf nodes, and the relational algebra operations as internal nodes. An internal node is executed whenever its operand tables are available; the node is then replaced by the result table. This process continues for all internal nodes until the root node is executed and replaced by the result table.
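As a minimal sketch (an assumed representation, not the required solution), a query tree can be modelled in Python with leaf nodes for base tables and internal nodes for relational algebra operators; the example builds the tree for the first exercise query below, π_sname(σ_level=2(Student)):

# Minimal sketch: a query tree with leaves = base tables, internal nodes = operators.
class Node:
    def __init__(self, op, arg=None, children=None):
        self.op = op                 # 'TABLE', 'SELECT' (sigma), 'PROJECT' (pi), 'JOIN'
        self.arg = arg               # table name, predicate, or attribute list
        self.children = children or []

    def show(self, depth=0):
        print("  " * depth + f"{self.op}({self.arg})")
        for child in self.children:
            child.show(depth + 1)

# SELECT sname FROM Student WHERE level = 2  ==>  pi_sname( sigma_{level=2}( Student ) )
tree = Node("PROJECT", "sname", [Node("SELECT", "level = 2", [Node("TABLE", "Student")])])
tree.show()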

Consider the following schema:

• Student(snum, sname, major, level, age)

• Class(name, meets_at, room, fid)
• Enrolled(snum, cname)
• Faculty(fid, fname, deptid)

Translate the following SQL queries into expressions of relational algebra:

1. Select the names of students at level 2.

2. Count how many students are older than 25.

Additionally, consider the following queries involving subqueries:

SELECT S.sname
FROM Student S
WHERE S.snum NOT IN (SELECT E.snum FROM Enrolled E)

SELECT C.name
FROM Class C
WHERE C.room = ’R128’ OR C.name IN (SELECT E.cname FROM Enrolled E GROUP BY
E.cname HAVING COUNT(*) >= 5)

SELECT F.fname
FROM Faculty F
WHERE 5 > (SELECT COUNT(E.snum) FROM Class C, Enrolled E WHERE C.name =
E.cname AND C.fid = F.fid)

Conclusion and Discussion: Thus we have learnt to write SQL equivalent Relational
Algebra Expressions and Query Tree.

QUIZ / Viva Questions:


1. What is relational algebra?
2. What is a query tree?

References:

1. Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition, Pearson Education.
2. Korth, Silberschatz, Sudarshan, “Database System Concepts”, 6th Edition, McGraw-Hill.
3. C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, 8th Edition, Pearson Education.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.

OLAP Laboratory

Experiment No. : 3

Implementation of concurrency control
Two Phase Locking Protocol

Experiment No: 3
Aim: Implementation of concurrency control: Two-Phase Locking Protocol.

Requirements: Java/Python.

What will you learn by performing this experiment?

Students will be able to understand how the Two-Phase Locking protocol is implemented using
Java or Python. This will help them to understand the implementation of a concurrency control
protocol.

Theory:

What is Concurrency Control?

Concurrency control is a database management system (DBMS) concept that is used to
address conflicts arising from the simultaneous accessing or altering of data that can occur in a
multi-user system. Concurrency control, when applied to a DBMS, is meant to coordinate
simultaneous transactions while preserving data integrity. In short, concurrency control governs
multi-user access to the database.

When more than one transaction is running simultaneously, there are chances of a
conflict occurring, which can leave the database in an inconsistent state. To handle these
conflicts we need concurrency control in the DBMS, which allows transactions to run
simultaneously but handles them in such a way that the integrity of the data remains
intact.

C. What is Two-Phase Locking (2PL)?

 Two-Phase Locking (2PL) is a concurrency control method which divides the execution of a transaction into two phases.
 It ensures conflict-serializable schedules.
 A transaction is said to follow the Two-Phase Locking protocol if all of its lock (read and write lock) operations precede its first unlock operation.

This protocol can be divided into two phases:

1. In the Growing Phase, a transaction may obtain locks, but may not release any lock.
2. In the Shrinking Phase, a transaction may release locks, but may not obtain any lock.

 Two-Phase Locking does not ensure freedom from deadlocks. A minimal simulation sketch of the basic protocol is given after the list of variants below.

Variants of the Two-Phase Locking Protocol


1. Strict Two-Phase Locking Protocol
 Strict Two-Phase Locking Protocol avoids cascaded rollbacks.
 This protocol not only requires two-phase locking but also all exclusive-locks
should be held until the transaction commits or aborts.
 It is not deadlock free.
 It ensures that if data is being modified by one transaction, then other transaction
cannot read it until first transaction commits.
 Most of the database systems implement rigorous two – phase locking protocol.

2. Rigorous Two-Phase Locking


 Rigorous Two – Phase Locking Protocol avoids cascading rollbacks.
 This protocol requires that all the share and exclusive locks to be held until the
transaction commits.

3. Conservative Two-Phase Locking Protocol
 Conservative Two-Phase Locking Protocol is also called Static Two-Phase Locking Protocol.
 This protocol is almost free from deadlocks, as all required data items are listed in advance.
 It requires a transaction to lock all the data items it will access before the transaction starts execution.
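The following is a minimal single-threaded simulation sketch of the basic protocol (assumptions: exclusive locks only, no lock waiting or deadlock handling); it enforces only the rule that no lock may be acquired after the first unlock:

# Minimal sketch of basic Two-Phase Locking with exclusive locks only.
class TwoPhaseLockingError(Exception):
    pass

class Transaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False               # becomes True after the first unlock

    def lock(self, item, lock_table):
        if self.shrinking:
            raise TwoPhaseLockingError(f"{self.name}: cannot lock {item} in shrinking phase")
        holder = lock_table.get(item)
        if holder is not None and holder is not self:
            raise TwoPhaseLockingError(f"{self.name}: {item} is held by {holder.name}")
        lock_table[item] = self
        self.locks.add(item)

    def unlock(self, item, lock_table):
        self.shrinking = True                # first unlock ends the growing phase
        self.locks.discard(item)
        if lock_table.get(item) is self:
            del lock_table[item]

# Usage: T1 locks A and B, unlocks A (enters shrinking phase); a further lock request fails.
lock_table = {}
t1 = Transaction("T1")
t1.lock("A", lock_table)
t1.lock("B", lock_table)
t1.unlock("A", lock_table)
try:
    t1.lock("C", lock_table)
except TwoPhaseLockingError as e:
    print(e)                                 # T1: cannot lock C in shrinking phase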

Conclusion:
Thus, we have learnt the 2- Phase locking in concurrency control and its implementation.

QUIZ / Viva Questions:

1. Rigorous two-phase locking protocol permits releasing all locks at the

A. Beginning of transaction

B. During execution of transaction

C. End of transaction

D. Never in the life-time of transaction

References:

1. Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition, Pearson Education.
2. Korth, Silberschatz, Sudarshan, “Database System Concepts”, 6th Edition, McGraw-Hill.
3. C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, 8th Edition, Pearson Education.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.

OLAP Laboratory

Experiment No. : 4

Implementation of Timestamp based


Protocol

Experiment No: 4
Aim: Implementation of Timestamp based Protocol

Program to check input schedule in timestamp ordering.

Requirements: Java/Python.

What will you learn by performing this experiment?

Students will be able to understand the implementation of the timestamp-based protocol using
Java or Python.

Theory:

Timestamp based Protocol:


1) Assumptions
 Every timestamp value is unique and accurately represents an instant in time.
 No two timestamps can be the same.
 A higher-valued timestamp occurs later in time than a lower-valued timestamp.

Whenever a transaction begins, it receives a timestamp. This timestamp indicates the order in
which the transaction must occur, relative to the other transactions. So, given two transactions
that affect the same object, the operation of the transaction with the earlier timestamp must
execute before the operation of the transaction with the later timestamp. However, if the
operation of the wrong transaction is actually presented first, then it is aborted and the
transaction must be restarted.
Every object in the database has a read timestamp, which is updated whenever the object's
data is read, and a write timestamp, which is updated whenever the object's data is changed.
If a transaction wants to read an object,

 but the transaction started before the object's write timestamp it means that
something changed the object's data after the transaction started. In this case, the
transaction is canceled and must be restarted.
 and the transaction started after the object's write timestamp, it means that it
is safe to read the object. In this case, if the transaction timestamp is after the
object's read timestamp, the read timestamp is set to the transaction timestamp.
 With each transaction Ti in the system, we associate a unique fixed timestamp, denoted by TS(Ti). This timestamp is assigned by the database system before the transaction Ti starts execution. If a transaction Ti has been assigned timestamp TS(Ti), and a new transaction Tj enters the system, then TS(Ti) < TS(Tj). There are two simple methods for implementing this scheme:

 1. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is equal to the value of the clock when the transaction enters the system.

 2. Use a logical counter that is incremented after a new timestamp has been assigned; that is, a transaction’s timestamp is equal to the value of the counter when the transaction enters the system.

 The timestamps of the transactions determine the serializability order. Thus, if TS(Ti) < TS(Tj), then the system must ensure that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj.

 To implement this scheme, we associate with each data item Q two timestamp values:

 • W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.

 • R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.

 These timestamps are updated whenever a new read(Q) or write(Q) instruction is executed.

Algorithm:
1. Read the two transactions (the input schedule).
2. Assign a timestamp value to each transaction (logical counter).
3. Associate a read timestamp and a write timestamp with each data item and check every operation against them, as sketched below.
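A minimal sketch of this check is given below (assumptions: the input schedule is a list of (transaction, operation, data item) tuples, timestamps come from a logical counter, and a conflicting operation simply aborts; the Thomas write rule is not applied):

# Minimal sketch: validate a schedule under basic timestamp ordering.
def check_timestamp_ordering(schedule):
    ts = {}                        # transaction -> timestamp (assigned on first operation)
    r_ts, w_ts = {}, {}            # data item -> read timestamp / write timestamp
    counter = 0
    for txn, op, item in schedule:
        if txn not in ts:
            counter += 1
            ts[txn] = counter
        t = ts[txn]
        if op == "R":
            if t < w_ts.get(item, 0):                       # a younger transaction already wrote it
                print(f"{txn}: read({item}) rejected -> abort and restart")
                continue
            r_ts[item] = max(r_ts.get(item, 0), t)
        elif op == "W":
            if t < r_ts.get(item, 0) or t < w_ts.get(item, 0):
                print(f"{txn}: write({item}) rejected -> abort and restart")
                continue
            w_ts[item] = t
        print(f"{txn}: {op}({item}) allowed (TS={t})")

# Example schedule: T1 reads A, T2 writes A, then T1's write on A violates the ordering.
check_timestamp_ordering([("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")])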

Conclusion:
Thus, we have learnt the timestamp-based protocol.
Quiz / Discussion:
1. What is the timestamp-based protocol?
2. Which operations update the read and write timestamps of a data item?
References:

1. Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition, Pearson Education.
2. Korth, Silberschatz, Sudarshan, “Database System Concepts”, 6th Edition, McGraw-Hill.
3. C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, 8th Edition, Pearson Education.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.

OLAP Laboratory

Experiment No. : 5

Implementation of Log based recovery


mechanism

Experiment No: 5
Aim: Implementation of Log based Recovery mechanism.

Requirements: Java/Python programming.

Theory:

II. LOG-BASED RECOVERY


o Log is a sequence of records. Log of each transaction is maintained in some stable
storage so that if any failure occurs then it can be recovered from there.
o If any operation is performed on the database then it will be recorded on the log.
o But the process of storing the logs should be done before the actual transaction is
applied on the database.

Let's assume there is a transaction to modify the City of a student. The following logs are
written for this transaction.

o When the transaction is initiated then it writes 'start' log.


1. <Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', another log is
written to the file.
1. <Tn, City, 'Noida', 'Bangalore' >
o When the transaction is finished then it writes another log to indicate end of the
transaction.
1. <Tn, Commit>

There are two approaches to modify the database:

1. Deferred database modification:

o The deferred modification technique occurs if the transaction does not modify the
database until it has committed.
o In this method, all the logs are created and stored in the stable storage, and the
database is updated when a transaction commits.

2. Immediate database modification:

o The immediate modification technique occurs if database modification occurs while the
transaction is still active.
o In this technique, the database is modified immediately after every operation. It
follows an actual database modification.

Log and log records –

The log is a sequence of log records, recording all the update activities in the database. The logs for each transaction are maintained in stable storage. Any operation which is performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification.
An update log record, represented as <Ti, Xj, V1, V2>, has these fields:
1. Transaction identifier: unique identifier of the transaction that performed the write operation.
2. Data item: unique identifier of the data item written.
3. Old value: value of the data item prior to the write.
4. New value: value of the data item after the write operation.

B. Recovery using Log records

When the system crashes, the system consults the log to find which transactions need to be
undone and which need to be redone.

1. If the log contains both the record <Ti, Start> and the record <Ti, Commit>, then
the transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither <Ti, Commit>
nor <Ti, Abort>, then the transaction Ti needs to be undone.

Implementation steps:

1. Create a manual log file (text file).

2. The program should take the log file as input and provide a menu for normal run and recovery.

3. Scan the log file and perform: Undo for uncommitted transactions and Redo for committed
transactions.

Sample log file contents given below:

T0 start
T0,A,950
T0,B,2050
T0 commit
T1 start
T1,C,600
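A minimal parsing sketch for the sample log format above is shown here (assumption: each log line is either "Tn start", "Tn commit", or "Tn,item,value"); committed transactions are redone and uncommitted ones undone, as per the rules in section B:

# Minimal sketch: decide redo/undo from a manual log file in the sample format.
def recover(log_lines):
    started, committed, writes = [], set(), {}
    for line in log_lines:
        line = line.strip()
        if line.endswith("start"):
            started.append(line.split()[0])
        elif line.endswith("commit"):
            committed.add(line.split()[0])
        else:
            txn, item, value = line.split(",")
            writes.setdefault(txn, []).append((item, value))

    for txn in started:
        if txn in committed:
            print(f"REDO {txn}: re-apply {writes.get(txn, [])}")
        else:
            print(f"UNDO {txn}: roll back {writes.get(txn, [])}")

sample_log = ["T0 start", "T0,A,950", "T0,B,2050", "T0 commit", "T1 start", "T1,C,600"]
recover(sample_log)          # REDO T0 ... then UNDO T1 ...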

Conclusion:
Thus, we have learnt about different kinds of failures and the log-based recovery mechanism.
Quiz / Discussion:

References:

1. Elmasri and Navathe, “Fundamentals of Database Systems”, 6th Edition, Pearson Education.
2. Korth, Silberschatz, Sudarshan, “Database System Concepts”, 6th Edition, McGraw-Hill.
3. C. J. Date, A. Kannan, S. Swamynathan, “An Introduction to Database Systems”, 8th Edition, Pearson Education.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.

OLAP Laboratory

Experiment No. : 6

Case Study- Distributed database for a real


life application and simulation of recovery
methods

Experiment No 6
Aim: Case Study- distributed database for a real life application and simulation of recovery
methods.

What will you learn by performing this experiment?

 The student should understand the different types of distributed databases and the
types of transparencies.

Software Required: Desktop PC, 4 GB RAM, Oracle 9i, MS SQL Server 2000, client/server
architecture, MySQL.

Theory:

A distributed database is a database that is under the control of a central database


management system (DBMS) in which storage devices are not all attached to a
common CPU. It may be stored in multiple computers located in the same physical location,
or may be dispersed over a network of interconnected computers.

Distributed database systems operate in computer networking environments where


component failures are inevitable during normal operation. Failures not only threaten normal
operation of the system, but they may also destroy the consistency of the system by direct
damage to the storage subsystem. To cope with these failures, distributed database systems
must provide recovery mechanisms which maintain system consistency. The types of failures
that may occur in distributed database systems, and the appropriate recovery actions, are
discussed below.

C. Recovery from Power Failure


Power failure causes loss of information in the non-persistent memory. When power is
restored, the operating system and the database management system restart. Recovery
manager initiates recovery from the transaction logs.

In case of immediate update mode, the recovery manager takes the following actions −

 Transactions which are in active list and failed list are undone and written on the
abort list.

 Transactions which are in before-commit list are redone.

 No action is taken for transactions in commit or abort lists.

In case of deferred update mode, the recovery manager takes the following actions −

 Transactions which are in the active list and failed list are written onto the abort list.
No undo operations are required since the changes have not been written to the disk
yet.

 Transactions which are in before-commit list are redone.

 No action is taken for transactions in commit or abort lists.

D. Recovery from Disk Failure


A disk failure or hard crash causes a total database loss. To recover from this hard crash, a
new disk is prepared, then the operating system is restored, and finally the database is
recovered using the database backup and transaction log. The recovery method is same for
both immediate and deferred update modes.

The recovery manager takes the following actions −

 The transactions in the commit list and before-commit list are redone and written
onto the commit list in the transaction log.

 The transactions in the active list and failed list are undone and written onto the abort
list in the transaction log.

E. Checkpointing
Checkpoint is a point of time at which a record is written onto the database from the buffers.
As a consequence, in case of a system crash, the recovery manager does not have to redo the
transactions that have been committed before checkpoint. Periodical checkpointing shortens
the recovery process.

The two types of checkpointing techniques are −

 Consistent checkpointing
 Fuzzy checkpointing

1) Consistent Checkpointing
Consistent checkpointing creates a consistent image of the database at checkpoint. During
recovery, only those transactions which are on the right side of the last checkpoint are
undone or redone. The transactions to the left side of the last consistent checkpoint are
already committed and needn’t be processed again. The actions taken for checkpointing are

 The active transactions are suspended temporarily.


 All changes in main-memory buffers are written onto the disk.

 A “checkpoint” record is written in the transaction log.

 The transaction log is written to the disk.

 The suspended transactions are resumed.

If in step 4, the transaction log is archived as well, then this checkpointing aids in recovery
from disk failures and power failures, otherwise it aids recovery from only power failures.

2) Fuzzy Checkpointing
In fuzzy checkpointing, at the time of checkpoint, all the active transactions are written in
the log. In case of power failure, the recovery manager processes only those transactions that
were active during checkpoint and later. The transactions that have been committed before
checkpoint are written to the disk and hence need not be redone.

3) Example of Checkpointing
Let us consider that in system the time of checkpointing is tcheck and the time of system
crash is tfail. Let there be four transactions Ta, Tb, Tc and Td such that −

 Ta commits before checkpoint.

 Tb starts before checkpoint and commits before system crash.

 Tc starts after checkpoint and commits before system crash.

 Td starts after checkpoint and was active at the time of system crash.

At recovery time, with consistent checkpointing, Ta needs no action (its effects are already on disk), Tb and Tc are redone, and Td is undone.

F. Distributed One-phase Commit


Distributed one-phase commit is the simplest commit protocol. Let us consider that there is a
controlling site and a number of slave sites where the transaction is being executed. The
steps in distributed commit are −

 After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site.

 The slaves wait for “Commit” or “Abort” message from the controlling site. This
waiting time is called window of vulnerability.

 When the controlling site receives “DONE” message from each slave, it makes a
decision to commit or abort. This is called the commit point. Then, it sends this
message to all the slaves.

 On receiving this message, a slave either commits or aborts and then sends an
acknowledgement message to the controlling site.

G. Distributed Two-phase Commit
Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The
steps performed in the two phases are as follows −

Phase 1: Prepare Phase

 After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site. When the controlling site has received “DONE” message from
all slaves, it sends a “Prepare” message to the slaves.

 The slaves vote on whether they still want to commit or not. If a slave wants to
commit, it sends a “Ready” message.

 A slave that does not want to commit sends a “Not Ready” message. This may
happen when the slave has conflicting concurrent transactions or there is a timeout.

Phase 2: Commit/Abort Phase

 After the controlling site has received “Ready” message from all the slaves −

o The controlling site sends a “Global Commit” message to the slaves.

o The slaves apply the transaction and send a “Commit ACK” message to the
controlling site.

o When the controlling site receives “Commit ACK” message from all the
slaves, it considers the transaction as committed.

 After the controlling site has received the first “Not Ready” message from any slave

o The controlling site sends a “Global Abort” message to the slaves.

o The slaves abort the transaction and send an “Abort ACK” message to the
controlling site.

o When the controlling site receives “Abort ACK” message from all the slaves,
it considers the transaction as aborted.
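The controlling site's decision rule in two-phase commit can be simulated with a short sketch (assumptions: no real network or timeouts; the slave votes collected in Phase 1 are supplied as a dictionary):

# Minimal sketch of the coordinator's decision in distributed two-phase commit.
def two_phase_commit(votes):
    """votes maps each slave site to its Phase 1 reply: 'Ready' or 'Not Ready'."""
    if all(v == "Ready" for v in votes.values()):
        decision = "Global Commit"
    else:
        decision = "Global Abort"
    # Phase 2: broadcast the decision; every slave applies or aborts and acknowledges.
    ack = "Commit ACK" if decision == "Global Commit" else "Abort ACK"
    for site in votes:
        print(f"send {decision} to {site}; await {ack}")
    return decision

print(two_phase_commit({"site1": "Ready", "site2": "Ready"}))        # Global Commit
print(two_phase_commit({"site1": "Ready", "site2": "Not Ready"}))    # Global Abort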

H. Distributed Three-phase Commit


The steps in distributed three-phase commit are as follows −

Phase 1: Prepare Phase

The steps are same as in distributed two-phase commit.

Phase 2: Prepare to Commit Phase

 The controlling site issues an “Enter Prepared State” broadcast message.
 The slave sites vote “OK” in response.

Phase 3: Commit / Abort Phase

The steps are same as two-phase commit except that “Commit ACK”/”Abort ACK” message
is not required.

Procedure / Program:
Steps of Distributed Database Design
Top-down approach: first the general concepts and the global framework are defined, and only
then the details.
Bottom-up approach: first the detailed modules are defined, and then the global
framework. If the system is built from scratch, the top-down method is more
common. If the system must fit existing systems, or some modules are already
available, the bottom-up method is usually used.
General design steps according to the structure:

 analysis of the external, application requirements


 design of the global schema
 design of the fragmentation
 design of the distribution schema
 design of the local schemes
 design of the local physical layers

DDBMS -specific design steps:

 design of the fragmentation


 design of the distribution schema

QUIZ / Viva Questions:


 What is distributed database?
 What are recovery requirements?
 What are the different recovery techniques used?

Conclusion:
Real-life distributed database scenarios were studied and their requirements documented. A
simulation tool is used to demonstrate the recovery mechanism in a distributed environment.

References:

1. C. J. Date, “An Introduction to Database Systems”, 7th Edition, Addison Wesley, 2000.
2. Leon, Alexis and Leon, Mathews, “Database Management Systems”, LeonTECHWorld.
3. Elmasri, R. and Navathe, S., “Fundamentals of Database Systems”, 3rd Edition, Pearson Education, 2000.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.
5. Umut Tosun, “Distributed Database Design: A Case Study”, International Workshop on Intelligent Techniques in Distributed Systems (ITDS-2014), Procedia Computer Science 37 (2014), 447–450.

OLAP Laboratory

Experiment No. : 7

Advanced database models: case-study-based
assignments for temporal, mobile or spatial databases

Experiment No: 7
Aim: Advanced Database Models Case study for Temporal, Mobile or Spatial databases

A Case Study on Spatio-Temporal Data Mining of Urban Social Management Events Based
on Ontology Semantic Analysis (Students may refer to different case study)

Theory:

The massive urban social management data with geographical coordinates gathered from the
inspectors, volunteers and citizens of a city are a new source of spatio-temporal data,
which can be used for data mining of city management and of the evolution of hot events, so as to
improve comprehensive urban governance. First, an ontology model for urban social management
events (USMEs) is presented to accurately extract effective social management events from
non-structured USMEs. Second, an exploratory spatial data analysis method based on "event-event" and
"event-place" relationships, from the spatial and temporal aspects, is presented to mine information from
USMEs for comprehensive urban social governance. The data mining results are visualized as a
thermal chart and a scatter diagram for the optimization of the management resource
configuration, which can improve the efficiency of municipal service management and support
decision-making by municipal departments.

1. Materials and Methods

The concept system of social comprehensive governance is huge and complex, and there are
various kinds of events. The extraction of interesting hot events and the associated spatio-temporal
information mining, which is only one of the many entry points in this field, is important. There
is broad research space in the information mining of social comprehensive
management events based on space-time management, whether in content or in method. The
smart city platform adds a geographic coordinate tag to a variety of events and log data
generated during the city management process, but these data records come from inspectors,
volunteers in city management, and even citizens; the events are described in unstructured
natural language. This case study proposes a spatio-temporal data mining approach based on
urban social management events to extract unstructured natural language information, to
find the spatio-temporal distribution pattern of events, and to provide visualized decision support
for the social management and comprehensive control of the city. The technical framework of
the proposed approach is shown in Figure 1 of the referenced case study.

The purpose of urban management and comprehensive administration is to maintain a good
environment for social development. During the process of urban management, there are a
large number of work record data. Thus, how to make use of these work records well to
excavate useful information hidden in these historical data is very important for the decision-
making of further urban social governance. The content of city management is huge with a
complicated structure for urban governance. This study puts forward a concept system of
urban social management events. An ontology model is proposed for the massive spatio-
temporal data mining of social management and comprehensive control events. It designs the
process of the construction of the ontology, builds the ontology using the existing tools, and
realizes the extraction of the hot events in city management based on the semantic reasoning
of ontology with Java-based frameworks, whose comprehensiveness and accuracy are higher
than that of the earlier approaches. This paper also introduces spatio-temporal information mining
for discrete USMEs from three perspectives: geographical statistics, spatial aggregation and
correlation relationships. A spatial-temporal correlation data mining between events and
locations, or between events and events, is proposed to mine the spatial-temporal information
from the discrete and massive comprehensive management events of the city.

Conclusion:

Thus the case study for spatial and temporal data has been performed.

References:

1. C. J. Date, “An Introduction to Database Systems”, 7th Edition, Addison Wesley, 2000.
2. Leon, Alexis and Leon, Mathews, “Database Management Systems”, LeonTECHWorld.
3. Elmasri, R. and Navathe, S., “Fundamentals of Database Systems”, 3rd Edition, Pearson Education, 2000.
4. Raghu Ramakrishnan and Johannes Gehrke, “Database Management Systems”, 3rd Edition, McGraw-Hill.
5. Umut Tosun, “Distributed Database Design: A Case Study”, International Workshop on Intelligent Techniques in Distributed Systems (ITDS-2014), Procedia Computer Science 37 (2014), 447–450.

OLAP Laboratory

Experiment No. : 8

Construction of Star Schema and


Snowflake Schema for company database

Experiment No. 8
Aim: Construction of star schema and snowflake schema for a company database.

What will you learn by performing this experiment?

 Students will be able to design a star schema and to understand the different keys
and joins used in a star schema.
 Converting the star schema to a snowflake schema.

Software Required: Desktop PC, 4 GB RAM, Oracle Enterprise Edition, MS SQL Server
2000, client, MySQL, Weka learning tool, SQL*Loader.

Theory: A star schema is the arrangement of a fact table at the core with the dimension tables
surrounding it. Each dimension table has a direct relationship with the fact table in the middle.
When a query is made against the data warehouse, the results of the query are produced by
combining or joining one row of a dimension table with one or more rows of the fact table.

Dimension Table:

 Dimension tables represent the business dimensions using which the metrics are
analyzed.
 Dimension tables often provide multiple hierarchies; hierarchies are used for
drill-down and roll-up.
 A dimension table should have its own surrogate key as the primary key, without
any built-in meaning, along with the operational system key.

Fact Table:

 A fact table in a star schema contains facts and is connected to the dimension tables.

 A fact table typically has two types of columns: those that contain facts and
those that are foreign keys to the dimension tables.
 The primary key of a fact table is usually a composite key that is made up of all its foreign
keys.
 A fact table might contain either detail-level facts or facts that have been aggregated.
 A fact table usually contains facts with the same level of aggregation.

Algorithm:

Step1: Identify a business process for analysis.

Step2: Identify the measures or facts.

Step3: Identify the dimensions for the facts.

Step 4: Write the information package having dimensions and facts.

Step 5: Design the star schema for the information package.

Step 6: Implement the star schema.
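As a hedged illustration of Step 6 (the table and column names below are assumptions for a simple company sales mart, not a prescribed design), the star schema can be created with plain SQL DDL, here issued through Python and SQLite:

import sqlite3

# Minimal sketch: one fact table surrounded by dimension tables, joined on surrogate keys.
conn = sqlite3.connect("company_dw.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT, brand TEXT);
CREATE TABLE IF NOT EXISTS dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, city TEXT, demographic TEXT);
CREATE TABLE IF NOT EXISTS dim_time     (time_key     INTEGER PRIMARY KEY, day INTEGER, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE IF NOT EXISTS fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    units_sold   INTEGER,
    sales_amount REAL,
    PRIMARY KEY (product_key, customer_key, time_key)   -- composite key made of all foreign keys
);
""")
conn.commit()
conn.close()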

Snowflake Schema:

“Snowflaking” is the normalization of the dimension tables in a star schema. When you completely
normalize all the dimension tables, the resultant structure resembles a snowflake with the fact
table in the middle.

The original star schema for sales contains only five tables, whereas the normalized version is now
extended to eleven tables. These new tables are linked back to the original dimension tables
through artificial keys.

Snowflaking is not generally recommended in a data warehouse, because query performance takes
the highest significance in a data warehouse and snowflaking can hamper that performance.

Forming a Sub-dimension:

The principle behind snowflaking is normalization of the dimension tables by removing low-cardinality
attributes and forming separate tables. In a similar manner, some situations
provide opportunities to separate out a set of attributes and form a sub-dimension. This
process is very close to the snowflaking technique; for example, a demographic
sub-dimension can be formed out of the customer dimension.

Although forming sub-dimensions may be considered snowflaking, it makes a lot of sense
to separate out demographic attributes that differ in granularity; if the customer dimension is very
large, running into millions of rows, the saving in storage space can be substantial. Another
valid reason for separating out demographic attributes relates to the browsing of attributes.

Algorithm:

1. Consider the star schema constructed in the previous section.

2. Identify the attributes of any dimension table in the star schema with low cardinality.

3. Remove these attributes from the original dimension table and create a new table.

4. Link the new table with the original dimension table through an artificial key.
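A corresponding hedged sketch of the snowflake step (assumption: the low-cardinality demographic attribute of the customer dimension above is moved into a sub-dimension linked through an artificial key):

import sqlite3

# Minimal sketch: normalize the customer dimension by moving the demographic
# attribute into its own table, linked back through an artificial (surrogate) key.
conn = sqlite3.connect("company_dw.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dim_demographic (demographic_key INTEGER PRIMARY KEY, demographic TEXT);
CREATE TABLE IF NOT EXISTS dim_customer_snow (
    customer_key    INTEGER PRIMARY KEY,
    customer_name   TEXT,
    city            TEXT,
    demographic_key INTEGER REFERENCES dim_demographic(demographic_key)
);
""")
conn.commit()
conn.close()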

Procedure / Program:

Follow the star schema algorithm above (Steps 1–6), then apply the snowflake algorithm to normalize low-cardinality dimension attributes into separate tables.

Conclusion: Thus we have learned the implementation of star and snowflake schemas using Informatica and
MS SQL Server 2000 tools.

QUIZ / Viva Questions:

 What are OLAP and OLTP?


 What is surrogate key?
 What is factless fact table?
 Define star and snowflake schema.
 Define slowly changing dimensions.
 Define primary and foreign key.

References:

1. C. J. Date, “An Introduction to Database Systems”, 7th Edition, Addison Wesley, 2000.
2. Leon, Alexis and Leon, Mathews, “Database Management Systems”, LeonTECHWorld.
3. Paulraj Ponniah, “Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals”.
4. Kimball Dimensional Modeling Techniques.
5. Reema Thareja, “Data Warehousing”.
OLAP Laboratory

Experiment No. : 9

OLAP Exercise a) Construction of cubes


b) OLAP operations, OLAP queries

Experiment No. 9
Aim: OLAP Exercise a) Construction of Cubes b) OLAP Operations, OLAP Queries (Slice,
dice, roll-up, drill-down and pivot).

What will you learn by performing this experiment?

Data Extraction: For a data warehouse, data is extracted from many disparate sources; you also
have to extract data both for the one-time initial full load and on an ongoing basis as the data changes.

Data Transformation: Having information that is usable for strategic decision making is the
underlying principle of the DW. Extracted data is raw data and it cannot be applied to the DW as-is.
Thus all the extracted data must be made usable in the DW. Transformation of source data requires
a wide variety of manipulations to change all the extracted source data into usable information
to be stored in the DW.

Data Loading: It is agreed that the transformation functions end as soon as load images are
created. The next major set of functions consists of the ones that take the prepared data, apply
it to the data warehouse and store it in the database.

OLAP: On-line analytical Processing (OLAP) is a category of software technology that


enables analysts, managers and executives to gain insight into the data through fast,
consistent, interactive access to wide variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of enterprise as understood
by the user.

Software Required: Desktop PC, 4 GB RAM, Oracle Enterprise Edition, MS SQL Server 2000,
client, MySQL.

Theory: Data Extraction Issues:

 Source identification: Identify source applications and source structures.

 Method of extraction: For each data source, define whether the extraction
process is manual or tool-based.
 Extraction frequency: For each data source, establish how frequently the data
extraction must be done (daily, weekly, and so on).
 Time window: For each data source, denote the time window for the extraction process.
 Job sequencing: Determine whether the beginning of one job in an extraction
job stream has to wait until the previous job has finished successfully.

Data Extraction Techniques:

Broadly there are 2 major types of data extraction

 Static data: Static data is the capture of data at a given point in time. Static data
capture is primarily used for the initial load.
 Data revision: Data revision is also known as incremental data capture. Incremental
data capture may be immediate or deferred. Within immediate data capture
there are three distinct options, and two separate options are available for deferred data
capture.

Immediate Data Extraction: In this case data extraction is real-time; it occurs as the transactions
happen at the source databases and files. The options for immediate data extraction are:

 Capture through Transaction Logs: - This option uses the transaction logs of
DBMS. Maintained for recovery from possible failure.
 Capture through DB triggers: - Create trigger program for all the event from
which data is to be captured.
 Capture in Source Application: - Source Application is made to asset in data
capture for DW.

Deferred Data Extraction: Techniques under deferred data extraction do not capture changes
in real time. The options are:

 Capture based on date and time stamp: The extraction procedure has to extract
all the records having a timestamp greater than the timestamp of the last extraction.
 Capture by comparing files: Compare two separate snapshots of the
source data; using this, all updates, deletes and inserts can be found.

Algorithm:

 Identify the data sources


 Create the data staging area to store the extracted data
 extract the data from the sources
 store the extracted data in the tables of the data staging area

Data Transformation functions break down into a few basic tasks.

 Selection: This task takes place at the beginning. In this task we select either whole
records or parts of several records from the source systems. The task of selection
usually forms part of the extraction function itself.
 Splitting/joining: This task includes the types of data manipulation needed to be
performed on the selected parts of source records.
 Conversion: This is an all-inclusive task and includes a large variety of rudimentary
conversions of single fields, for two primary reasons: one, to standardize among the
data extracted from disparate source systems, and the other, to make the fields usable
and understandable to the users.
 Summarization: Sometimes it is not feasible to keep data at the lowest level of detail
in the DW, so the data transformation function includes summarization of daily
transactions.
 Enrichment: This task is the rearrangement and simplification of individual fields to
make them more useful for the DW. We may use one or more fields from the same input
record to create a better view of the data for the DW.

Consider specific types of transformation function:

 Format revisions: These revisions include changes to the data types and lengths of
individual fields.
 Decoding of fields: This is also a common type of data transformation when we
deal with multiple source systems. We need to decode all such cryptic codes and
change them into values that make sense to the users.
 Calculated and derived values: For example, the extracted data from the sales system
contains sales amount, units and operating cost; derived values such as profit may be
calculated and stored.
 Splitting of single fields: We may improve the operating performance by indexing on
individual components.
 Merging of information: This includes merging of several fields to create a single field of
data.
 Character set conversion: This type of data transformation relates to the conversion of
character sets to an agreed standard character set for textual data in the warehouse.
 Conversion of units of measurement: Here we may have to convert measurements so that
the numbers are all in one standard unit of measurement.
 Date/time conversion: This relates to the representation of date and time in
standard formats.
 Summarization: This includes creation of summaries to be loaded into the data
warehouse instead of loading the most granular level of data.
 Key restructuring: While choosing keys for the data warehouse, avoid keys with built-in
meanings. Transform such keys into generic keys generated by the system itself. This is
called key restructuring.
 Deduplication: In this case, we keep a single record and link all duplicates in the source
systems to this single record.

Algorithm:

 Identify the transformations required for standardization


 Apply the respective transformation on the extracted data.

Data may be applied to the DW in the following 4 methods.

LOAD:

 If the target table to be loaded already exists and data exists in the table, the load
process wipes out the existing data and applies data from incoming file.

APPEND:

 If data already exists in table, the append process unconditionally adds the incoming
data, preserving the existing data into the target table. The incoming record may be
allowed to be added as a duplicate.

DESTRUCTIVE MERGE:

 In this mode, you apply the incoming data to target data. If primary key of an
incoming record matches with the key of an existing record, update the matching
target record.

CONSTRUCTIVE MERGE:

 This mode is slightly different from destructive merge. If the primary key of an incoming
record matches the key of an existing record, keep the existing record, add the incoming
record, and mark the added record as superseding the old record.
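A tiny hedged sketch (assumption: the target table is represented as a Python dict keyed by primary key) contrasts the destructive and constructive merge modes:

# Minimal sketch of the two merge modes; 'target' maps primary key -> record.
def destructive_merge(target, incoming):
    for key, record in incoming.items():
        target[key] = record                       # a matching key is simply overwritten
    return target

def constructive_merge(target, incoming):
    for key, record in incoming.items():
        if key in target:
            # keep the old record and add the new one, marked as superseding it
            target[key] = {"current": record, "superseded": target[key]}
        else:
            target[key] = record
    return target

target = {1: {"City": "Noida"}}
print(destructive_merge(dict(target), {1: {"City": "Bangalore"}}))
print(constructive_merge(dict(target), {1: {"City": "Bangalore"}}))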

The modes of applying data to the data warehouse lead to three types of loads.

 Initial Load: Here, the load run creates the database tables from scratch. If the data volume is
large, you may need more than one run to create a single table, and the load runs for a single
table may have to be scheduled across several days.
 Incremental Loads: These are the applications of ongoing changes from the source
systems. Changes to the source systems are always tied to specific times, irrespective
of whether they are based on explicit time-stamps in the source systems.
 Full Refresh: This type of data application involves rewriting the entire DW.
Sometimes you may also do partial refreshes to rewrite only specific tables.
 As far as the data application modes are concerned, full refresh is similar to the initial
load. However, in the case of a full refresh, data exists in the target tables before the
incoming data is applied.

Algorithm:

 Map the tables from the data staging area to the respective dimension table and fact
table of the data warehouse(star schema)
 Load the data from the tables of the data staging area to the dimension and fact table
of the data warehouse using initial load

OLAP operations:

Pivot: This operation is also called rotate operation that rotates the data in order to provide
alternative presentation

Slice: This operation performs a selection on one dimension of the given cube resulting in a
sub cube.

Dice: The Dice operation defines a sub cube by performing a selection on two or more
dimensions.

Roll-up: This operation involves computing all the data relationships for one or more
dimensions. To do this, a computational relationship or formula might be defined.

Drill down/up: This is a specific analytical technique whereby the user navigates among the
levels of data ranging from most summarized (up) to most detail (down).
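The operations above can be approximated with ordinary SQL over the star schema sketched in Experiment 8; the hedged Python/SQLite example below (table and column names assumed from that earlier sketch) expresses roll-up, slice and dice as queries:

import sqlite3

# Minimal sketch: OLAP-style operations expressed as SQL over the assumed star schema.
conn = sqlite3.connect("company_dw.db")
cur = conn.cursor()

# Roll-up: climb the time hierarchy (e.g. month -> year) for a coarser aggregation.
rollup = """SELECT t.year, SUM(f.sales_amount)
            FROM fact_sales f JOIN dim_time t ON f.time_key = t.time_key
            GROUP BY t.year"""

# Slice: fix a single dimension value (year = 2023) to obtain a sub-cube.
slice_q = """SELECT p.category, SUM(f.sales_amount)
             FROM fact_sales f
             JOIN dim_time t    ON f.time_key = t.time_key
             JOIN dim_product p ON f.product_key = p.product_key
             WHERE t.year = 2023
             GROUP BY p.category"""

# Dice: restrict two or more dimensions at once.
dice_q = """SELECT c.city, t.quarter, SUM(f.units_sold)
            FROM fact_sales f
            JOIN dim_customer c ON f.customer_key = c.customer_key
            JOIN dim_time t     ON f.time_key = t.time_key
            WHERE c.city IN ('Mumbai', 'Pune') AND t.year = 2023
            GROUP BY c.city, t.quarter"""

for name, query in [("roll-up", rollup), ("slice", slice_q), ("dice", dice_q)]:
    print(name, cur.execute(query).fetchall())

conn.close()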

Algorithm

 Design star schema


 Show the contents of the cube
 Give examples for Slice, Dice, Pivot operations
 Write Queries for Drill down and Roll up Operations and show the output

Procedure/ Program:

 In general, building any data warehouse consists of the following steps:


 Extracting the transactional data from the data sources into a staging area
 Transforming the transactional data

 Loading the transformed data into a dimensional database
 Building pre-calculated summary values to speed up report generation
 Building (or purchasing) a front-end reporting tool

Conclusion: Thus we have created a sample data warehouse using an ETL tool and observed
different OLAP operations.

QUIZ / Viva Questions:

 What is ETL tool?


 Different types of load.
 Define Slice, Dice, and roll up, Drill down and pivot.
 Define data transformation techniques.
 Define Data Extraction techniques.

References:

1. C. J. Date, “An Introduction to Database Systems”, 7th Edition, Addison Wesley, 2000.
2. Leon, Alexis and Leon, Mathews, “Database Management Systems”, LeonTECHWorld.
3. Paulraj Ponniah, “Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals”.
4. Kimball Dimensional Modeling Techniques.
5. Reema Thareja, “Data Warehousing”.

OLAP Laboratory

Experiment No. : 10

Case study on issues and usage of modern
tools and technologies in advanced databases
