Database Systems:

Course Overview

Professor Navneet Goyal


Department of Computer Science & Information Systems
BITS, Pilani


Textbook
Hector Garcia-Molina, Jeffrey D. Ullman & Jennifer Widom.
Database Systems – The Complete Book,
Pearson Education, 2002.

Home Page:
http://www-db.stanford.edu/~ullman/dscb.html

© Prof. Navneet Goyal, BITS, Pilani


Reference Books
• Ramakrishnan R. & Gehrke J.
  Database Management Systems, 3e, McGraw-Hill, 2003.
  http://www.cs.wisc.edu/~dbbook
• Silberschatz A., Korth H. F. & Sudarshan S.
  Database System Concepts, 5e, TMH, 2005.
  http://www.db-book.com
  http://www.mhhe.com/silberschatz
• Elmasri R. & Navathe S. B.
  Fundamentals of Database Systems, 5e, Pearson Education, 2008.
  http://www.aw.com/cssupport
• Robinson I., Webber J. & Eifrem E.
  Graph Databases, 2e, O’Reilly, 2015.
  https://neo4j.com/graph-databases-book/


Data
• So what is data?
• Why are we interested in data?
• What are the sources of data?
• Management of data
• Big Data!
• What is the biggest asset of companies like Google, Yahoo, Amazon, Facebook, Walmart, etc.?
• We are living in a data-driven world!


Data: A simple exercise
• How much data do you generate in a day?
• Which activities generate data?


A word about DATA
• If data had mass, the Earth would be a black hole!
• Data is the new oil!
• The world's data was expected to reach 40 ZB by 2020
• In 2012, we had about 2.8 ZB*
• Only a quarter of this data could produce useful information
• Only 3% of it was tagged
• Only 0.5% of it was actually used for some kind of analysis
(*Report by John Gantz & David Reinsel, sponsored by EMC)


Some Interesting Facts
• During the first day of a baby’s life, the amount of data generated by humanity is equivalent to 70 times the information contained in the Library of Congress (www.BabyCenter.com)
• How do babies learn to speak? (Prof. Deb Roy, MIT Media Asia Lab)
  • Human Speechome Project
  • 11 video cameras, 14 microphones, ...
  • 200 GB of data each day!


Some Interesting Facts
• Google
  • 50% of all internet users use Google every day
  • 7.2 bn page views per day
  • 20 PB of data processed daily
• YouTube
  • 48 hours of video uploaded every minute
  • 4 bn views per day
  • Most viewed video: Bieber’s "Baby" (763,684,702 views)


Tsunami of Data
• Telecom data (≈ 4.6 bn mobile subscribers)
  • There are 3 billion telephone calls in the US each day, 30 billion emails daily, 1 billion SMS, IMs.
  • IP network traffic: up to 1 billion packets per hour per router. Each ISP has many (hundreds of) routers!
• WWW
• Weblog data (160 mn websites)
• Email data
• Satellite imaging data
• Social networking sites data
• Genome data
• CERN’s LHC (15 petabytes/year)


Tsunami of Data
• In 2005, mankind created 150 exabytes of data
• In 2010, it created 1200 exabytes
• How much are we creating now?


Tsunami of Data
• No. of pics on Facebook
  • 15 bn unique photos
  • 60 bn photos stored (4 sizes)
• Imageshack (20 bn)
• Photobucket (7.2 bn)
• Flickr (3.4 bn)
• Multiply (3 bn)


Topics
n Evolution of Databases
n Data, Database, DBMS, & DBS
n Data Modeling
n Relational Databases
n Schema Design & Normalization
n Query Languages
n Storage & Indexing
n Query Processing & Optimization
n Concurrency
n Crash Recovery
n Advanced Topics – Big Data & NoSQL Databases

© Prof. Navneet Goyal, BITS, Pilani


Databases Everywhere!!!
• A DBMS contains information about a particular enterprise
  • Collection of interrelated data
  • Set of programs to access the data
  • An environment that is both convenient and efficient to use
• Database applications:
  • Banking: all transactions
  • Airlines: reservations, schedules
  • Universities: registration, grades
  • Sales: customers, products, purchases
  • Online retailers: order tracking, customized recommendations
  • Manufacturing: production, inventory, orders, supply chain
  • Human resources: employee records, salaries, tax deductions
  • Social media: Facebook & Twitter use graph databases
• Databases touch all aspects of our lives


Biggest OLTP System
• SABRE
  • Sabre is a computer reservations system/global distribution system (GDS) used by airlines, railways, hotels, travel agents and other travel companies
  • Used by more than 200 airlines


DBMS – Is it a Dry Area?
• The area of DBMS is a microcosm of computer science in general
• The issues addressed and the techniques used span a wide spectrum, including
  • Languages
  • Object-orientation & other programming paradigms


DBMS – Is it a Dry Area?
• Compilation
• Operating systems
• Concurrent programming
• Data structures
• Algorithms
• Parallel & distributed computing
• User interfaces
• Expert systems & AI
• Statistical techniques & dynamic programming
Reference: Database Management Systems by Ramakrishnan & Gehrke, 3e


Basic Definitions
• Database: A collection of related data.
• Data: Known facts that can be recorded and have an implicit meaning.
• Mini-world: Some part of the real world about which data is stored in a database. For example, student grades and transcripts at a university.
• Database Management System (DBMS): A software package/system to facilitate the creation and maintenance of a computerized database.
• Database System: The DBMS software together with the data itself. Sometimes, the applications are also included.


DBMS Functionalities
• Defining a database: in terms of data types, structures and constraints
• Constructing or loading the database on a secondary storage medium
• Manipulating the database: querying, generating reports, insertions, deletions and modifications to its content
• Concurrent processing and sharing by a set of users and programs, while keeping all data valid and consistent
• Crash recovery
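The define, construct and manipulate functionalities listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the student table, its columns and the sample rows are hypothetical and do not come from the course material, and an in-memory database stands in for secondary storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: in-memory DB instead of a file on disk

# Define: data types, structure and constraints (hypothetical schema).
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        grade REAL CHECK (grade BETWEEN 0 AND 10)
    )
""")

# Construct / load: populate the database.
with conn:
    conn.executemany(
        "INSERT INTO student (id, name, grade) VALUES (?, ?, ?)",
        [(1, "Asha", 8.5), (2, "Ravi", 6.0), (3, "Meera", 9.1)],
    )

# Manipulate: modification, deletion, then a query.
with conn:
    conn.execute("UPDATE student SET grade = 7.0 WHERE id = 2")
    conn.execute("DELETE FROM student WHERE id = 3")
rows = conn.execute("SELECT name, grade FROM student ORDER BY id").fetchall()

# The declared constraint is enforced by the DBMS, not by application code.
try:
    with conn:
        conn.execute("INSERT INTO student (id, name, grade) VALUES (4, 'Kiran', 42.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here rows holds the two remaining students, and the out-of-range grade is rejected because it violates the CHECK constraint declared in the definition step.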


File System vs. DBMS
• A company has 500 GB of data on employees, departments, products, sales, and so on
• Data is accessed concurrently by several employees
• Questions about the data must be answered quickly
• Changes made to the data by different users must be applied consistently
• Access to certain parts of the data must be restricted


File System vs. DBMS
• Data stored in operating system files
• Many drawbacks!


File System vs. DBMS
• These drawbacks have prompted the development of database systems
• Database systems offer solutions to all the above problems


Advantages of a DBMS
• Program-data independence
  • Insulation between programs and data: allows changing data storage structures and operations without having to change the DBMS access programs.
• Efficient data access
  • The DBMS uses a variety of techniques to store & retrieve data efficiently
• Data integrity & security
  • Before inserting the salary of an employee, the DBMS can check that the dept. budget is not exceeded
  • Enforces access controls that govern what data is visible to different classes of users


Advantages of a DBMS
• Data administration
  • When several users share data, centralizing the administration offers significant improvement
• Concurrent access & crash recovery
  • The DBMS schedules concurrent access to the data in such a manner that users think of the data as being accessed by only one user at a time
  • The DBMS protects users from the ill-effects of system failures
• Reduced application development time
  • Many important tasks are handled by the DBMS
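One way to picture the protection against failures is an all-or-nothing transaction. The sketch below uses Python's built-in sqlite3 module. The account table, the transfer helper and the simulated failure are assumptions made for illustration: a raised exception stands in for a crash, which is much simpler than real crash recovery.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` between two accounts as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                # Hypothetical failure between the two updates.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 500)])

failed_ok = transfer(conn, "A", "B", 300, fail_midway=True)
after_failure = balances(conn)   # the half-done debit has been undone
done_ok = transfer(conn, "A", "B", 300)
after_success = balances(conn)
```

After the failed attempt both balances are unchanged, so no money disappears. Only the complete transfer becomes visible.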


Benchmarking DBs
• The term transaction is often applied to a wide variety of business and computer functions. Looked at as a computer function, a transaction could refer to a set of operations including disk reads/writes, operating system calls, or some form of data transfer from one subsystem to another


Benchmarking DBs
• While TPC benchmarks certainly involve the measurement and evaluation of computer functions and operations, the TPC regards a transaction as it is commonly understood in the business world: a commercial exchange of goods, services, or money. A typical transaction, as defined by the TPC, would include an update to a database system for such things as inventory control (goods), airline reservations (services), or banking (money).


Benchmarking DBs
• In these environments, a number of customers or service representatives input and manage their transactions via a terminal or desktop computer connected to a database. Typically, the TPC produces benchmarks that measure transaction processing (TP) and database (DB) performance in terms of how many transactions a given system and database can perform per unit of time, e.g., transactions per second (tpsC) or transactions per minute (tpmC)


Benchmarking DBs
• Results


Levels of Abstraction
• Databases provide users with an abstract view of data


Relational Query Languages
• Query languages: allow manipulation and retrieval of data from a database.
• The relational model supports simple, powerful QLs:
  • Strong formal foundation based on logic.
  • Allows for much optimization.
• Query languages != programming languages!
  • QLs are not expected to be “Turing complete”.
  • QLs are not intended to be used for complex calculations.
  • QLs support easy, efficient access to large data sets.


Formal Relational Query Languages
• Two mathematical query languages form the basis for “real” languages (e.g. SQL), and for implementation:
  • Relational Algebra: more operational, very useful for representing execution plans.
  • Relational Calculus: lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)


The SQL Query Language
• SQL has been influenced by both Relational Algebra (RA) & Relational Calculus (RC)
• More so by RC, particularly Tuple Relational Calculus (TRC)
• The other variant of RC is Domain Relational Calculus (DRC), which has greatly influenced Query By Example (QBE)
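The contrast between the operational flavour of RA and the declarative flavour of TRC can be sketched in Python. This is only an analogy under assumptions: the Student relation and the helper functions are hypothetical, and a Python set comprehension merely resembles the calculus expression { t.name | t ∈ students ∧ t.grade > 8 }. Python still evaluates it step by step.

```python
from typing import Callable, NamedTuple, Set


class Student(NamedTuple):
    name: str
    grade: float


# Hypothetical relation used only for this illustration.
students = {Student("Asha", 8.5), Student("Ravi", 6.0), Student("Meera", 9.1)}


# Relational-algebra style: an explicit sequence of operators,
# here a restriction followed by a projection.
def restrict(rel: Set[Student], predicate: Callable[[Student], bool]) -> Set[Student]:
    return {t for t in rel if predicate(t)}


def project_name(rel: Set[Student]) -> Set[str]:
    return {t.name for t in rel}


algebra_answer = project_name(restrict(students, lambda t: t.grade > 8))

# Tuple-relational-calculus style: state the condition the answer must satisfy,
# mirroring { t.name | t ∈ students ∧ t.grade > 8 }.
calculus_answer = {t.name for t in students if t.grade > 8}
```

Both formulations yield the same set of names. The difference lies in whether the query spells out the order of operations or only the condition on the result.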


The SQL Query Language
• SQL consists of:
  • DDL (Data Definition Language)
    • Create conceptual schema
  • DML (Data Manipulation Language)
    • Relational operators
    • Insert, Delete, Update
  • VDL (View Definition Language)
    • Specify user views & their mapping to the conceptual schema (in most DBMSs, done by the DDL)
  • SDL (Storage Definition Language)
    • File organization
    • Indexes


The SQL Query Language
• SDL is being removed from SQL
• DMLs
  • High-level or nonprocedural (declarative)
    • Can be entered at the SQL> prompt or embedded in a general-purpose programming language
    • Can specify & retrieve many records in a single DML statement, and are hence called set-at-a-time DMLs
  • Low-level or procedural
    • Must be embedded in a general-purpose programming language
    • Typically retrieve individual records or objects from the DB & process each separately
    • Therefore they need PL constructs like looping
    • Hence called record-at-a-time DMLs
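The set-at-a-time versus record-at-a-time distinction can be imitated with Python's built-in sqlite3 module. This is a sketch under assumptions: the employee table and both helper functions are hypothetical, and the second function only mimics a record-at-a-time DML by looping in the host language over rows fetched one by one.

```python
import sqlite3


def make_db():
    """Create a small hypothetical employee table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
    )
    with conn:
        conn.executemany(
            "INSERT INTO employee VALUES (?, ?, ?)",
            [(1, "CS", 1000), (2, "CS", 2000), (3, "EE", 1500)],
        )
    return conn


def raise_set_at_a_time(conn):
    # One declarative statement touches every qualifying record.
    with conn:
        conn.execute("UPDATE employee SET salary = salary + 100 WHERE dept = 'CS'")


def raise_record_at_a_time(conn):
    # The host language loops over the records and processes each separately.
    rows = conn.execute("SELECT id, dept, salary FROM employee").fetchall()
    with conn:
        for emp_id, dept, salary in rows:
            if dept == "CS":
                conn.execute(
                    "UPDATE employee SET salary = ? WHERE id = ?", (salary + 100, emp_id)
                )


def salaries(conn):
    return conn.execute("SELECT id, salary FROM employee ORDER BY id").fetchall()


db_a, db_b = make_db(), make_db()
raise_set_at_a_time(db_a)
raise_record_at_a_time(db_b)
result_a, result_b = salaries(db_a), salaries(db_b)
```

Both approaches leave the data in the same state. The set-at-a-time version leaves the choice of how to find and update the rows to the DBMS, while the loop fixes that strategy in application code.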


The SQL Query Language
DMLs
• Whenever DML statements are embedded in a PL, that language is called the host language and the DML is called the data sublanguage
• In object DBs, the host language & data sublanguage form one integrated language, e.g. C++ with some extensions to support database functionality
• Some RDBMSs also provide integrated languages, e.g. ORACLE’s PL/SQL.


Operations on Relations
• Relational operations
  • Restrict
  • Project
  • Join
  • Divide
• Set operations
  • Union
  • Intersection
  • Difference
  • Product
• Closure property
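A few of these operations, and the closure property, can be sketched in Python by modelling a relation as a set of rows. The representation, the function names and the sample account and branch relations are assumptions made for illustration. Divide and product are omitted to keep the sketch short.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Row = Tuple[Tuple[str, object], ...]  # sorted (attribute, value) pairs
Relation = FrozenSet[Row]             # a set of rows, so no duplicates


def relation(*rows: Dict[str, object]) -> Relation:
    """Build a relation from dictionaries that share the same attributes."""
    return frozenset(tuple(sorted(row.items())) for row in rows)


def restrict(rel: Relation, predicate: Callable[[Dict[str, object]], bool]) -> Relation:
    """Keep only the rows that satisfy the predicate."""
    return frozenset(row for row in rel if predicate(dict(row)))


def project(rel: Relation, attributes: Iterable[str]) -> Relation:
    """Keep only the named attributes; duplicate rows collapse automatically."""
    wanted = set(attributes)
    return frozenset(tuple(p for p in row if p[0] in wanted) for row in rel)


def join(left: Relation, right: Relation) -> Relation:
    """Natural join: combine rows that agree on all shared attributes."""
    joined = set()
    for l_row in left:
        for r_row in right:
            l, r = dict(l_row), dict(r_row)
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                joined.add(tuple(sorted({**l, **r}.items())))
    return frozenset(joined)


# Hypothetical sample data.
account = relation(
    {"account_number": "A-101", "branch": "Pilani", "balance": 500},
    {"account_number": "A-102", "branch": "Goa", "balance": 3000},
    {"account_number": "A-103", "branch": "Pilani", "balance": 2000},
)
branch = relation(
    {"branch": "Pilani", "state": "Rajasthan"},
    {"branch": "Goa", "state": "Goa"},
)

# Closure: every operator returns a relation, so operators nest freely.
joined = join(account, branch)
small_accounts = project(
    restrict(joined, lambda t: t["balance"] < 2500), ["account_number", "state"]
)
goa_accounts = project(
    restrict(joined, lambda t: t["state"] == "Goa"), ["account_number", "state"]
)
all_accounts = small_accounts | goa_accounts  # union is plain set union
```

Because each result is itself a relation, the output of join feeds restrict, whose output feeds project, and the two projected results can be combined with a set operation.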


Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation


Steps in Query Processing
• Parsing and translation
  • Translate the query into its internal form.
  • Translation is similar to the work performed by the parser of a compiler
  • The parser checks syntax and verifies relations
  • Parse tree representation
  • This is then translated into an RA expression


Steps in Query Processing
• Query Execution Plan
  • In SQL, a query can be expressed in several ways
  • Each SQL query can itself be translated into an RA expression in many ways
  • An RA expression only partially tells you how to evaluate a query
  • There are several ways to evaluate an RA expression
  • Annotate the RA expression with instructions specifying how to evaluate each operation
  • An annotation may state the algorithm to be used for a specific operation or the particular index to use
  • An annotated RA expression is called an evaluation primitive
  • A sequence of primitive operations is a query execution plan (QEP)


Steps in Query Processing
• Evaluation
  • The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.


Steps in Query Processing
• Example
    select balance
    from account
    where balance < 2500
• RAEs
  • σ_{balance<2500}(Π_{balance}(account))
  • Π_{balance}(σ_{balance<2500}(account))
• E.g., we can use an index on balance to find accounts with balance < 2500,
• or we can perform a complete relation scan and discard accounts with balance ≥ 2500
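The two evaluation strategies for this query can be observed in a real engine. The sketch below uses SQLite through Python's sqlite3 module and its EXPLAIN QUERY PLAN statement. The account schema, the sample rows and the index name idx_account_balance are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

QUERY = "SELECT balance FROM account WHERE balance < 2500"


def plan(conn, sql):
    """Return the textual detail column of SQLite's EXPLAIN QUERY PLAN."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [(f"A-{i}", i * 500) for i in range(1, 11)],  # balances 500 .. 5000
    )

# Without an index on balance, the only option is a complete relation scan.
plan_scan = plan(conn, QUERY)
result_scan = sorted(row[0] for row in conn.execute(QUERY))

# With an index on balance, the engine can search the index instead.
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")
plan_index = plan(conn, QUERY)
result_index = sorted(row[0] for row in conn.execute(QUERY))
```

The query text and its answer are identical in both cases. Only the plan chosen by the engine changes, which is the point of separating the RA expression from its annotations.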


Steps in Query Processing
[Figure: Query Execution Plan]


Query Optimization
• Different QEPs for a given query can have different costs
• Users are not expected to write their queries in a way that suggests the most efficient QEP
• It is the system’s responsibility to construct a QEP that minimizes the cost
• This is query optimization


Query Optimization
• For optimizing a query, the query optimizer must know the cost of each operation
• Cost is hard to compute
• It depends on many parameters, such as the actual memory available to the operation
• Systems work with rough estimates
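A toy cost model shows how rough estimates can still separate good plans from bad ones. Everything in this sketch is an assumption made for illustration: the cost unit (page reads), the formulas, the index height of 3 and the table statistics are hypothetical, and real optimizers use far richer models.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    rows: int
    rows_per_page: int

    @property
    def pages(self) -> int:
        return math.ceil(self.rows / self.rows_per_page)


def cost_full_scan(stats: TableStats) -> int:
    """Estimated page reads when every page of the relation is read once."""
    return stats.pages


def cost_index_lookup(stats: TableStats, matching_rows: int, index_height: int = 3) -> int:
    """Estimated page reads: descend the index, then fetch one page per match.

    Assumes an unclustered index, so each matching row may be on a different page.
    """
    return index_height + matching_rows


def choose_plan(stats: TableStats, matching_rows: int):
    """Pick the candidate plan with the lowest estimated cost."""
    candidates = {
        "full scan": cost_full_scan(stats),
        "index lookup": cost_index_lookup(stats, matching_rows),
    }
    best = min(candidates, key=candidates.get)
    return best, candidates


stats = TableStats(rows=100_000, rows_per_page=100)  # 1000 pages
selective_plan, selective_costs = choose_plan(stats, matching_rows=100)
broad_plan, broad_costs = choose_plan(stats, matching_rows=50_000)
```

With few matching rows the index lookup is estimated to be cheaper. When half the table matches, the full scan wins, so the best plan depends on the data and not only on the query text.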


Query Optimization
Ideally: Want to find best plan.
Practically: Avoid worst plans!


Overall System Structure
Figure taken from Silberschatz: Database System Concepts, 5e, McGraw Hill


Q&A
Thank You