
Database

Shared collection of logically related data (and a description of this data), designed to meet the information needs of an organization.

Database Management System (DBMS)


A software system that enables users to define, create, maintain, and control access to the database.

(Database) application program: a computer program that interacts with the database by issuing an appropriate request (an SQL statement) to the DBMS.

Components of a DBMS

Database System Development Lifecycle (SDLC)

Stages and main activities

- Database planning: planning how the stages of the lifecycle can be realized most efficiently and effectively.
- System definition: specifying the scope and boundaries of the database system.
- Requirements collection and analysis: collection and analysis of the requirements for the new database system.
- Database design: conceptual, logical, and physical design of the database.
- DBMS selection: selecting a suitable DBMS for the database system.
- Application design: designing the user interface and the application programs that use and process the database.
- Prototyping: building a working model of the database system.
- Implementation: creating the physical database definitions and the application programs.
- Data conversion and loading: loading data from the old system to the new system.
- Testing: the database system is tested for errors and validated against the requirements specified by the user.
- Operational maintenance: the database system is fully implemented and continuously monitored.

Parallel activities in the DBLC and SDLC

Why Is Database Design Important?

Database design focuses on the design of the database structure used for end-user data. The designer must identify the database's expected use.

Well-designed database:
- Facilitates data management
- Generates accurate and valuable information

Poorly designed database:
- Causes difficult-to-trace errors

Database design is the process of creating a design for a database that will support the enterprise's mission statement and mission objectives for the required database system. The design is made up of three main phases: conceptual, logical, and physical.

Database Model
Main purposes of data modeling include: to assist in understanding the meaning (semantics) of the data; to facilitate communication about the information requirements.

Database Approach

SQL (often referred to as Structured Query Language)

A database computer language designed for managing data in relational database management systems (RDBMS). It was originally based upon relational algebra, and its scope includes data query and update, schema creation and modification, and data access control.

Data Definition Language (DDL)


Allows the DBA or user to create and name entities, attributes, and relationships required for the application plus security constraints.

Data Manipulation Language (DML)


Provides basic data manipulation operations on data held in the database.

Procedural DML: allows the user to tell the system exactly how to manipulate data.
Non-procedural DML: allows the user to state what data is needed rather than how it is to be retrieved.

A database language should allow the user to:
- create the database and relation structures;
- perform insertion, modification, and deletion of data from relations;
- perform simple and complex queries.

SQL is a transform-oriented language with 2 major components:


A DDL for defining database structure. A DML for retrieving and updating data.

SQL is relatively easy to learn:


it is non-procedural - you specify what information you require, rather than how to get it.

SQL: Data Manipulation Language (DML)


use DreamHome

SELECT Statement

SELECT [DISTINCT | ALL] {* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList]
[HAVING condition]
[ORDER BY columnList]

SELECT: specifies which columns are to appear in the output.
FROM: specifies the table(s) to be used.
WHERE: filters rows.
GROUP BY: forms groups of rows with the same column value.
HAVING: filters groups subject to some condition.
ORDER BY: specifies the order of the output.

- Order of the clauses cannot be changed.
- Only SELECT and FROM are mandatory.

List full details of all staff.
SELECT * FROM Staff;

List the property numbers of all properties that have been viewed. Use DISTINCT to eliminate duplicates.
SELECT DISTINCT propertyNo FROM Viewing;

Produce a list of monthly salaries for all staff, showing staff number, first/last name, and salary.
SELECT staffNo, fName, lName, salary/12 AS monthlySalary FROM Staff;

List all staff with a salary greater than 12,000.
SELECT * FROM Staff WHERE salary > 12000;

List addresses of all branch offices in London or Glasgow.
SELECT * FROM Branch WHERE city = 'London' OR city = 'Glasgow';

List all staff with a salary between 20,000 and 30,000.
SELECT * FROM Staff WHERE salary BETWEEN 20000 AND 30000;
SELECT * FROM Staff WHERE salary >= 20000 AND salary <= 30000;

List all managers and supervisors.
SELECT * FROM Staff WHERE position IN ('Manager', 'Supervisor');
SELECT * FROM Staff WHERE position = 'Manager' OR position = 'Supervisor';

Find all owners with the string Glasgow in their address.
SELECT * FROM PrivateOwner WHERE address LIKE '%Glasgow%';
Note: a good answer must use LIKE in a condition on the city or address column.

List details of all viewings on property PG4 where a comment has not been supplied.
SELECT * FROM Viewing WHERE propertyNo = 'PG4' AND comment IS NULL;

List salaries for all staff, arranged in descending order of salary.
SELECT * FROM Staff ORDER BY salary DESC;

Produce an abbreviated list of properties in order of property type.
SELECT * FROM PropertyForRent ORDER BY type;
There are four flats in this list. As no minor sort key is specified, the system arranges these rows in any order it chooses.

To control the order within each type, add a minor sort key:
SELECT * FROM PropertyForRent ORDER BY type DESC, rent DESC;
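The row-filtering and sorting examples above can be tried on a throwaway database. The following is a minimal sketch using Python's built-in sqlite3 module; the cut-down Staff table and its three rows are invented for illustration and are not the DreamHome data.

```python
import sqlite3

# In-memory database with a cut-down Staff table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, position TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [
        ("S1", "White", "Manager", 30000),
        ("S2", "Beech", "Assistant", 12000),
        ("S3", "Ford", "Supervisor", 18000),
    ],
)


def staff_numbers(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]


# Range search: BETWEEN includes both end points.
between = staff_numbers(
    "SELECT staffNo FROM Staff WHERE salary BETWEEN 12000 AND 18000 ORDER BY staffNo"
)

# Set membership: IN is shorthand for a chain of OR conditions.
managers_or_supervisors = staff_numbers(
    "SELECT staffNo FROM Staff WHERE position IN ('Manager', 'Supervisor') ORDER BY staffNo"
)

# Sorting: highest salary first.
by_salary_desc = staff_numbers("SELECT staffNo FROM Staff ORDER BY salary DESC")

print(between, managers_or_supervisors, by_salary_desc)
```

Because BETWEEN is inclusive, the staff member earning exactly 12,000 is part of the first result.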

Aggregate functions (COUNT, AVG, SUM, MAX, MIN)

DISTINCT can be used before the column name to eliminate duplicates. DISTINCT has no effect with MIN/MAX, but may have with SUM/AVG.

SELECT COUNT(salary) FROM Staff;

How many properties cost more than 350 per month to rent?
SELECT COUNT(*) AS myCount FROM PropertyForRent WHERE rent > 350;

How many different properties were viewed in May 04?
SELECT COUNT(DISTINCT propertyNo) AS myCount FROM Viewing WHERE viewDate BETWEEN '1-May-04' AND '31-May-04';

Find number of Managers and sum of their salaries.
SELECT COUNT(staffNo) AS myCount, SUM(salary) AS mySum FROM Staff WHERE position = 'Manager';

Find minimum, maximum, and average staff salary.
SELECT MIN(salary) AS myMin, MAX(salary) AS myMax, AVG(salary) AS myAvg FROM Staff;

Find number of staff in each branch and their total salaries.
SELECT branchNo, COUNT(staffNo) AS CountSTAFF, SUM(salary) AS SumSTAFF FROM Staff GROUP BY branchNo ORDER BY branchNo DESC;

For each branch with more than 1 member of staff, find number of staff in each branch and sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum FROM Staff GROUP BY branchNo HAVING COUNT(staffNo) > 1 ORDER BY branchNo;

List staff who work in branch at 163 Main St.
SELECT * FROM Staff WHERE branchNo = (SELECT branchNo FROM Branch WHERE street = '163 Main Street');
The inner SELECT finds the branch number ('B003'), so the outer SELECT then becomes:
SELECT * FROM Staff WHERE branchNo = 'B003';

List all staff whose salary is greater than the average salary, and show by how much.
SELECT * FROM Staff WHERE salary > (SELECT AVG(salary) FROM Staff);
SELECT staffNo, fName, lName, position, salary - (SELECT AVG(salary) FROM Staff) AS SalDiff FROM Staff WHERE salary > (SELECT AVG(salary) FROM Staff);
An aggregate cannot be used directly in WHERE (WHERE salary > AVG(salary) is not allowed). Instead, use a subquery to find the average salary (17000), and then use the outer SELECT to find those staff with a salary greater than this:
SELECT staffNo, fName, lName, position, salary - 17000 AS salDiff FROM Staff WHERE salary > 17000;

List properties handled by staff at 163 Main St.
SELECT * FROM PropertyForRent WHERE staffNo IN (SELECT staffNo FROM Staff WHERE branchNo = (SELECT branchNo FROM Branch WHERE street = '163 Main Street'));
SELECT * FROM PropertyForRent WHERE branchNo = (SELECT branchNo FROM Branch WHERE street = '163 Main Street');
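Grouping with HAVING and the "greater than the average" subquery are easier to follow with concrete numbers. Below is a minimal sketch with Python's sqlite3 module; the table is reduced to three columns, and the rows and branch numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("S1", "B1", 30000), ("S2", "B1", 12000), ("S3", "B2", 18000)],
)

# Staff count and salary total per branch, keeping only branches
# with more than one member of staff (HAVING filters whole groups).
grouped = conn.execute(
    "SELECT branchNo, COUNT(staffNo), SUM(salary) FROM Staff "
    "GROUP BY branchNo HAVING COUNT(staffNo) > 1 ORDER BY branchNo"
).fetchall()

# Staff paid more than the average salary, and by how much.
# The scalar subquery is needed because AVG cannot appear directly in WHERE.
above_average = conn.execute(
    "SELECT staffNo, salary - (SELECT AVG(salary) FROM Staff) AS salDiff "
    "FROM Staff WHERE salary > (SELECT AVG(salary) FROM Staff)"
).fetchall()

print(grouped)
print(above_average)
```

With these rows the average salary is 20,000, so only S1 qualifies, with a difference of 10,000.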

Find staff whose salary is larger than salary of at least one member of staff at branch B003.
SELECT * FROM Staff WHERE salary > SOME (SELECT salary FROM Staff WHERE branchNo = 'B003');

Find staff whose salary is larger than salary of every member of staff at branch B003.
SELECT * FROM Staff WHERE salary > ALL (SELECT salary FROM Staff WHERE branchNo = 'B003');

List names of all clients who have viewed a property along with any comment supplied.
SELECT c.clientNo, fName, propertyNo, comment FROM Client c, Viewing v WHERE c.clientNo = v.clientNo;

For each branch, list numbers and names of staff who manage properties, and properties they manage.
SELECT s.branchNo, s.staffNo, fName, lName, p.propertyNo FROM Staff s, PropertyForRent p WHERE s.staffNo = p.staffNo ORDER BY s.branchNo, s.staffNo, propertyNo;

For each branch, list staff who manage properties, including city in which branch is located and properties they manage.
SELECT b.branchNo, b.city, s.staffNo, fName, lName, propertyNo FROM Branch b, Staff s, PropertyForRent p WHERE b.branchNo = s.branchNo AND s.staffNo = p.staffNo ORDER BY b.branchNo, s.staffNo, propertyNo;

Find number of properties handled by each staff member.
SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount FROM Staff s, PropertyForRent p WHERE s.staffNo = p.staffNo GROUP BY s.branchNo, s.staffNo ORDER BY s.branchNo, s.staffNo;
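The join queries above list the tables in FROM and put the join condition in WHERE. As a rough illustration, the sketch below runs the last query in that form and in the explicit JOIN ... ON form, and checks that both give the same rows. It uses Python's sqlite3 module with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT)")
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT, staffNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [("S1", "B1"), ("S2", "B1"), ("S3", "B2")])
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?)",
                 [("P1", "S1"), ("P2", "S1"), ("P3", "S3")])

# Join condition written in the WHERE clause, as in the notes.
where_style = conn.execute(
    "SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount "
    "FROM Staff s, PropertyForRent p WHERE s.staffNo = p.staffNo "
    "GROUP BY s.branchNo, s.staffNo ORDER BY s.branchNo, s.staffNo"
).fetchall()

# The same inner join written with JOIN ... ON.
join_style = conn.execute(
    "SELECT s.branchNo, s.staffNo, COUNT(*) AS myCount "
    "FROM Staff s JOIN PropertyForRent p ON s.staffNo = p.staffNo "
    "GROUP BY s.branchNo, s.staffNo ORDER BY s.branchNo, s.staffNo"
).fetchall()

print(where_style)
```

Staff member S2 manages no property, so S2 does not appear at all: an inner join drops unmatched rows.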

The (inner) join of these two tables:
SELECT b.*, p.* FROM Branch b, PropertyForRent p WHERE b.City = p.City;

List branches and properties that are in same city along with any unmatched branches (left table).
SELECT b.*, p.* FROM Branch b LEFT JOIN PropertyForRent p ON b.City = p.City;

List branches and properties that are in same city along with any unmatched properties (right table).
SELECT b.*, p.* FROM Branch b RIGHT JOIN PropertyForRent p ON b.City = p.City;

List branches and properties in same city and any unmatched branches or properties.
SELECT b.*, p.* FROM Branch b FULL JOIN PropertyForRent p ON b.City = p.City;

Find all staff who work in a London branch.
SELECT staffNo, fName, lName, position FROM Staff s WHERE EXISTS (SELECT * FROM Branch b WHERE s.branchNo = b.branchNo AND city = 'London');
SELECT staffNo, fName, lName, position FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND city = 'London';

List all cities where there is either a branch office or a property.
SELECT city FROM Branch WHERE (city IS NOT NULL) UNION SELECT city FROM PropertyForRent WHERE (city IS NOT NULL);

List all cities where there is both a branch office and a property.
SELECT city FROM Branch INTERSECT SELECT city FROM PropertyForRent;
SELECT DISTINCT b.city FROM Branch b, PropertyForRent p WHERE b.city = p.city;
SELECT DISTINCT city FROM Branch b WHERE EXISTS (SELECT * FROM PropertyForRent p WHERE p.city = b.city);

List all cities where there is a branch office but no properties.
(SELECT city FROM Branch) EXCEPT (SELECT city FROM PropertyForRent);
or
SELECT DISTINCT city FROM Branch WHERE city NOT IN (SELECT city FROM PropertyForRent);
SELECT DISTINCT city FROM Branch b WHERE NOT EXISTS (SELECT * FROM PropertyForRent p WHERE p.city = b.city);
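A small experiment shows how an outer join keeps unmatched rows and how UNION, INTERSECT, and EXCEPT behave. This is a minimal sketch with Python's sqlite3 module and invented rows. RIGHT and FULL JOIN are left out because older SQLite releases do not support them, and the set queries are written without the surrounding parentheses used above because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT, city TEXT)")
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT, city TEXT)")
conn.executemany("INSERT INTO Branch VALUES (?, ?)",
                 [("B1", "London"), ("B2", "Bristol")])
conn.executemany("INSERT INTO PropertyForRent VALUES (?, ?)",
                 [("P1", "London"), ("P2", "Aberdeen")])


def cities(sql):
    """Return the single city column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]


# Left outer join: branch B2 has no property in its city,
# so it is kept and padded with NULL (None in Python).
left_join = conn.execute(
    "SELECT b.branchNo, p.propertyNo FROM Branch b "
    "LEFT JOIN PropertyForRent p ON b.city = p.city ORDER BY b.branchNo"
).fetchall()

either = cities("SELECT city FROM Branch UNION SELECT city FROM PropertyForRent ORDER BY city")
both = cities("SELECT city FROM Branch INTERSECT SELECT city FROM PropertyForRent")
branch_only = cities("SELECT city FROM Branch EXCEPT SELECT city FROM PropertyForRent")

print(left_join, either, both, branch_only)
```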

Column names in HAVING (group by) clause must also appear in the GROUP BY list or be contained within an aggregate function.

INSERT INTO TableName [ (columnList) ] VALUES (dataValueList)


a/ INSERT INTO Staff VALUES ('SG16', 'Alan', 'Brown', 'Assistant', 'M', '1957-05-25', 8300, 'B003');

b/ INSERT INTO Staff (staffNo, fName, lName, position, salary, branchNo) VALUES ('SG44', 'Anne', 'Jones', 'Assistant', 8100, 'B003');

c/ INSERT INTO Staff VALUES ('SG44', 'Anne', 'Jones', 'Assistant', NULL, NULL, 8100, 'B003');

How to insert information from two tables by using INSERT ... SELECT:
INSERT INTO stafff (staffNo, fName, lName, postcode) (SELECT s.staffNo, fName, lName, postcode FROM Staff s, PropertyForRent p WHERE s.staffNo = p.staffNo)

UPDATE TableName SET columnName1 = dataValue1, columnName2 = dataValue2 [WHERE searchCondition]

Give all staff other than Managers a 3% pay increase.
UPDATE Staff SET salary = salary*1.03 WHERE position != 'Manager';

Give all Managers a 5% pay increase.
UPDATE Staff SET salary = salary*1.05 WHERE position = 'Manager';

Promote David Ford (staffNo = SG14) to Manager and change his salary to 22,000.
UPDATE Staff SET position = 'Manager', salary = 22000 WHERE staffNo = 'SG14';

DELETE FROM TableName [WHERE searchCondition]

Delete all viewings that relate to property PG4.
DELETE FROM Viewing WHERE propertyNo = 'PG4';

Delete all records from the Viewing table.
DELETE FROM Viewing;
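The three data-changing statements can be exercised together. The following is a minimal sketch with Python's sqlite3 module, assuming a three-column Staff table with invented rows. Values are passed as parameters, which avoids the quoting mistakes that are easy to make with string literals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, position TEXT, salary REAL)")

# INSERT: one row per parameter tuple.
conn.executemany(
    "INSERT INTO Staff (staffNo, position, salary) VALUES (?, ?, ?)",
    [("SG14", "Supervisor", 18000), ("SG5", "Manager", 24000), ("SG37", "Assistant", 12000)],
)

# UPDATE: the WHERE clause limits the change to a single row.
cur = conn.execute(
    "UPDATE Staff SET position = ?, salary = ? WHERE staffNo = ?",
    ("Manager", 22000, "SG14"),
)
updated = cur.rowcount  # number of rows changed

# DELETE: without a WHERE clause every row would be removed.
cur = conn.execute("DELETE FROM Staff WHERE position = ?", ("Assistant",))
deleted = cur.rowcount  # number of rows removed

remaining = conn.execute(
    "SELECT staffNo, position, salary FROM Staff ORDER BY staffNo"
).fetchall()
print(updated, deleted, remaining)
```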

Requirement Collection and Analysis

1:- Fact-finding techniques

- It is critical to capture the necessary facts to build the required database application.
- These facts are captured using fact-finding techniques.
- Fact-finding: the formal process of using techniques such as interviews and questionnaires to collect facts about systems, requirements, and preferences.

2:- When Are Fact-Finding Techniques Used?


Fact-finding is used throughout the database application lifecycle. It is crucial to the early stages, including database planning, system definition, and requirements collection and analysis. It enables the developer to learn about the problems, constraints, requirements, organization, and users of the system.

Fact-Finding Techniques

A database developer normally uses several fact-finding techniques during a single database project, including:
- examining documentation
- interviewing
- observing the organization in operation
- research
- questionnaires

Database Planning

Overall purpose:
- Analyze company situation
- Define problems and constraints
- Define objectives
- Define scope and boundaries
- Interactive and iterative processes required to complete first phase of DBLC successfully

The Database Initial Study

Scope: extent of design according to operational requirements
Boundaries: limits external to system

- Maintain
- Perform
- Track the status (query)
- Report on any database

Database Design Methodology

A structured approach that uses procedures, techniques, tools, and documentation to support and facilitate the process of design.

Three main phases:
- Conceptual database design
- Logical database design
- Physical database design

Conceptual Database Design


The process of constructing a model of the data used in an enterprise, independent of all physical considerations.

Logical Database Design


The process of constructing a model of the data used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.

Physical Database Design


The process of producing a description of the implementation of the database on secondary storage. It describes the base relations, file organizations, and indexes used to achieve efficient access to the data, and any integrity constraints and security measures. It is dependent on a particular DBMS and other physical considerations.

Methodology in Database Design
- Work interactively with the users as much as possible.
- Follow a structured methodology throughout the data modeling process.
- Employ a data-driven approach.
- Use diagrams to represent as much of the data models as possible.
- Use a Database Design Language (DBDL) to represent additional data semantics.
- Build a data dictionary to supplement the data model diagrams.

Overview Database Design Methodology


Build Conceptual database design

To build a conceptual data model of the data requirements of the enterprise. The model comprises entity types, relationship types, attributes and attribute domains, primary and alternate keys, and integrity constraints.

Step 1.1 Identify entity types (To identify the required entity types)

Step 1.2 Identify relationship types (To identify the important relationships that exist between the entity types)

Step 1.3 Identify and associate attributes with entity or relationship types

Step 1.4 Determine attribute domains (To determine domains for the attributes in the data model and document the details of each domain)

Step 1.5 Determine candidate, primary, and alternate key attributes (To identify the candidate key(s) for each entity and, if there is more than one candidate key, to choose one to be the primary key and the others as alternate keys)

Step 1.6 Check model for redundancy (To check for the presence of any redundancy in the model and to remove any that does exist)

Example of a non-redundant relationship FatherOf

Step 1.7 Review conceptual data model with user

Build and validate Logical database design

To translate the conceptual data model into a logical data model and then to validate this model to check that it is structurally correct using normalization and supports the required transactions.

- How to derive a set of relations from a conceptual data model.
- How to validate these relations using the technique of normalization.
- How to merge local logical data models based on one or more user views into a global logical data model that represents all user views.
- How to ensure that the final logical data model is a true and accurate representation of the data requirements of the enterprise.

Step 2.1 Derive relations for logical data model (To create relations for the logical data model to represent the entities, relationships, and attributes that have been identified)

(1) One-to-many (1:*) binary relationship types

For each 1:* binary relationship, the entity on the one side of the relationship is designated as the parent entity and the entity on the many side is designated as the child entity. To represent this relationship, post a copy of the primary key attribute(s) of the parent entity into the relation representing the child entity, to act as a foreign key.

(2) One-to-one (1:1) binary relationship types

Creating relations to represent a 1:1 relationship is more complex, as the cardinality cannot be used to identify the parent and child entities in a relationship. Instead, the participation constraints are used to decide whether it is best to represent the relationship by combining the entities involved into one relation or by creating two relations and posting a copy of the primary key from one relation to the other.

(3) Superclass/subclass relationship types

Identify the superclass entity as the parent entity and the subclass entity as the child entity. There are various options on how to represent such a relationship as one or more relations.

(4) Many-to-many (*:*) binary relationship types

Create a relation to represent the relationship and include any attributes that are part of the relationship. We post a copy of the primary key attribute(s) of the entities that participate in the relationship into the new relation, to act as foreign keys. These foreign keys will also form the primary key of the new relation, possibly in combination with some of the attributes of the relationship.
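The 1:* and *:* rules above can be sketched as table definitions. The example below uses Python's sqlite3 module. The column lists are cut down, and treating Viewing as the relation for a many-to-many relationship between Client and PropertyForRent is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
-- 1:* rule: the child (Staff) receives a copy of the parent's primary key.
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    lName    TEXT,
    branchNo TEXT NOT NULL REFERENCES Branch(branchNo)
);
CREATE TABLE Client (clientNo TEXT PRIMARY KEY, lName TEXT);
CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, city TEXT);
-- *:* rule: a new relation holds both foreign keys, which together
-- form its primary key, plus the relationship's own attribute.
CREATE TABLE Viewing (
    clientNo   TEXT REFERENCES Client(clientNo),
    propertyNo TEXT REFERENCES PropertyForRent(propertyNo),
    comment    TEXT,
    PRIMARY KEY (clientNo, propertyNo)
);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


branch_ok = try_insert("INSERT INTO Branch VALUES (?, ?)", ("B003", "Glasgow"))
staff_ok = try_insert("INSERT INTO Staff VALUES (?, ?, ?)", ("SG14", "Ford", "B003"))
# Rejected: branch B999 does not exist, so the foreign key has no parent row.
orphan_ok = try_insert("INSERT INTO Staff VALUES (?, ?, ?)", ("SG99", "Nobody", "B999"))

try_insert("INSERT INTO Client VALUES (?, ?)", ("CR56", "Stewart"))
try_insert("INSERT INTO PropertyForRent VALUES (?, ?)", ("PG4", "Glasgow"))
viewing_ok = try_insert("INSERT INTO Viewing VALUES (?, ?, ?)", ("CR56", "PG4", "too remote"))
# Rejected: the same (clientNo, propertyNo) pair would repeat the primary key.
duplicate_ok = try_insert("INSERT INTO Viewing VALUES (?, ?, ?)", ("CR56", "PG4", "again"))

print(branch_ok, staff_ok, orphan_ok, viewing_ok, duplicate_ok)
```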

Step 2.2 Validate relations using normalization (To validate the relations in the logical data model using normalization)

Step 2.3 Validate relations against user transactions (To ensure that the relations in the logical data model support the required transactions)

Step 2.4 Define integrity constraints

Step 2.5 Review logical data model with user (To review the logical data model with the users to ensure that they consider the model to be a true representation of the data requirements of the enterprise)

Step 2.6 Check for future growth

Physical database design:
- How to map the logical database design to a physical database design.
- How to design base relations for target DBMS.
- How to design general constraints for target DBMS.
- How to select appropriate file organizations based on analysis of transactions.
- When to use secondary indexes to improve performance.
- How to estimate the size of the database.
- How to design user views.
- How to design security mechanisms to satisfy user requirements.

Logical vs Physical Database Design

Sources of information for the physical design process include the logical data model and the documentation that describes the model. Logical database design is concerned with the what; physical database design is concerned with the how.

Physical Database Design

Process of producing a description of the implementation of the database on secondary storage. It describes the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security measures.

Translate logical data model for target DBMS (Physical database design)

To produce a relational database schema that can be implemented in the target DBMS from the global logical data model.

Need to know functionality of target DBMS, such as how to create base relations and whether the system supports the definition of:
- PKs, FKs, and AKs;
- required data, i.e. whether system supports NOT NULL;
- domains;
- relational integrity constraints;
- enterprise constraints.

Step 3.1 Design base relations (To decide how to represent base relations identified in global logical data model in target DBMS)

For each relation need to define:
- the name of the relation;
- a list of simple attributes in brackets;
- the PK and, where appropriate, AKs and FKs;
- a list of any derived attributes and how they should be computed;
- referential integrity constraints for any FKs identified.

For each attribute need to define:
- its domain, consisting of a data type, length, and any constraints on the domain;
- an optional default value for the attribute;
- whether the attribute can hold nulls.

Step 3.2 Design representation of derived data (To decide how to represent any derived data present in the global logical data model in the target DBMS)

Examine the logical data model and data dictionary, and produce a list of all derived attributes. A derived attribute can be stored in the database or calculated every time it is needed. The option selected is based on:
- additional cost to store the derived data and keep it consistent with the operational data from which it is derived;
- cost to calculate it each time it is required.

The less expensive option is chosen subject to performance constraints.

Step 3.3 Design general constraints (To design the enterprise constraints for the target DBMS)

CONSTRAINT StaffNotHandlingTooMuch
CHECK (NOT EXISTS (SELECT staffNo FROM PropertyForRent GROUP BY staffNo HAVING COUNT(*) > 100))

Step 4 Design file organizations and indexes

To determine optimal file organizations to store the base relations and the indexes that are required to achieve acceptable performance. That is, the way in which relations and tuples will be held on secondary storage.
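Not every DBMS accepts a subquery inside a CHECK constraint such as StaffNotHandlingTooMuch in Step 3.3; SQLite, for one, rejects it. One common workaround is a trigger. The sketch below, using Python's sqlite3 module and a two-column PropertyForRent table, is an assumption about how the rule could be enforced, not the method prescribed by the notes.

```python
import sqlite3

MAX_PROPERTIES = 100  # limit taken from the StaffNotHandlingTooMuch constraint

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT PRIMARY KEY, staffNo TEXT)")

# Reject an insert when the staff member already handles the maximum.
# An UPDATE that changes staffNo would need a similar trigger (not shown).
conn.execute(f"""
CREATE TRIGGER StaffNotHandlingTooMuch
BEFORE INSERT ON PropertyForRent
WHEN (SELECT COUNT(*) FROM PropertyForRent
      WHERE staffNo = NEW.staffNo) >= {MAX_PROPERTIES}
BEGIN
    SELECT RAISE(ABORT, 'staff member already handles the maximum number of properties');
END
""")


def add_property(property_no, staff_no):
    """Try to assign a property; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO PropertyForRent VALUES (?, ?)", (property_no, staff_no))
        return True
    except sqlite3.DatabaseError:
        return False


# Properties 0..99 are accepted; the 101st for the same member of staff is rejected.
results = [add_property(f"P{i}", "SG14") for i in range(MAX_PROPERTIES + 1)]
other_staff_ok = add_property("PX", "SG5")  # a different member of staff is unaffected

print(results.count(True), results[-1], other_staff_ok)
```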

- Records of a file must be allocated to disk blocks.
- Typical block size is 2K, or 2048 bytes.
- A block may contain many records.
- A file may be many blocks in length.
- Relations (tables) are stored on disk with each tuple written one after the other.

Storage Level of Databases
- Level of storage
- Blocking factor
- Fixed-length records
- Variable-length records
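The blocking factor is the number of records that fit in one block. For fixed-length, unspanned records it is the block size divided by the record size, rounded down. The short calculation below is a sketch in Python; the 2048-byte block comes from the notes, while the 100-byte record and the 1,000-record file are assumed figures.

```python
import math

BLOCK_SIZE = 2048  # bytes, the typical block size quoted above


def blocking_factor(block_size, record_size):
    """Whole records per block for fixed-length, unspanned records."""
    return block_size // record_size


def blocks_needed(num_records, bfr):
    """Blocks required to hold num_records at bfr records per block."""
    return math.ceil(num_records / bfr)


record_size = 100    # bytes per record (assumed)
num_records = 1000   # records in the file (assumed)

bfr = blocking_factor(BLOCK_SIZE, record_size)
file_blocks = blocks_needed(num_records, bfr)
unused_per_block = BLOCK_SIZE - bfr * record_size  # space wasted by not spanning

print(bfr, file_blocks, unused_per_block)
```

With spanned records the unused bytes at the end of each block could hold the start of the next record, at the cost of a record possibly needing two block reads.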

Multiple LRs per PR (unspanned)

LR split across PRs (spanned)

PR containing LRs from different tables

Transferring physical records

File Structures/Organizations

Selecting among alternative file structures is one of the most important choices in physical database design. Common types of file structures are:
- Unordered file (heap file)
- Ordered file
- Hash file
- B+-tree
- Cluster/Uncluster

Hash File
- Records do not have to be written sequentially to the file.
- A hash function calculates the address of the page based on one or more of the fields in the record.
- Examples of hash function techniques are folding and mod (division-remainder).
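The two hash function techniques named above can be written in a few lines. This is an illustrative Python sketch: the key value, the number of pages, and the three-digit chunk size are assumptions, and folding is shown in its simple "split the digits into groups and add them" form.

```python
def mod_hash(key, num_pages):
    """Division-remainder (mod) hashing: page address = key mod number of pages."""
    return key % num_pages


def folding_hash(key, num_pages, chunk_digits=3):
    """Folding: split the key's digits into groups, add the groups,
    then reduce the sum to a valid page address with mod."""
    digits = str(key)
    chunks = [int(digits[i:i + chunk_digits]) for i in range(0, len(digits), chunk_digits)]
    return sum(chunks) % num_pages


NUM_PAGES = 97  # a prime number of pages tends to spread keys more evenly

key = 123456
page_by_mod = mod_hash(key, NUM_PAGES)          # 123456 mod 97
page_by_folding = folding_hash(key, NUM_PAGES)  # (123 + 456) mod 97

print(page_by_mod, page_by_folding)
```

Two different keys can hash to the same page (a collision), which is why a hash file also needs an overflow strategy.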

Indexes

A data structure that allows the DBMS to locate particular records in a file more quickly. Given a query condition (WHERE studno = 123456), we can look up the condition on the studno field in the index, and then go to the disk block where the rest of the data is stored and retrieve it. The index entries are usually stored sorted by the indexing field.

Types: clustering index, primary index, and secondary (dense) index.

B+-tree: a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.

Sparse vs Dense Index
- #pointers in dense index = #pointers in sparse index * #records per block
- For large records, dense and sparse indexes are about the same size.
- Sparse index is better for all updates and most queries.
- IF the query retrieves the index attribute only (e.g. count queries) THEN use a dense index ELSE use a sparse index.
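The relationship between dense and sparse indexes can be checked on a toy file. In the Python sketch below, the sorted keys, the block size of four records, and the list-based "disk" are all invented for illustration; a real DBMS would hold the index itself in blocks, typically as a B+-tree.

```python
from bisect import bisect_right

RECORDS_PER_BLOCK = 4

# A sorted file of 20 keys (100, 102, ..., 138) split into 5 full blocks.
keys = list(range(100, 140, 2))
blocks = [keys[i:i + RECORDS_PER_BLOCK] for i in range(0, len(keys), RECORDS_PER_BLOCK)]

# Dense index: one (key, block number) entry per record.
dense_index = [(key, block_no) for block_no, block in enumerate(blocks) for key in block]

# Sparse index: one entry per block, holding the first key in that block.
sparse_index = [(block[0], block_no) for block_no, block in enumerate(blocks)]


def sparse_lookup(key):
    """Find the block that could hold key, then search inside that block."""
    first_keys = [first for first, _ in sparse_index]
    position = bisect_right(first_keys, key) - 1  # last block starting at or before key
    if position < 0:
        return None  # key is smaller than every key in the file
    block_no = sparse_index[position][1]
    return block_no if key in blocks[block_no] else None


print(len(dense_index), len(sparse_index), sparse_lookup(118))
```

A count query on the indexed field can be answered from the dense index alone, whereas the sparse index always needs the block to be read, which matches the IF/THEN rule above.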

Step 4 Design file organizations and indexes

Number of factors that may be used to measure efficiency:
- Transaction throughput: number of transactions processed in a given time interval.
- Response time: elapsed time for completion of a single transaction.
- Disk storage: amount of disk space required to store database files.

Step 4.1 Analyze transactions (To understand the functionality of the transactions that will run on the database and to analyze the important transactions)
- Use this information to identify the parts of the database that may cause performance problems.
- Determine which relations are most frequently accessed by transactions.

Step 4.2 Choose file organization (To determine an efficient file organization for each base relation, where file organizations include Heap, Hash, Indexed Sequential Access Method (ISAM), B+-Tree, and Clusters)

Step 4.3 Choose indexes (To determine whether adding indexes will improve the performance of the system)

Step 4.4 Estimate disk space requirements (To estimate the amount of disk space that will be required by the database)

Step 5 Design user views
To design the user views that were identified during the Requirements Collection and Analysis stage of the relational database application lifecycle.

Step 6 Design security mechanisms
To design the security measures for the database as specified by the users.

Step 7 Consider the introduction of controlled redundancy

Step 8 Monitor and tune the operational system

Relational Algebra

- Five basic operations in relational algebra: Selection, Projection, Cartesian product, Union, and Set Difference.
- These perform most of the data retrieval (query) operations needed.
- Also have Join, Intersection, and Division operations, which can be expressed in terms of the 5 basic operations.

Selection (or Restriction)

$\sigma_{\text{predicate}}(R)$

Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).

Example - Selection
List all staff with a salary greater than 10,000.
$\sigma_{\text{salary} > 10000}(\text{Staff})$

Projection

$\Pi_{\text{col}_1, \ldots, \text{col}_n}(R)$

Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.

Example - Projection
Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details.
$\Pi_{\text{staffNo, fName, lName, salary}}(\text{Staff})$

Union

$R \cup S$

Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated. R and S must be union-compatible.

Example - Union
List all cities where there is either a branch office or a property for rent.
$\Pi_{\text{city}}(\text{Branch}) \cup \Pi_{\text{city}}(\text{PropertyForRent})$

Set Difference

$R - S$

Defines a relation consisting of the tuples that are in relation R, but not in S. R and S must be union-compatible.

Example - Set Difference
List all cities where there is a branch office but no properties for rent.
$\Pi_{\text{city}}(\text{Branch}) - \Pi_{\text{city}}(\text{PropertyForRent})$

Intersection

$R \cap S$

Defines a relation consisting of the set of all tuples that are in both R and S. R and S must be union-compatible. Expressed using basic operations: $R \cap S = R - (R - S)$

Example - Intersection
List all cities where there is both a branch office and at least one property for rent.
$\Pi_{\text{city}}(\text{Branch}) \cap \Pi_{\text{city}}(\text{PropertyForRent})$

Cartesian product

$R \times S$

Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.

Example - Cartesian product
List the names and comments of all clients who have viewed a property for rent.
$(\Pi_{\text{clientNo, fName, lName}}(\text{Client})) \times (\Pi_{\text{clientNo, propertyNo, comment}}(\text{Viewing}))$

Example - Cartesian product and Selection
Use the selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.
$\sigma_{\text{Client.clientNo} = \text{Viewing.clientNo}}((\Pi_{\text{clientNo, fName, lName}}(\text{Client})) \times (\Pi_{\text{clientNo, propertyNo, comment}}(\text{Viewing})))$

Cartesian product and Selection can be reduced to a single operation called a Join.
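The operations described in this section are small enough to write out directly. The Python sketch below models a relation as a list of dictionaries; the helper names (select, project, cartesian) and all sample rows are invented for illustration. It checks the intersection identity and shows a join built from a Cartesian product followed by a selection.

```python
from itertools import product


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


def cartesian(r, r_name, s, s_name):
    """Cartesian product: pair every row of r with every row of s,
    prefixing attribute names so that clientNo stays unambiguous."""
    return [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a, b in product(r, s)
    ]


branch = [{"branchNo": "B1", "city": "London"},
          {"branchNo": "B2", "city": "Bristol"},
          {"branchNo": "B3", "city": "London"}]
prop = [{"propertyNo": "P1", "city": "London"},
        {"propertyNo": "P2", "city": "Aberdeen"}]

# Projections on city are union-compatible, so set operations apply.
branch_cities = {row["city"] for row in project(branch, ["city"])}
property_cities = {row["city"] for row in project(prop, ["city"])}
union = branch_cities | property_cities
difference = branch_cities - property_cities
# Intersection expressed with the basic operations only: R - (R - S).
intersection = branch_cities - (branch_cities - property_cities)

client = [{"clientNo": "C1", "fName": "Aline"}, {"clientNo": "C2", "fName": "Mary"}]
viewing = [{"clientNo": "C1", "propertyNo": "P1", "comment": "too small"},
           {"clientNo": "C1", "propertyNo": "P2", "comment": None}]

pairs = cartesian(client, "Client", viewing, "Viewing")
# Join = selection over the Cartesian product.
joined = select(pairs, lambda row: row["Client.clientNo"] == row["Viewing.clientNo"])

print(union, difference, intersection, len(pairs), len(joined))
```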

Join Operations
- Join is a derivative of Cartesian product.
- Equivalent to performing a Selection, using the join predicate as the selection formula, over the Cartesian product of the two operand relations.
- One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.

Various forms of join operation:
- Theta join
- Equijoin (a particular type of Theta join)
- Natural join
- Outer join (left, right, or full)
- Semijoin

Query Processing

In network and hierarchical DBMSs, a low-level procedural query language is generally embedded in a high-level programming language. With declarative languages such as SQL, the user specifies what data is required rather than how it is to be retrieved. This also gives the DBMS more control over system performance.

Two main techniques for query optimization:
- heuristic rules that order operations in a query;
- comparing different strategies based on relative costs.

Aims of QP:
- transform a query written in a high-level language (e.g. SQL) into a correct and efficient execution strategy expressed in a low-level language (implementing RA);
- execute the strategy to retrieve the required data.

Query Optimization
- As there are many equivalent transformations of the same high-level query, the aim of QO is to choose the one that minimizes resource usage.
- Generally, reduce total execution time of the query. May also reduce response time of the query.
- The problem is computationally intractable with a large number of relations, so the strategy adopted is reduced to finding a near-optimum solution.

Example:-

Find all Managers who work at a London branch.

SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND (s.position = 'Manager' AND b.city = 'London');

Assume:
- 1000 tuples in Staff; 50 tuples in Branch;
- 50 Managers; 5 London branches;
- no indexes or sort keys;
- results of any intermediate operations stored on disk;
- cost of the final write is ignored;
- tuples are accessed one at a time.

Three equivalent RA queries are:

(1) $\sigma_{(\text{position='Manager'}) \land (\text{city='London'}) \land (\text{Staff.branchNo=Branch.branchNo})}(\text{Staff} \times \text{Branch})$
Cost (in disk accesses): 1000 + 50 + 2*(1000 * 50) = 101 050

(2) $\sigma_{(\text{position='Manager'}) \land (\text{city='London'})}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch})$
Cost (in disk accesses): 2*1000 + (1000 + 50) = 3 050

(3) $(\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$
Cost (in disk accesses): 1000 + 2*50 + 5 + (50 + 5) = 1 160

Cartesian product and join operations are much more expensive than selection, and the third option significantly reduces the size of the relations being joined together.
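The three cost figures can be reproduced from the stated assumptions. The Python sketch below spells out one reading of each term (a read or a write of an intermediate result); the breakdown in the comments is an interpretation of the formulas, not something the notes state explicitly.

```python
n_staff = 1000     # tuples in Staff
n_branch = 50      # tuples in Branch
n_managers = 50    # Staff tuples with position = 'Manager'
n_london = 5       # Branch tuples with city = 'London'

# (1) Read both relations, write the full Cartesian product to disk,
#     then read it back to apply the selection.
cost_1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)

# (2) Read both relations for the join. Each member of staff belongs to one
#     branch, so the join result has n_staff tuples: written once, read once.
cost_2 = (n_staff + n_branch) + 2 * n_staff

# (3) Read Staff and write the managers; read Branch and write the London
#     branches; then read both reduced relations to join them.
cost_3 = (n_staff + n_managers) + (n_branch + n_london) + (n_managers + n_london)

print(cost_1, cost_2, cost_3)
print(round(cost_1 / cost_3))  # how many times cheaper strategy (3) is than (1)
```

Pushing both selections below the join makes strategy (3) roughly 87 times cheaper than strategy (1) under these assumptions.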

Phases of Query Processing


QP has four main phases:
- decomposition
- optimization
- code generation
- execution.

- Advantages of dynamic QO arise from the fact that information is up-to-date.
- Disadvantages are that performance of the query is affected, and time may limit finding the optimum strategy.

Query Decomposition
- Aims are to transform a high-level query into an RA query and check that the query is syntactically and semantically correct.
- Typical stages are: analysis, normalization, semantic analysis, simplification, query restructuring.

Analysis
- Analyze query lexically and syntactically using compiler techniques.
- Verify relations and attributes exist.
- Verify operations are appropriate for object type.

Analysis - Example
SELECT staff_no FROM Staff WHERE position > 10;

This query would be rejected on two grounds: staff_no is not defined for Staff relation (should be staffNo). Comparison >10 is incompatible with type position, which is variable character string.

Some kind of query tree is typically chosen, constructed as follows:
- Leaf node created for each base relation.
- Non-leaf node created for each intermediate relation produced by an RA operation.
- Root of tree represents query result.
- Sequence is directed from leaves to root.

Normalization
- Converts query into a normalized form for easier manipulation.
- Predicate can be converted into one of two forms:
Conjunctive normal form: $(\text{position = 'Manager'} \lor \text{salary > 20000}) \land (\text{branchNo = 'B003'})$
Disjunctive normal form: $(\text{position = 'Manager'} \land \text{branchNo = 'B003'}) \lor (\text{salary > 20000} \land \text{branchNo = 'B003'})$

Semantic Analysis
SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.clientNo = v.clientNo AND c.maxRent >= 500 AND c.prefType = 'Flat' AND p.ownerNo = 'CO93';
- Relation connection graph is not fully connected, so the query is not correctly formulated.
- The join condition (v.propertyNo = p.propertyNo) has been omitted.
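The connectivity test on the relation connection graph can be sketched as a plain graph traversal. This is a simplified illustration, not the slides' algorithm: it models only the relations as nodes and the join conditions as undirected edges (no separate result node), and the function name is hypothetical.

```python
def is_connected(relations, join_edges):
    """Return True if every relation is reachable from the first via joins."""
    adjacency = {r: set() for r in relations}
    for left, right in join_edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    seen, stack = set(), [relations[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency[node] - seen)
    return len(seen) == len(relations)


relations = ["c", "v", "p"]              # Client, Viewing, PropertyForRent
faulty = [("c", "v")]                    # only c.clientNo = v.clientNo
fixed = [("c", "v"), ("v", "p")]         # adds v.propertyNo = p.propertyNo

print(is_connected(relations, faulty))   # PropertyForRent is isolated
print(is_connected(relations, fixed))
```

With the omitted join condition restored, every relation is reachable and the query passes this check.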

Relation Connection graph

Normalized attribute connection graph

SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.maxRent > 500 AND c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND c.prefType = 'Flat' AND c.maxRent < 200;
- Normalized attribute connection graph has a cycle between nodes c.maxRent and 0 with a negative valuation sum, so the query is contradictory.
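One way to detect such a cycle is a Bellman-Ford style relaxation. The sketch below assumes a common edge convention that the slide does not spell out: a lower bound `a > c` becomes an edge 0 -> a with weight -c, and an upper bound `a < c` becomes an edge a -> 0 with weight c. It treats strict and non-strict comparisons alike, so a cycle whose sum is exactly zero is not flagged. All names are hypothetical.

```python
def has_negative_cycle(nodes, edges):
    """Bellman-Ford check; edges are (source, target, weight) triples."""
    # Start every node at distance 0 so that any cycle can be reached.
    dist = {n: 0 for n in nodes}
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If an edge can still be relaxed, some cycle has a negative sum.
    return any(dist[u] + w < dist[v] for u, v, w in edges)


nodes = ["0", "c.maxRent"]
contradictory = [("0", "c.maxRent", -500),   # c.maxRent > 500
                 ("c.maxRent", "0", 200)]    # c.maxRent < 200
satisfiable = [("0", "c.maxRent", -500),     # c.maxRent > 500
               ("c.maxRent", "0", 900)]      # c.maxRent < 900 (made-up bound)

print(has_negative_cycle(nodes, contradictory))
print(has_negative_cycle(nodes, satisfiable))
```

For the query above the cycle sum is 200 - 500 = -300, which is why the predicate can never be satisfied.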

Transformation Rules for RA Operations

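(1) Conjunctive selection cascades into individual selections (and vice versa):
$\sigma_{p \land q \land r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$
Example: $\sigma_{\text{branchNo='B003'} \land \text{salary>15000}}(\text{Staff}) = \sigma_{\text{branchNo='B003'}}(\sigma_{\text{salary>15000}}(\text{Staff}))$

(2) Commutativity of selection:
$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$
Example: $\sigma_{\text{branchNo='B003'}}(\sigma_{\text{salary>15000}}(\text{Staff})) = \sigma_{\text{salary>15000}}(\sigma_{\text{branchNo='B003'}}(\text{Staff}))$

(3) In a sequence of projections, only the last one is required:
$\Pi_L \Pi_M \ldots \Pi_N(R) = \Pi_L(R)$
Example: $\Pi_{\text{lName}} \Pi_{\text{branchNo, lName}}(\text{Staff}) = \Pi_{\text{lName}}(\text{Staff})$

(4) Commutativity of selection and projection:
$\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$ where $p \in \{A_1, A_2, \ldots, A_m\}$
Example: $\Pi_{\text{fName, lName}}(\sigma_{\text{lName='Beech'}}(\text{Staff})) = \sigma_{\text{lName='Beech'}}(\Pi_{\text{fName, lName}}(\text{Staff}))$

(5) Commutativity of theta join (and Cartesian product):
$R \bowtie_p S = S \bowtie_p R$
$R \times S = S \times R$
Example: $\text{Staff} \bowtie_{\text{staff.branchNo=branch.branchNo}} \text{Branch} = \text{Branch} \bowtie_{\text{staff.branchNo=branch.branchNo}} \text{Staff}$

(6) Commutativity of selection and theta join (or Cartesian product):
$\sigma_p(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r S$
$\sigma_p(R \times S) = (\sigma_p(R)) \times S$ where $p \in \{A_1, A_2, \ldots, A_n\}$
$\sigma_{p \land q}(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r (\sigma_q(S))$
$\sigma_{p \land q}(R \times S) = (\sigma_p(R)) \times (\sigma_q(S))$
Example: $\sigma_{\text{position='Manager'} \land \text{city='London'}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) = (\sigma_{\text{position='Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\sigma_{\text{city='London'}}(\text{Branch}))$

(7) Commutativity of projection and theta join (or Cartesian product):
$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$
$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup M_1}(R)) \bowtie_r (\Pi_{L_2 \cup M_2}(S)))$
Example: $\Pi_{\text{position, city, branchNo}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) = (\Pi_{\text{position, branchNo}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\Pi_{\text{city, branchNo}}(\text{Branch}))$
Example: $\Pi_{\text{position, city}}(\text{Staff} \bowtie_{\text{Staff.branchNo=Branch.branchNo}} \text{Branch}) = \Pi_{\text{position, city}}((\Pi_{\text{position, branchNo}}(\text{Staff})) \bowtie_{\text{Staff.branchNo=Branch.branchNo}} (\Pi_{\text{city, branchNo}}(\text{Branch})))$

(8) Commutativity of union and intersection:
$R \cup S = S \cup R$
$R \cap S = S \cap R$

(9) Commutativity of selection and set operations:
$\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$
$\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$
$\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$

(10) Commutativity of projection and union:
$\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$

(11) Associativity of theta join (and Cartesian product):
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
$(R \times S) \times T = R \times (S \times T)$
$(R \bowtie_p S) \bowtie_{q \land r} T = R \bowtie_{p \land r} (S \bowtie_q T)$
Example: $(\text{Staff} \bowtie_{\text{Staff.staffNo=PropertyForRent.staffNo}} \text{PropertyForRent}) \bowtie_{\text{ownerNo=Owner.ownerNo} \land \text{staff.lName=Owner.lName}} \text{Owner} = \text{Staff} \bowtie_{\text{staff.staffNo=PropertyForRent.staffNo} \land \text{staff.lName=lName}} (\text{PropertyForRent} \bowtie_{\text{ownerNo}} \text{Owner})$

(12) Associativity of union and intersection:
$(R \cup S) \cup T = S \cup (R \cup T)$
$(R \cap S) \cap T = S \cap (R \cap T)$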
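The rule that pushes a conjunctive selection through a join can be checked on toy data. The sketch below is an illustration only: the sample tuples and the helper functions `select`, `join`, and `as_set` are made up for this purpose, and relations are modeled as lists of dictionaries.

```python
# Made-up sample data; only the attribute names follow the slides.
staff = [
    {"staffNo": "S1", "position": "Manager", "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager", "branchNo": "B2"},
]
branch = [
    {"branchNo": "B1", "city": "London"},
    {"branchNo": "B2", "city": "Glasgow"},
]


def select(pred, rel):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def join(left, right, attr):
    """Equijoin on an attribute name shared by both relations."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def as_set(rel):
    """Order-insensitive view of a relation, for comparing results."""
    return {tuple(sorted(t.items())) for t in rel}


def is_manager(t):
    return t["position"] == "Manager"


def in_london(t):
    return t["city"] == "London"


# Selection applied after the join ...
late = select(lambda t: is_manager(t) and in_london(t),
              join(staff, branch, "branchNo"))
# ... versus selections pushed below the join.
early = join(select(is_manager, staff), select(in_london, branch), "branchNo")

print(as_set(late) == as_set(early))
```

Both forms return the same tuples, but the second joins 2 x 1 tuples instead of 3 x 2, which is the point of the heuristic "perform selection as early as possible".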

Example 20.3 Use of Transformation Rules
For prospective renters of flats, find properties that match requirements and are owned by CO93.

SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.prefType = 'Flat' AND c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND c.maxRent >= p.rent AND c.prefType = p.type AND p.ownerNo = 'CO93';

Heuristical Processing Strategies
1. Perform Selection operations as early as possible.
2. Perform Projection as early as possible.
3. Combine Cartesian product with a subsequent Selection whose predicate represents a join condition into a Join operation.

Cost Estimation for RA Operations
- Many different ways of implementing RA operations.
- Aim of QO is to choose the most efficient one.
- Use formulae that estimate costs for a number of options, and select the one with the lowest cost.

Typical Statistics for Relation R
- $\text{nTuples}(R)$: number of tuples in R (cardinality).
- $\text{bFactor}(R)$: blocking factor of R (number of tuples that fit into one block).
- $\text{nBlocks}(R)$: number of blocks required to store R: $\text{nBlocks}(R) = \lceil \text{nTuples}(R)/\text{bFactor}(R) \rceil$
- $\text{nDistinct}_A(R)$: number of distinct values that appear for attribute A in R.
- $\min_A(R)$, $\max_A(R)$: minimum and maximum possible values for attribute A in R.
- $\text{SC}_A(R)$: selection cardinality of attribute A in R, the average number of tuples that satisfy an equality condition on attribute A:
  $\text{SC}_A(R) = 1$, if A is a key attribute of R
  $\text{SC}_A(R) = \text{nTuples}(R)/\text{nDistinct}_A(R)$, otherwise

Statistics for Multilevel Index I on Attribute A
- $\text{nLevels}_A(I)$: number of levels in I.
- $\text{nLfBlocks}_A(I)$: number of leaf blocks in I.

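Example: Cost Estimation for Selection
In the Staff relation, assume:
- Hash index with no overflow on PK staffNo
- Clustering index on FK branchNo
- B+-tree index on salary

Statistics:
$\text{nTuples}(\text{Staff}) = 3000$, $\text{bFactor}(\text{Staff}) = 30$,
$\text{nDistinct}_{\text{branchNo}}(\text{Staff}) = 500$, $\text{nDistinct}_{\text{position}}(\text{Staff}) = 10$, $\text{nDistinct}_{\text{salary}}(\text{Staff}) = 500$,
$\min_{\text{salary}}(\text{Staff}) = 10000$,
$\text{nLevels}_{\text{branchNo}}(I) = 2$, $\text{nLevels}_{\text{salary}}(I) = 2$, $\text{nLfBlocks}_{\text{salary}}(I) = 50$

Estimated cost of linear search on staffNo is 50 blocks, and on a non-key attribute is 100 blocks.
- S1: $\sigma_{\text{staffNo='SG5'}}(\text{Staff})$ -> Cost = 1 block, cardinality is $\text{SC}_{\text{staffNo}}(\text{Staff}) = 1$
- S2: $\sigma_{\text{position='Manager'}}(\text{Staff})$ -> Cost = 100 blocks, cardinality is $\text{SC}_{\text{position}}(\text{Staff}) = 300$
- S3: $\sigma_{\text{branchNo='B003'}}(\text{Staff})$ -> Cost = $2 + \lceil 6/30 \rceil = 3$ blocks, cardinality is $\text{SC}_{\text{branchNo}}(\text{Staff}) = 6$
- S4: $\sigma_{\text{salary>20000}}(\text{Staff})$ -> ?
- S5: $\sigma_{\text{position='Manager'} \land \text{branchNo='B003'}}(\text{Staff})$ -> ?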
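The figures quoted for S1 to S3 follow directly from the statistics, as the sketch below shows. Variable names are illustrative. The last line only evaluates the B+-tree inequality formula from the list of selection strategies in this section for S4; it is not offered as the slide's answer to the open questions.

```python
import math

# Statistics quoted in the example.
n_tuples = 3000
b_factor = 30
n_distinct = {"branchNo": 500, "position": 10, "salary": 500}
n_levels = {"branchNo": 2, "salary": 2}
n_leaf_blocks_salary = 50

n_blocks = math.ceil(n_tuples / b_factor)


def selection_cardinality(attr, is_key=False):
    """SC_A(R): 1 for a key attribute, nTuples / nDistinct otherwise."""
    return 1 if is_key else math.ceil(n_tuples / n_distinct[attr])


linear_key = math.ceil(n_blocks / 2)      # equality on key, unordered file
linear_non_key = n_blocks                 # whole file must be scanned

s1_cost = 1                               # hash index on staffNo, no overflow
s2_cost = linear_non_key                  # no index on position
sc_position = selection_cardinality("position")
sc_branch = selection_cardinality("branchNo")
s3_cost = n_levels["branchNo"] + math.ceil(sc_branch / b_factor)

# B+-tree inequality formula applied to salary (illustration only).
s4_index_cost = n_levels["salary"] + math.ceil(
    n_leaf_blocks_salary / 2 + n_tuples / 2)

print(linear_key, linear_non_key, s1_cost, s2_cost, s3_cost, s4_index_cost)
```

Under these numbers the index path for the salary range costs far more than the 100-block linear scan, so an index does not automatically win for low-selectivity range predicates.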

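Estimating Cardinality of Projection
- When projection contains key, cardinality is: $\text{nTuples}(S) = \text{nTuples}(R)$
- If projection consists of a single non-key attribute, estimate is: $\text{nTuples}(S) = \text{SC}_A(R)$
- Otherwise, could estimate cardinality as: $\text{nTuples}(S) \le \min(\text{nTuples}(R), \prod_{i=1}^{m} \text{nDistinct}_{a_i}(R))$

For a selection on attribute A, $\text{nTuples}(S) = \text{SC}_A(R)$. For any attribute $B \ne A$ of S:

$$\text{nDistinct}_B(S) = \begin{cases} \text{nTuples}(S) & \text{if } \text{nTuples}(S) < \text{nDistinct}_B(R)/2 \\ \text{nDistinct}_B(R) & \text{if } \text{nTuples}(S) > 2 \cdot \text{nDistinct}_B(R) \\ \lceil (\text{nTuples}(S) + \text{nDistinct}_B(R))/3 \rceil & \text{otherwise} \end{cases}$$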
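These estimates translate into two small functions. The sketch assumes the piecewise case values are nTuples(S), nDistinct_B(R), and the rounded-up third of their sum, in that order, and that square brackets denote rounding up. Function names and sample arguments are hypothetical.

```python
import math


def projection_cardinality_bound(n_tuples_r, n_distinct_values):
    """Upper bound for a projection on several non-key attributes."""
    return min(n_tuples_r, math.prod(n_distinct_values))


def n_distinct_after_selection(n_tuples_s, n_distinct_b):
    """Piecewise estimate of nDistinct_B(S) for an attribute B of the result."""
    if n_tuples_s < n_distinct_b / 2:
        return n_tuples_s                 # few tuples: assume all values differ
    if n_tuples_s > 2 * n_distinct_b:
        return n_distinct_b               # many tuples: assume all values occur
    return math.ceil((n_tuples_s + n_distinct_b) / 3)


# Sample calls using Staff-like numbers (3000 tuples).
print(projection_cardinality_bound(3000, [10, 500]))
print(projection_cardinality_bound(3000, [10, 50]))
print(n_distinct_after_selection(6, 500))
print(n_distinct_after_selection(300, 10))
print(n_distinct_after_selection(300, 500))
```

The bound rests on the assumption that attribute values combine freely, which is usually optimistic, so it is best read as a ceiling rather than a prediction.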

Main strategies are:
- Linear Search (unordered file, no index).
  For equality condition on key attribute, cost estimate is: $\lceil \text{nBlocks}(R)/2 \rceil$
  For any other condition, entire file may need to be searched, so more general cost estimate is: $\text{nBlocks}(R)$
- Binary Search (ordered file, no index).
  If predicate is of form A = x, and file is ordered on key attribute A, cost estimate is: $\lceil \log_2(\text{nBlocks}(R)) \rceil$
  Generally, cost estimate is: $\lceil \log_2(\text{nBlocks}(R)) \rceil + \lceil \text{SC}_A(R)/\text{bFactor}(R) \rceil - 1$
- Equality on hash key. If there is no overflow, expected cost is 1.
- Equality condition on primary key: $\text{nLevels}_A(I) + 1$
- Inequality condition on primary key: $\text{nLevels}_A(I) + \lceil \text{nBlocks}(R)/2 \rceil$
- Equality condition on clustering (secondary) index: $\text{nLevels}_A(I) + \lceil \text{SC}_A(R)/\text{bFactor}(R) \rceil$
- Equality condition on a non-clustering (secondary) index: $\text{nLevels}_A(I) + \lceil \text{SC}_A(R) \rceil$
- Inequality condition on a secondary B+-tree index: $\text{nLevels}_A(I) + \lceil \text{nLfBlocks}_A(I)/2 + \text{nTuples}(R)/2 \rceil$
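Choosing among these strategies amounts to evaluating each formula and taking the minimum. The sketch below does this for an equality selection on a non-key attribute. It pretends that all four access paths exist for the same attribute, which is a simplification made here for illustration, and the function name is hypothetical. The sample statistics are those of the branchNo attribute in the Staff example.

```python
import math


def equality_selection_costs(n_blocks, b_factor, sc, n_levels):
    """Estimated block accesses for an equality selection on a non-key attribute."""
    matching_blocks = math.ceil(sc / b_factor)
    return {
        "linear search": n_blocks,
        "binary search": math.ceil(math.log2(n_blocks)) + matching_blocks - 1,
        "clustering index": n_levels + matching_blocks,
        "non-clustering index": n_levels + math.ceil(sc),
    }


costs = equality_selection_costs(n_blocks=100, b_factor=30, sc=6, n_levels=2)
cheapest = min(costs, key=costs.get)

for name, cost in costs.items():
    print(f"{name}: {cost} blocks")
print("cheapest:", cheapest)
```

A real optimizer would only cost the access paths that actually exist for the relation, and would combine these estimates with the cardinality estimates when costing later operations.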

Types of Trees

Query Optimization in MS SQL Server
- SQL Server uses cost-based approaches to query optimization.
- The purpose of QO is to determine the query execution plan with the least amount of processing time.
- The QO assigns a cost to each candidate execution plan in terms of CPU resources and disk I/O.
- The execution plan with the least associated cost is chosen for implementation.

When SQL Server compiles a DML statement:
- The query tree is normalized and simplified.
- The QO analyzes the different ways to access the source tables.
- SQL Server reads and executes the optimized plan, returning the result set.

Optimization steps:
- Query analysis to determine search arguments and join clauses.
- Row estimation and index selection based on search arguments and join clauses.
- Join selection to determine the most appropriate order in which to access tables.
- Execution plan selection to pick the most efficient solution.

DATABASE SECURITY
Data is a valuable resource that must be strictly controlled and managed, as with any corporate resource. Security considerations do not only apply to the data held in a database. Breaches of security may affect other parts of the system, which may in turn affect the database.

Involves measures to avoid:
- Theft and fraud
- Loss of confidentiality (secrecy)
- Loss of privacy
- Loss of integrity
- Loss of availability

Database Security Issues
Three very broad issues in DB security:
1. Legal and ethical considerations
2. Policy issues
3. System-level issues

MANDATORY ACCESS CONTROL
Used in government and other high-security environments. Also called Multilevel Security (MLS). Objects are given a security class or level, e.g.:
- Top Secret (TS)
- Secret (S)
- Confidential (C)
- Unclassified (U)
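A minimal sketch of how such security classes might be compared follows. It assumes the usual ordering TS > S > C > U and the common "no read up" rule, neither of which the slide states explicitly; the class and function names are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Security classes, ordered from least to most sensitive (assumed order)."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_read(subject_clearance, object_class):
    """'No read up': the subject's clearance must be at least the object's class."""
    return subject_clearance >= object_class


print(can_read(Level.SECRET, Level.CONFIDENTIAL))      # reading down is allowed
print(can_read(Level.CONFIDENTIAL, Level.TOP_SECRET))  # reading up is refused
```

Unlike discretionary grants, this decision depends only on the labels, so an object's owner cannot override it.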

Principals

Securables

Permissions

Security in Microsoft Access DBMS
Provides two methods for securing a database:
- setting a password for opening a database (system security);
- user-level security, which can be used to limit the parts of the database that a user can read or update (data security).

Data Warehousing Concepts


A data warehouse is a subject-oriented collection of data in support of management's decision-making process.
Subject-oriented: the warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales).
It is also described as a copy of transaction data specifically structured for query and analysis.

Benefits of Data Warehousing


- Potential high returns on investment (ROI)
- Competitive advantage
- Increased productivity of corporate decision-makers

Problems of Data Warehousing


- Hidden problems with source systems
- Increased end-user demands
- Required data not captured
- High maintenance

Warehouse Manager
Performs all the operations associated with the management of the data in the warehouse, such as:
- Transformation and merging of source data from temporary storage into data warehouse tables.
- Creation of indexes and views on base tables.

Data Warehousing Tools and Technologies
ETL Processes:
- Extraction
- Transformation
- Loading

Data Warehouse DBMS Requirements
- Load performance
- Load processing
- Data quality management
- Query performance
- Networked data warehouse
- Warehouse administration
- Integrated dimensional analysis

Data Warehouse Structure/Model

Interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse.

The database component of a data warehouse is described using a technique called dimensionality modeling (DM).

Dimensionality Modeling
A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.
The data describing the transactions is held in the fact table.
Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. This forms a star-like structure, which is called a Star Schema or Star Join.
The predictable and standard form of the underlying dimensional model offers important advantages:
- Ability to handle changing requirements
- Ability to model common business situations
- Efficiency
- Extensibility
- Predictable query processing

Dimensionality Modeling: STAR Schema
Star schema is a logical structure that has a fact table (containing factual data) in the center, surrounded by denormalized dimension tables (containing reference data). Facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed.

Dimensionality Modeling: SNOWFLAKE Schema
Snowflake schema is a variant of the star schema that has a fact table in the center, surrounded by normalized dimension tables. This is also known as a hierarchy of dimensions.

Dimensionality Modeling: STARFLAKE Schema
Starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) dimension tables.
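A star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only: the table and column names (`DimBranch`, `DimProperty`, `FactSale`, and so on) and the sample rows are invented here, not taken from the slides. It shows a fact table whose composite primary key is made up of the simple keys of two dimension tables, followed by a typical star-join query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: simple (non-composite) primary keys, reference data.
CREATE TABLE DimBranch   (branchKey   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE DimProperty (propertyKey INTEGER PRIMARY KEY, type TEXT);

-- Fact table: composite primary key built from the dimension keys.
CREATE TABLE FactSale (
    branchKey   INTEGER REFERENCES DimBranch(branchKey),
    propertyKey INTEGER REFERENCES DimProperty(propertyKey),
    salePrice   REAL,
    PRIMARY KEY (branchKey, propertyKey)
);

INSERT INTO DimBranch   VALUES (1, 'London'), (2, 'Glasgow');
INSERT INTO DimProperty VALUES (10, 'Flat'), (11, 'House');
INSERT INTO FactSale    VALUES (1, 10, 250000), (1, 11, 400000), (2, 10, 120000);
""")

# Star join: aggregate the facts, grouped by a dimension attribute.
rows = conn.execute("""
    SELECT b.city, SUM(f.salePrice)
    FROM FactSale f
    JOIN DimBranch b ON b.branchKey = f.branchKey
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()

for city, total in rows:
    print(city, total)
```

Turning this into a snowflake schema would mean normalizing a dimension further, for example moving `city` into its own table referenced from `DimBranch`.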

Comparison of DM and ER models
A single ER model normally decomposes into multiple DMs. Multiple DMs are then associated through shared dimension tables.

Surrogate Key

Online Analytical Processing (OLAP)

Data warehousing includes:
- Building the data warehouse
- Online analytical processing (OLAP)
- Presentation (reporting)