Sie sind auf Seite 1von 59

Chapter 23

Query Processing

Pearson Education 2009

Chapter 23 - Objectives
Objectives of query processing and optimization. Static versus dynamic query optimization. How a query is decomposed and semantically analyzed. How to create a R.A.T. to represent a query. Rules of equivalence for RA (relation algebra) operations. How to apply heuristic transformation rules to improve efficiency of a query.

Pearson Education 2009

Chapter 23 - Objectives
Types of database statistics required to estimate cost of operations. Different strategies for implementing selection. How to evaluate cost and size of selection. Different strategies for implementing join. How to evaluate cost and size of join. Different strategies for implementing projection. How to evaluate cost and size of projection.

Pearson Education 2009

Chapter 23 - Objectives
How to evaluate the cost and size of other RA operations. How pipelining can be used to improve efficiency of queries. Difference between materialization and pipelining. Advantages of left-deep trees. Approaches to finding optimal execution strategy. How Oracle handles QO.

Pearson Education 2009

Introduction
In

network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language. Programmers responsibility to select most appropriate execution strategy. With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved. Relieves user of knowing what constitutes good execution strategy.
Pearson Education 2009

Introduction
Two

main techniques for query optimization: heuristic rules that order operations in a query; comparing different strategies based on relative costs, and selecting one that minimizes resource usage.

Disk

access tends to be dominant cost in query processing for centralized DBMS.

Pearson Education 2009

Query Processing
Activities involved in retrieving data from the database.
Aims

of QP: transform query written in high-level language (e.g. SQL), into correct and efficient execution strategy expressed in low-level language (implementing RA); execute strategy to retrieve required data.
7

Pearson Education 2009

Query Optimization
Activity of choosing an efficient execution strategy for processing query.
As

there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage. Generally, reduce total execution time of query. May also reduce response time of query. Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.
Pearson Education 2009

Example 23.1 - Different Strategies


Find all Managers who work at a London branch.

SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND (s.position = Manager AND b.city = London);

Pearson Education 2009

Example 23.1 - Different Strategies


Three equivalent RA queries are: (1) (position='Manager') (city='London') (Staff.branchNo=Branch.branchNo) (Staff X Branch) (2) (position='Manager') (city='London')( Staff Staff.branchNo=Branch.branchNo Branch) (3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo

(city='London' (Branch))

Pearson Education 2009

10

Example 23.1 - Different Strategies

Assume: 1000 tuples in Staff; 50 tuples in Branch; 50 Managers; 5 London branches; no indexes or sort keys; results of any intermediate operations stored on disk; cost of the final write is ignored; tuples are accessed one at a time.
11

Pearson Education 2009

Example 23.1 - Cost Comparison

Cost (in disk accesses) are:

(1) (1000 + 50) + 2*(1000 * 50) = 101 050 (2) 2*1000 + (1000 + 50) = 3 050 (3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian

product and join operations much more expensive than selection, and third option significantly reduces size of relations being joined together.

Pearson Education 2009

12

Phases of Query Processing


QP

has four main phases:

decomposition (consisting of parsing and validation); optimization; code generation; execution.

Pearson Education 2009

13

Phases of Query Processing

14
Pearson Education 2009

Dynamic versus Static Optimization


Two

times when first three phases of QP can be carried out: dynamically every time query is run; statically when query is first submitted. Advantages of dynamic QO arise from fact that information is up to date. Disadvantages are that performance of query is affected, time may limit finding optimum strategy.

Pearson Education 2009

15

Dynamic versus Static Optimization


Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy. Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run. Could use a hybrid approach to overcome this.

Pearson Education 2009

16

Query Decomposition
Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct. Typical stages are: analysis, normalization, semantic analysis, simplification, query restructuring.

17

Pearson Education 2009

Analysis
query lexically (t vng) and syntactically using compiler techniques. Verify relations and attributes exist. Verify operations are appropriate for object type.
Analyze

Pearson Education 2009

18

Analysis - Example
SELECT staff_no FROM Staff WHERE position > 10;
This

query would be rejected on two grounds: staff_no is not defined for Staff relation (should be staffNo). Comparison >10 is incompatible with type position, which is variable character string.
19

Pearson Education 2009

Analysis
Finally,

query transformed into some internal representation more suitable for processing. Some kind of query tree is typically chosen, constructed as follows: Leaf node created for each base relation. Non-leaf node created for each intermediate relation produced by RA operation. Root of tree represents query result. Sequence is directed from leaves to root.
20

Pearson Education 2009

Example 23.1 - R.A.T.

Pearson Education 2009

21

Normalization
Converts query into a normalized form for easier manipulation. Predicate can be converted into one of two forms:

Conjunctive normal form:


(position = 'Manager' salary > 20000) (branchNo = 'B003')

Disjunctive normal form:


(position = 'Manager' branchNo = 'B003' ) (salary > 20000 branchNo = 'B003')

Pearson Education 2009

22

Semantic Analysis
Rejects

normalized queries that are incorrectly formulated or contradictory. Query is incorrectly formulated if components do not contribute to generation of result. Query is contradictory if its predicate cannot be satisfied by any tuple. Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.

Pearson Education 2009

23

Semantic Analysis
For

these queries, could construct: A relation connection graph. Normalized attribute connection graph.

Relation connection graph Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent projection. If not connected, query is incorrectly formulated.
Pearson Education 2009

24

Simplification
Detects redundant qualifications, eliminates common sub-expressions, transforms query to semantically equivalent but more easily and efficiently computed form. Typically, access restrictions, view definitions, and integrity constraints are considered. Assuming user has appropriate access privileges, first apply well-known idempotency rules of boolean algebra.

Pearson Education 2009

25

Transformation Rules for RA Operations


Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).

pqr(R) = p(q(r(R)))
Sometimes

referred to as cascade of Selection. branchNo='B003' salary>15000(Staff) = branchNo='B003'(salary>15000(Staff))

Pearson Education 2009

26

Transformation Rules for RA Operations


Commutativity of Selection. p(q(R)) = q(p(R))

For example:

branchNo='B003'(salary>15000(Staff)) = salary>15000(branchNo='B003'(Staff))

Pearson Education 2009

27

Transformation Rules for RA Operations


In a sequence of Projection operations, only the last in the sequence is required. LM N(R) = L (R)
For

example: lNamebranchNo, lName(Staff) = lName (Staff)

Pearson Education 2009

28

Transformation Rules for RA Operations


Commutativity of Selection and Projection.

If predicate p involves only attributes in projection list, Selection and Projection operations commute: Ai, , Am(p(R)) = p(Ai, , Am(R)) where p {A1, A2, , Am} For example: fName, lName(lName='Beech'(Staff)) = lName='Beech'(fName,lName(Staff))

Pearson Education 2009

29

Transformation Rules for RA Operations


Commutativity of Theta join (and Cartesian product). R pS=S pR RXS=SXR
Rule

also applies to Equijoin and Natural join. For example: Staff staff.branchNo=branch.branchNo Branch = Branch
staff.branchNo=branch.branchNo Staff

Pearson Education 2009

30

Transformation Rules for RA Operations


Commutativity of Selection and Theta join (or Cartesian product).
If

selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute: p(R r S) = (p(R)) rS p(R X S) = (p(R)) X S

where p {A1, A2, , An}


Pearson Education 2009

31

Transformation Rules for RA Operations


If

selection predicate is conjunctive predicate having form (p q), where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as: p q(R
r

S) = (p(R))

(q(S))

p q(R X S) = (p(R)) X (q(S))

Pearson Education 2009

32

Transformation Rules for RA Operations


For

example:

position='Manager' city='London'(Staff Staff.branchNo=Branch.branchNo Branch) = (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo (city='London' (Branch))

Pearson Education 2009

33

Transformation Rules for RA Operations


Commutativity of Projection and Theta join (or Cartesian product).
If

projection list is of form L = L1 L2, where L1 only has attributes of R, and L2 only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:
L1L2(R
r

S) = (L1(R))

(L2(S))
34

Pearson Education 2009

Transformation Rules for RA Operations


If

join condition contains additional attributes not in L (M = M1 M2 where M1 only has attributes of R, and M2 only has attributes of S), a final projection operation is required:
L1L2(R r S) = L1L2( (L1M1(R)) (L2M2(S)))
r

Pearson Education 2009

35

Transformation Rules for RA Operations


For

example:
Staff.branchNo=Branch.branchNo

position,city,branchNo(Staff = (position, branchNo(Staff)) city, branchNo (Branch))


and

Branch)

Staff.branchNo=Branch.branchNo

using the latter rule: position, city(Staff Staff.branchNo=Branch.branchNo Branch) = position, city ((position, branchNo(Staff)) Staff.branchNo=Branch.branchNo ( city, branchNo (Branch)))
36

Pearson Education 2009

Transformation Rules for RA Operations


Commutativity of Union and Intersection (but not set difference). RS=SR RS=SR

Pearson Education 2009

37

Transformation Rules for RA Operations


Commutativity of Selection and set operations (Union, Intersection, and Set difference). p(R S) = p(S) p(R) p(R S) = p(S) p(R)

p(R - S) = p(S) - p(R)

Pearson Education 2009

38

Transformation Rules for RA Operations


Commutativity of Projection and Union.

L(R S) = L(S) L(R)


Associativity of Union and Intersection (but not Set difference). (R S) T = S (R T)

(R S) T = S (R T)

Pearson Education 2009

39

Transformation Rules for RA Operations


Associativity of Theta join (and Cartesian product). Cartesian product and Natural join are always associative: (R S) T = R (S T) (R X S) X T = R X (S X T)
If

join condition q involves attributes only from S and T, then Theta join is associative: (R p S) qr T = R p r (S q T)
Pearson Education 2009

40

Transformation Rules for RA Operations

For example:
Staff.staffNo=PropertyForRent.staffNo PropertyForRent)

(Staff

ownerNo=Owner.ownerNo staff.lName=Owner.lName

Owner =

Staff staff.staffNo=PropertyForRent.staffNo staff.lName=lName (PropertyForRent ownerNo Owner)

Pearson Education 2009

41

Example 23.3 Use of Transformation Rules


For prospective renters of flats, find properties that match requirements and owned by CO93.
SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.prefType = Flat AND c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND c.maxRent >= p.rent AND c.prefType = p.type AND p.ownerNo = CO93;
Pearson Education 2009

42

Example 23.3 Use of Transformation Rules

Pearson Education 2009

43

Example 23.3 Use of Transformation Rules

Pearson Education 2009

44

Example 23.3 Use of Transformation Rules

45
Pearson Education 2009

Heuristical Processing Strategies

Perform Selection operations as early as possible. Keep predicates on same relation together.

Combine

Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.
associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations executed first.

Use

Pearson Education 2009

46

Heuristical Processing Strategies


Perform

Projection as early as possible. common expressions once.

Keep projection attributes on same relation together.


Compute

If common expression appears more than once, and result not too large, store result and reuse it when required. Useful when querying views, as same expression is used to construct view each time.

Pearson Education 2009

47

Cost Estimation for RA Operations


Many

different ways of implementing RA operations. Aim of QO is to choose most efficient one. Use formulae that estimate costs for a number of options, and select one with lowest cost. Consider only cost of disk access, which is usually dominant cost in QP. Many estimates are based on cardinality of the relation, so need to be able to estimate this.

Pearson Education 2009

48

Database Statistics
Success

of estimation depends on amount and currency of statistical information DBMS holds. Keeping statistics current can be problematic. If statistics updated every time tuple is changed, this would impact performance. DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.

Pearson Education 2009

49

Query Optimization in Oracle


Oracle

supports two approaches to query optimization: rule-based and cost-based.

Rule-based 15 rules, ranked in order of efficiency. Particular access path for a table only chosen if statement contains a predicate or other construct that makes that access path available. Score assigned to each execution strategy using these rankings and strategy with best (lowest) score selected.
Pearson Education 2009

50

QO in Oracle Rule-Based
When

2 strategies have same score, tie-break resolved by making decision based on order in which tables occur in the SQL statement.

Pearson Education 2009

51

QO in Oracle Rule-based: Example


SELECT propertyNo FROM PropertyForRent WHERE rooms > 7 AND city = London Single-column access path using index on city from WHERE condition (city = London). Rank 9. Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11. Full table scan - rank 15. Although there is index on propertyNo, column does not appear in WHERE clause and so is not considered by optimizer. Based on these paths, rule-based optimizer will choose to use index based on city column. 52
Pearson Education 2009

QO in Oracle Cost-Based
To

improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly). User can select whether minimal resource usage is based on throughput or based on response time, by setting the OPTIMIZER_MODE initialization parameter. Cost-based optimizer also takes into consideration hints that the user may provide.
Pearson Education 2009

53

QO in Oracle Statistics
Cost-based

optimizer depends on statistics for all tables, clusters, and indexes accessed by query. Users responsibility to generate these statistics and keep them current. Package DBMS_STATS can be used to generate and manage statistics. Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.

Pearson Education 2009

54

QO in Oracle Histograms
Previously

made within columns distributed. Histogram of frequencies gives estimates in distribution.

assumption that data values of a table are uniformly values and their relative optimizer improved selectivity presence of non-uniform

Pearson Education 2009

55

QO in Oracle Histograms

(a) uniform distribution of rooms; (b) actual non-uniform distribution. (a) can be stored compactly as low value (1) and high value (10), and as total count of all frequencies (in this case, 100).
Pearson Education 2009

56

QO in Oracle Histograms
Histogram

is data structure that can improve estimates of number of tuples in result. Two types of histogram:
width-balanced histogram, which divides data into a fixed number of equal-width ranges (called buckets) each containing count of number of values falling within that bucket; height-balanced histogram, which places approximately same number of values in each bucket so that end points of each bucket are determined by how many values are in that bucket.
57

Pearson Education 2009

QO in Oracle Histograms

(a) width-balanced for rooms with 5 buckets. Each bucket of equal width with 2 values (1-2, 3-4, etc.) (b) height-balanced height of each column is 20 (100/5).
Pearson Education 2009

58

QO in Oracle Viewing Execution Plan

Pearson Education 2009

59

Das könnte Ihnen auch gefallen