Lecture05 Query Processing Ch23

Chapter 23
Query Processing
Pearson Education 2009
Chapter 23 - Objectives
Objectives of query processing and optimization. Static versus dynamic query optimization. How a query is decomposed and semantically analyzed. How to create a R.A.T. to represent a query. Rules of equivalence for RA (relation algebra) operations. How to apply heuristic transformation rules to improve efficiency of a query.
Types of database statistics required to estimate cost of operations. Different strategies for implementing selection. How to evaluate cost and size of selection. Different strategies for implementing join. How to evaluate cost and size of join. Different strategies for implementing projection. How to evaluate cost and size of projection.
How to evaluate the cost and size of other RA operations. How pipelining can be used to improve efficiency of queries. Difference between materialization and pipelining. Advantages of left-deep trees. Approaches to finding optimal execution strategy. How Oracle handles QO.
Introduction
In
network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language. Programmers responsibility to select most appropriate execution strategy. With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved. Relieves user of knowing what constitutes good execution strategy.
Introduction
Two
main techniques for query optimization: heuristic rules that order operations in a query; comparing different strategies based on relative costs, and selecting one that minimizes resource usage.
Disk
access tends to be dominant cost in query processing for centralized DBMS.
Query Processing
Activities involved in retrieving data from the database.
Aims
of QP: transform query written in high-level language (e.g. SQL), into correct and efficient execution strategy expressed in low-level language (implementing RA); execute strategy to retrieve required data.
7
Query Optimization
Activity of choosing an efficient execution strategy for processing query.
As
there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage. Generally, reduce total execution time of query. May also reduce response time of query. Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.
Example 23.1 - Different Strategies

Find all Managers who work at a London branch.
SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo AND (s.position = Manager AND b.city = London);

Three equivalent RA queries are: (1) (position='Manager') (city='London') (Staff.branchNo=Branch.branchNo) (Staff X Branch) (2) (position='Manager') (city='London')( Staff Staff.branchNo=Branch.branchNo Branch) (3) (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo
(city='London' (Branch))
10
Assume: 1000 tuples in Staff; 50 tuples in Branch; 50 Managers; 5 London branches; no indexes or sort keys; results of any intermediate operations stored on disk; cost of the final write is ignored; tuples are accessed one at a time.
11
Example 23.1 - Cost Comparison
Cost (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050 (2) 2*1000 + (1000 + 50) = 3 050 (3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
Cartesian
product and join operations much more expensive than selection, and third option significantly reduces size of relations being joined together.
12
Phases of Query Processing

QP
has four main phases:
decomposition (consisting of parsing and validation); optimization; code generation; execution.
13
Phases of Query Processing
14
Dynamic versus Static Optimization

Two
times when first three phases of QP can be carried out: dynamically every time query is run; statically when query is first submitted. Advantages of dynamic QO arise from fact that information is up to date. Disadvantages are that performance of query is affected, time may limit finding optimum strategy.
15
Dynamic versus Static Optimization

Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy. Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run. Could use a hybrid approach to overcome this.
16
Query Decomposition
Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct. Typical stages are: analysis, normalization, semantic analysis, simplification, query restructuring.
17
Analysis
query lexically (t vng) and syntactically using compiler techniques. Verify relations and attributes exist. Verify operations are appropriate for object type.
Analyze
18
Analysis - Example
SELECT staff_no FROM Staff WHERE position > 10;
This
query would be rejected on two grounds: staff_no is not defined for Staff relation (should be staffNo). Comparison >10 is incompatible with type position, which is variable character string.
19
Analysis
Finally,
query transformed into some internal representation more suitable for processing. Some kind of query tree is typically chosen, constructed as follows: Leaf node created for each base relation. Non-leaf node created for each intermediate relation produced by RA operation. Root of tree represents query result. Sequence is directed from leaves to root.
20
Example 23.1 - R.A.T.
21
Normalization
Converts query into a normalized form for easier manipulation. Predicate can be converted into one of two forms:
Conjunctive normal form:

(position = 'Manager' salary > 20000) (branchNo = 'B003')
Disjunctive normal form:

(position = 'Manager' branchNo = 'B003' ) (salary > 20000 branchNo = 'B003')
22
Semantic Analysis
Rejects
normalized queries that are incorrectly formulated or contradictory. Query is incorrectly formulated if components do not contribute to generation of result. Query is contradictory if its predicate cannot be satisfied by any tuple. Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.
23
Semantic Analysis
For
these queries, could construct: A relation connection graph. Normalized attribute connection graph.
Relation connection graph Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent projection. If not connected, query is incorrectly formulated.
24
Simplification
Detects redundant qualifications, eliminates common sub-expressions, transforms query to semantically equivalent but more easily and efficiently computed form. Typically, access restrictions, view definitions, and integrity constraints are considered. Assuming user has appropriate access privileges, first apply well-known idempotency rules of boolean algebra.
25
Transformation Rules for RA Operations

Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).
pqr(R) = p(q(r(R)))
Sometimes
referred to as cascade of Selection. branchNo='B003' salary>15000(Staff) = branchNo='B003'(salary>15000(Staff))
26

Commutativity of Selection. p(q(R)) = q(p(R))
For example:
branchNo='B003'(salary>15000(Staff)) = salary>15000(branchNo='B003'(Staff))
27

In a sequence of Projection operations, only the last in the sequence is required. LM N(R) = L (R)
For
example: lNamebranchNo, lName(Staff) = lName (Staff)
28

Commutativity of Selection and Projection.
If predicate p involves only attributes in projection list, Selection and Projection operations commute: Ai, , Am(p(R)) = p(Ai, , Am(R)) where p {A1, A2, , Am} For example: fName, lName(lName='Beech'(Staff)) = lName='Beech'(fName,lName(Staff))
29

Commutativity of Theta join (and Cartesian product). R pS=S pR RXS=SXR
Rule
also applies to Equijoin and Natural join. For example: Staff staff.branchNo=branch.branchNo Branch = Branch
staff.branchNo=branch.branchNo Staff
30

Commutativity of Selection and Theta join (or Cartesian product).
If
selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute: p(R r S) = (p(R)) rS p(R X S) = (p(R)) X S
where p {A1, A2, , An}

31

If
selection predicate is conjunctive predicate having form (p q), where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as: p q(R
r
S) = (p(R))
(q(S))
p q(R X S) = (p(R)) X (q(S))
32

For
example:
position='Manager' city='London'(Staff Staff.branchNo=Branch.branchNo Branch) = (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo (city='London' (Branch))
33

Commutativity of Projection and Theta join (or Cartesian product).
If
projection list is of form L = L1 L2, where L1 only has attributes of R, and L2 only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:
L1L2(R
r
S) = (L1(R))
(L2(S))
34

If
join condition contains additional attributes not in L (M = M1 M2 where M1 only has attributes of R, and M2 only has attributes of S), a final projection operation is required:
L1L2(R r S) = L1L2( (L1M1(R)) (L2M2(S)))
r
35

For
example:
Staff.branchNo=Branch.branchNo
position,city,branchNo(Staff = (position, branchNo(Staff)) city, branchNo (Branch))

and
Branch)
Staff.branchNo=Branch.branchNo
using the latter rule: position, city(Staff Staff.branchNo=Branch.branchNo Branch) = position, city ((position, branchNo(Staff)) Staff.branchNo=Branch.branchNo ( city, branchNo (Branch)))
36

Commutativity of Union and Intersection (but not set difference). RS=SR RS=SR
37

Commutativity of Selection and set operations (Union, Intersection, and Set difference). p(R S) = p(S) p(R) p(R S) = p(S) p(R)
p(R - S) = p(S) - p(R)
38

Commutativity of Projection and Union.
L(R S) = L(S) L(R)

Associativity of Union and Intersection (but not Set difference). (R S) T = S (R T)
(R S) T = S (R T)
39

Associativity of Theta join (and Cartesian product). Cartesian product and Natural join are always associative: (R S) T = R (S T) (R X S) X T = R X (S X T)
If
join condition q involves attributes only from S and T, then Theta join is associative: (R p S) qr T = R p r (S q T)
40
For example:
Staff.staffNo=PropertyForRent.staffNo PropertyForRent)
(Staff
ownerNo=Owner.ownerNo staff.lName=Owner.lName
Owner =
Staff staff.staffNo=PropertyForRent.staffNo staff.lName=lName (PropertyForRent ownerNo Owner)
41
Example 23.3 Use of Transformation Rules

For prospective renters of flats, find properties that match requirements and owned by CO93.
SELECT p.propertyNo, p.street FROM Client c, Viewing v, PropertyForRent p WHERE c.prefType = Flat AND c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo AND c.maxRent >= p.rent AND c.prefType = p.type AND p.ownerNo = CO93;
42
43
44
45
Heuristical Processing Strategies
Perform Selection operations as early as possible. Keep predicates on same relation together.
Combine
Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.
associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations executed first.
Use
46
Heuristical Processing Strategies

Perform
Projection as early as possible. common expressions once.
Keep projection attributes on same relation together.

Compute
If common expression appears more than once, and result not too large, store result and reuse it when required. Useful when querying views, as same expression is used to construct view each time.
47
Cost Estimation for RA Operations

Many
different ways of implementing RA operations. Aim of QO is to choose most efficient one. Use formulae that estimate costs for a number of options, and select one with lowest cost. Consider only cost of disk access, which is usually dominant cost in QP. Many estimates are based on cardinality of the relation, so need to be able to estimate this.
48
Database Statistics
Success
of estimation depends on amount and currency of statistical information DBMS holds. Keeping statistics current can be problematic. If statistics updated every time tuple is changed, this would impact performance. DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.
49
Query Optimization in Oracle

Oracle
supports two approaches to query optimization: rule-based and cost-based.
Rule-based 15 rules, ranked in order of efficiency. Particular access path for a table only chosen if statement contains a predicate or other construct that makes that access path available. Score assigned to each execution strategy using these rankings and strategy with best (lowest) score selected.
50
QO in Oracle Rule-Based
When
2 strategies have same score, tie-break resolved by making decision based on order in which tables occur in the SQL statement.
51
QO in Oracle Rule-based: Example

SELECT propertyNo FROM PropertyForRent WHERE rooms > 7 AND city = London Single-column access path using index on city from WHERE condition (city = London). Rank 9. Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11. Full table scan - rank 15. Although there is index on propertyNo, column does not appear in WHERE clause and so is not considered by optimizer. Based on these paths, rule-based optimizer will choose to use index based on city column. 52
QO in Oracle Cost-Based
To
improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly). User can select whether minimal resource usage is based on throughput or based on response time, by setting the OPTIMIZER_MODE initialization parameter. Cost-based optimizer also takes into consideration hints that the user may provide.
53
QO in Oracle Statistics
Cost-based
optimizer depends on statistics for all tables, clusters, and indexes accessed by query. Users responsibility to generate these statistics and keep them current. Package DBMS_STATS can be used to generate and manage statistics. Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.
54
QO in Oracle Histograms
Previously
made within columns distributed. Histogram of frequencies gives estimates in distribution.
assumption that data values of a table are uniformly values and their relative optimizer improved selectivity presence of non-uniform
55
(a) uniform distribution of rooms; (b) actual non-uniform distribution. (a) can be stored compactly as low value (1) and high value (10), and as total count of all frequencies (in this case, 100).
56
Histogram
is data structure that can improve estimates of number of tuples in result. Two types of histogram:
width-balanced histogram, which divides data into a fixed number of equal-width ranges (called buckets) each containing count of number of values falling within that bucket; height-balanced histogram, which places approximately same number of values in each bucket so that end points of each bucket are determined by how many values are in that bucket.
57
(a) width-balanced for rooms with 5 buckets. Each bucket of equal width with 2 values (1-2, 3-4, etc.) (b) height-balanced height of each column is 20 (100/5).
58
QO in Oracle Viewing Execution Plan
59

Lecture05 Query Processing Ch23

Hochgeladen von

Dokumentinformationen

Originalbeschreibung:

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Lecture05 Query Processing Ch23

Hochgeladen von

Copyright:

Verfügbare Formate

Chapter 23

Pearson Education 2009

Pearson Education 2009

Pearson Education 2009

Pearson Education 2009

access tends to be dominant cost in query processing for centralized DBMS.

Pearson Education 2009

Pearson Education 2009

Example 23.1 - Different Strategies

Pearson Education 2009

Example 23.1 - Different Strategies

Pearson Education 2009

Example 23.1 - Different Strategies

Pearson Education 2009

Example 23.1 - Cost Comparison

Cost (in disk accesses) are:

Pearson Education 2009

Phases of Query Processing

has four main phases:

decomposition (consisting of parsing and validation); optimization; code generation; execution.

Pearson Education 2009

Phases of Query Processing

Dynamic versus Static Optimization

Pearson Education 2009

Dynamic versus Static Optimization

Pearson Education 2009

Pearson Education 2009

Pearson Education 2009

Pearson Education 2009

Pearson Education 2009

Example 23.1 - R.A.T.

Pearson Education 2009

Conjunctive normal form:

Disjunctive normal form:

Pearson Education 2009

Pearson Education 2009

Pearson Education 2009

Transformation Rules for RA Operations

referred to as cascade of Selection. branchNo='B003' salary>15000(Staff) = branchNo='B003'(salary>15000(Staff))

Pearson Education 2009

Transformation Rules for RA Operations

Pearson Education 2009

Transformation Rules for RA Operations

example: lNamebranchNo, lName(Staff) = lName (Staff)

Pearson Education 2009

Transformation Rules for RA Operations

Pearson Education 2009

Transformation Rules for RA Operations

Pearson Education 2009

Transformation Rules for RA Operations

where p {A1, A2, , An}

Transformation Rules for RA Operations

p q(R X S) = (p(R)) X (q(S))

Pearson Education 2009

Transformation Rules for RA Operations

position='Manager' city='London'(Staff Staff.branchNo=Branch.branchNo Branch) = (position='Manager'(Staff)) Staff.branchNo=Branch.branchNo (city='London' (Branch))

Pearson Education 2009

Transformation Rules for RA Operations

Pearson Education 2009

Transformation Rules for RA Operations

Pearson Education 2009

Transformation Rules for RA Operations

position,city,branchNo(Staff = (position, branchNo(Staff)) city, branchNo (Branch))