The optimizer is an inference engine that determines the best possible database navigation strategy for any given SQL request. Relational optimization is very powerful because it allows queries to adapt to a changing database environment: the optimizer can react to changes by formulating new access paths without requiring any application code changes.
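This adaptability can be observed with a small experiment. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the orders table, its columns, and the index name idx_orders_customer are hypothetical names chosen for illustration. The same SQL text is planned twice, before and after an index is created, and only the access path changes. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

def access_path(conn, sql, params=()):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail text.
    return [row[3] for row in rows]

# Hypothetical table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# The application's SQL never changes.
query = "SELECT * FROM orders WHERE customer_id = ?"

plan_before = access_path(conn, query, (42,))

# A physical change to the database: a new index becomes available.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = access_path(conn, query, (42,))

print("before index:", plan_before)
print("after index: ", plan_after)
```

Before the index exists, the plan reports a scan of the table; afterwards it reports a search that uses the new index, even though the query string is identical.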
Physical Data Independence
Physical data independence is the separation of access criteria from physical storage characteristics. To optimize SQL, the relational optimizer must analyze each SQL statement by parsing it to determine the tables and columns that must be accessed.
The optimizer accesses statistics stored by the RDBMS in either the system catalog or the database objects themselves. Every RDBMS has an embedded relational optimizer that renders SQL statements into executable access paths. Modern relational optimizers are cost based, meaning that the optimizer attempts to formulate an access path for each query that reduces overall cost.

CPU and I/O Costs
The optimizer can arrive at a rough estimate of the CPU time required to run the query using each optimized access path it analyzes.

Database Statistics
A relational optimizer is of little use without accurate statistics about the data stored in the database. The DBMS provides a utility program or command to gather statistics about database objects and to store them for use by the optimizer. The DBA should collect updated statistics whenever a significant volume of data has been added or modified. Failure to do so will result in the optimizer basing its cost estimates on inaccurate statistics, which may be detrimental to query performance.
The DBMS collects statistical information such as:
Number of unique values stored in the column
Most frequently occurring values for columns
Index key density
Details on the ratio of clustering for clustered tables
Correlation of columns to other columns
Structural state of the index or tablespace
Amount of storage used by the database object

Query Analysis
The optimizer scans the SQL statement to determine its overall complexity. The formulation of the SQL statement is a significant factor in determining the access paths chosen by the optimizer.
The complexity of the query, the number and type of predicates, the presence of functions, and the presence of ordering clauses all enter into the estimated cost calculated by the optimizer. During query analysis, the optimizer determines:
Which tables in which database are required
Whether any views are required to be broken down into underlying tables
Whether table joins or subselects are required
Which indexes, if any, can be used
How many predicates must be satisfied
Which functions must be executed
Whether the SQL uses OR or AND
How the DBMS processes each component of the SQL statement
How much memory has been assigned to the data caches used by the tables in the SQL statement
How much memory is available for sorting if the query requires a sort

Density
Density is the average percentage of duplicate values stored in the index key column; it is recorded as a percentage.
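A few of the column-level statistics mentioned in these notes (number of unique values, most frequently occurring values, and density) can be illustrated with a short sketch. It assumes one common convention, density = 100 / (number of distinct values), that is, the average share of rows held by each distinct value; an actual DBMS may compute and store density differently. The function name column_statistics and the sample column are hypothetical.

```python
from collections import Counter

def column_statistics(values, top_n=3):
    """Compute illustrative optimizer statistics for one column's values."""
    counts = Counter(values)
    distinct = len(counts)
    # Assumed convention: each distinct value holds, on average,
    # 100 / distinct percent of the rows.
    density_pct = 100.0 / distinct if distinct else 0.0
    return {
        "row_count": len(values),
        "distinct_values": distinct,
        "most_frequent": counts.most_common(top_n),
        "density_pct": density_pct,
    }

# Hypothetical sample data for a low-cardinality status column.
status_column = ["open", "closed", "open", "open", "closed", "pending", "open", "closed"]
stats = column_statistics(status_column)

for name, value in stats.items():
    print(name, "=", value)
```

Under this convention a unique key column has a very low density, while a column with only a handful of distinct values has a high one, which makes an index on it less selective.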
Joins
A join combines information from multiple tables.
When multiple tables are accessed, the optimizer figures out how to combine the tables in the most efficient manner.
When determining the access path for a join, the optimizer must determine the order in which the tables will be joined.
The optimizer chooses the table to process first, called the outer table.
A series of operations is performed on the outer table to prepare it for joining.
Rows from that table are then combined with rows from the second table, called the inner table.

Two common join methods are:
Nested-loop join
Merge-scan join

Nested-loop Join
A nested-loop join works by comparing qualifying rows of the outer table to the rows of the inner table. A qualifying row is identified in the outer table, and then the inner table is scanned for a match.

Merge-scan Join
The tables to be joined are ordered by the join keys. This ordering can be accomplished by a sort or by access via an index.

Join Order
The optimizer reviews each join in a query and analyzes the appropriate statistics to determine the optimal order in which the tables should be accessed to complete the join. To find the optimal join access path, the optimizer uses built-in algorithms containing knowledge about joins and data volume. It matches this intelligence against the join predicates, database statistics, and available indexes to estimate which order is more efficient.
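The nested-loop and merge-scan methods described in these notes can be sketched in a few lines. This is a simplified in-memory illustration, not how any particular DBMS implements its joins: tables are modeled as lists of dictionaries, only equality joins on a single key are handled, and the table and column names (customers, orders, cust_id) are hypothetical.

```python
def nested_loop_join(outer, inner, key):
    """For each outer row, scan the inner table for rows with a matching key."""
    result = []
    for outer_row in outer:
        for inner_row in inner:
            if outer_row[key] == inner_row[key]:
                result.append({**outer_row, **inner_row})
    return result

def merge_scan_join(left, right, key):
    """Order both inputs by the join key, then merge them in one pass."""
    # Here the ordering comes from a sort; an index could supply it instead.
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    result.append({**left[i], **right_row})
                i += 1
            j = j_end
    return result

# Hypothetical sample tables.
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": "Cy"},
]
orders = [
    {"cust_id": 2, "order_id": 10},
    {"cust_id": 1, "order_id": 11},
    {"cust_id": 2, "order_id": 12},
    {"cust_id": 4, "order_id": 13},
]

nested_rows = nested_loop_join(customers, orders, "cust_id")
merged_rows = merge_scan_join(customers, orders, "cust_id")
print(nested_rows)
print(merged_rows)
```

Both functions return the same matching rows. The nested-loop version compares every outer row with every inner row, while the merge-scan version pays for the ordering up front and then reads each input once, which is the kind of trade-off the optimizer weighs when it estimates costs.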