
Top 10 strategies for Oracle performance (part 4)

This is the final instalment of a four-part series covering my “top 10” performance strategies for
Oracle databases. In part one, we looked at methodology, database and application design, and
indexing. In part two, we covered the essential tuning tools, the SQL optimizer and strategies for
tuning SQL and PL/SQL. In the third instalment we looked at contention, memory management and
IO optimization.

In this final instalment we’ll consider the performance optimization of Oracle Real Application
Clusters (RAC). RAC is central to Oracle’s grid architecture, and is a significant and important
technological advantage for Oracle. I always advise Oracle DBAs to get familiar with RAC and to
learn how to get the most out of it.

RAC performance optimization is a big topic and I can only provide an introduction in this article.
You can find more detail on each of these topics in my book Oracle Performance Survival Guide.

RAC architecture
RAC is a shared-disk clustered database: every instance in the cluster has equal access to the
database’s data on disk. This is in contrast to the shared-nothing architecture employed by other
RDBMS clusters. In a shared-nothing architecture, each instance is responsible for a certain subset
of the data; whenever a session needs that data, the appropriate instance must be involved in
serving it up.

The main challenge in the shared disk architecture is to establish a global memory cache across all
the instances in the cluster: otherwise the clustered database becomes IO bound. Oracle
establishes this shared cache via a high-speed private network referred to as the cluster
interconnect.

All the instances in a RAC cluster share access to datafiles on shared disk, though each has its own
redo logs and undo segments. Each instance has its own SGA and background processes, and each
session that connects to the cluster database connects to a specific instance in the cluster.

Figure 1 High level RAC architecture
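
To get a quick picture of the instances that make up a cluster, you can query the GV$INSTANCE view from any node; like all GV$ views, it returns rows from every instance in the cluster. A minimal example:

SELECT inst_id, instance_name, host_name, status
  FROM gv$instance
 ORDER BY inst_id;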

RAC will perform well, and scale well, if the following are true:

• The time taken to request a block across the interconnect (a Global Cache request) is much
lower – say ten times lower – than the time to retrieve a block from disk. Global Cache
requests are intended to avoid the necessity of a disk read, though sometimes a disk read
must occur even after the Global Cache request.
• The cluster is well balanced, or at least there are no overloaded instances in the cluster.
Since so many RAC operations involve two or three instances, an overloaded instance might
cause problems for its neighbours as well as itself.
• The overhead incurred through cluster activities is a small proportion of total database
time. We want our RAC database to be a database first, and a cluster second.

Measuring cluster overhead


We can see the overall contribution of cluster-related waits, in comparison to other high-level time
categories, with the following query:

SELECT wait_class time_cat, ROUND(time_secs, 2) time_secs,
       ROUND(time_secs * 100 / SUM(time_secs) OVER (), 2) pct
  FROM (SELECT wait_class,
               SUM(time_waited_micro) / 1000000 time_secs
          FROM gv$system_event
         WHERE wait_class <> 'Idle'
           AND time_waited > 0
         GROUP BY wait_class
        UNION
        SELECT 'CPU',
               ROUND(SUM(value) / 1000000, 2) time_secs
          FROM gv$sys_time_model
         WHERE stat_name IN ('background cpu time', 'DB CPU'))
 ORDER BY time_secs DESC;

Time category         TIME_SECS    PCT
-------------------- ---------- ------
CPU                    21554.33  43.45
Cluster                 7838.82  15.80
Other                   6322.23  12.75
Application             5077.09  10.24
System I/O              3387.06   6.83
User I/O                3302.49   6.66
Commit                     1557   3.14
Concurrency               371.5    .75
Network                  142.06    .29
Configuration             49.59    .10

As a rule of thumb, we might expect that cluster-related waits comprise less than 10% of total
database time. Waits above 20% certainly warrant investigation.
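
As a minimal sketch of that rule of thumb, the following check collapses the query above into a single cluster-wait percentage. For brevity it leaves CPU time out of the total, so the percentage it reports will read somewhat higher than in the output above:

SELECT ROUND(cluster_secs * 100 / total_secs, 2) cluster_pct,
       -- apply the 10% rule of thumb
       CASE WHEN cluster_secs * 100 / total_secs > 10
            THEN 'investigate cluster overhead'
            ELSE 'cluster overhead acceptable'
       END assessment
  FROM (SELECT SUM(CASE WHEN wait_class = 'Cluster'
                        THEN time_waited_micro END) / 1000000 cluster_secs,
               SUM(time_waited_micro) / 1000000 total_secs
          FROM gv$system_event
         WHERE wait_class <> 'Idle');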

Reducing Global Cache latency


The RAC architecture requires and expects that instances will fetch data blocks across the
interconnect as an alternative to reading those blocks from disk. The performance of RAC is
therefore going to be very sensitive to the time it takes to retrieve a block from the Global Cache,
which we will call Global Cache latency.

Some documents and presentations suggest that Global Cache latency is primarily or exclusively
interconnect latency: the time it takes to send the block across the interconnect network.
Interconnect latency is certainly an important part of overall Global Cache latency, but it’s not the
only part. Oracle processes such as the Global Cache Service (LMS) have to perform a significant
amount of CPU-intensive processing each time a block is transferred, and this CPU time is usually at
least as significant as any other factor in overall Global Cache latency. In certain circumstances
non-CPU operations – such as flushing redo entries to disk – will also contribute to Global Cache latency.
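
If you want to see where that time goes on the serving instance, GV$SYSSTAT breaks down the work performed when serving consistent read blocks. The statistic names below are the ones I’d expect to find in 10g and 11g; treat this as a sketch and verify the names against your own GV$SYSSTAT:

SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc cr block build time',   -- building the CR copy
                'gc cr block flush time',   -- flushing redo before the send
                'gc cr block send time')    -- pushing the block onto the wire
 ORDER BY inst_id, name;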

To measure Global Cache latency, we use the wait interface as exposed by GV$SYSTEM_EVENT. The
following query reports on average times for each of the Global Cache request types as well as
single-block read time (for comparison):

SELECT event, SUM(total_waits) total_waits,
       ROUND(SUM(time_waited_micro) / 1000000, 2) time_waited_secs,
       ROUND(SUM(time_waited_micro) / 1000 / SUM(total_waits), 2) avg_ms
  FROM gv$system_event
 WHERE wait_class <> 'Idle'
   AND (event LIKE 'gc%block%way'
        OR event LIKE 'gc%multi%'
        OR event LIKE 'gc%grant%'
        OR event = 'db file sequential read')
 GROUP BY event
HAVING SUM(total_waits) > 0
 ORDER BY event;

                                      Total         Time  Avg Wait
Wait event                            Waits       (secs)      (ms)
------------------------------ ------------ ------------ ---------
db file sequential read             283,192        1,978      6.99
gc cr block 2-way                   356,193          396      1.11
gc cr block 3-way                   162,158          214      1.32
gc cr grant 2-way                   141,016           25       .18
gc cr multi block request           503,265          242       .48
gc current block 2-way              325,065          227       .70
gc current block 3-way              117,913           93       .79
gc current grant 2-way               45,580           20       .44
gc current grant busy               168,459          296      1.76
gc current multi block request       91,690           42       .46

This example output provides reason for concern. The average wait for Global Cache consistent read
requests (as shown by ‘gc cr block 2-way’ and ‘gc cr block 3-way’) is more than 1 millisecond and
more than 1/10th of the time for a db file sequential read. While the Global Cache is still faster than
disk, it’s taking longer than we’d expect if the interconnect and RAC were fully optimized.

Tuning the interconnect


When Global Cache waits are high, we should first determine if the latency is primarily the result of
interconnect network waits.

The best way to determine the interconnect contribution to overall performance is to use the ping
utility to measure latency independently of the Oracle stack. Ping packet handling is not identical to
RAC packet handling, but if ping latency is high then you can confidently assume that network
responsiveness is an issue.

In Oracle 10g the view X$KSXPIA shows the private and public IP addresses being used by the current
instance. In Oracle 11g this information is available in the view GV$CLUSTER_INTERCONNECTS. The
following query shows us the private interconnect IP address plus other identifying information for
the current instance (this query must be run as SYS):

SELECT instance_number, host_name, instance_name,
       name_ksxpia network_interface, ip_ksxpia private_ip
  FROM x$ksxpia
 CROSS JOIN v$instance
 WHERE pub_ksxpia = 'N';

Inst Host                                       Net   Private
   # Name                      INSTANCE_NAME    IFace IP
---- ------------------------- ---------------- ----- ------------
   3 melclul32.melquest.dev.me MELRAC3          eth1  192.168.0.12
     l.au.qsft

We can then ping the IP address from another node in the cluster to determine average latency. On
a Linux system, we can use the “-s 8192” flag to set an 8K packet size so as to align with the block
size of this Oracle database. On Windows the appropriate flag is “-l”:

$ ping -c 5 -s 8192 192.168.0.12
PING 192.168.0.12 (192.168.0.12) 8192(8220) bytes of data.
8200 bytes from 192.168.0.12: icmp_seq=0 ttl=64 time=0.251 ms
8200 bytes from 192.168.0.12: icmp_seq=1 ttl=64 time=0.263 ms
8200 bytes from 192.168.0.12: icmp_seq=2 ttl=64 time=0.260 ms
8200 bytes from 192.168.0.12: icmp_seq=3 ttl=64 time=0.265 ms
8200 bytes from 192.168.0.12: icmp_seq=4 ttl=64 time=0.260 ms

--- 192.168.0.12 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.251/0.259/0.265/0.020 ms, pipe 2

The ping output above indicates very low latency – about 0.25 ms – across the interconnect.

Aside from high latencies – as exposed by the ping command – interconnect issues can show up as
“lost” or congested blocks.
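
A simple check for lost blocks is to look at the ‘gc blocks lost’ and ‘gc blocks corrupt’ statistics, which should be at or close to zero on a healthy interconnect:

SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name IN ('gc blocks lost', 'gc blocks corrupt')
 ORDER BY inst_id, name;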

There are a few things we can do at the network level to optimize the interconnect:

• Use NIC bonding to aggregate the bandwidth of multiple network cards.
• Use a faster protocol – perhaps 10 Gigabit Ethernet or Infiniband.
• Enable Ethernet “Jumbo” frames.
• Increase the UDP packet size.

High Global Cache latencies can also occur if the remote instance is very busy: often balancing the
cluster (as outlined in the next section) is the solution.

Balancing the cluster


Achieving balance in a RAC configuration is important for scalability, manageability and
performance. In an unbalanced cluster the following undesirable situations can arise:

• Sessions on busy instances get poor service time.
• Sessions on idle instances wait for blocks from busy instances.
• The benefits of adding new instances may not be realized.
• Tuning is harder because each instance has different symptoms.

We can assess cluster balance fairly easily: the following query reports the DB time and CPU time
consumed by each instance within the cluster since startup:

WITH sys_time AS (
  SELECT inst_id,
         SUM(CASE stat_name WHEN 'DB time' THEN value END) db_time,
         SUM(CASE WHEN stat_name IN ('DB CPU', 'background cpu time')
                  THEN value END) cpu_time
    FROM gv$sys_time_model
   GROUP BY inst_id)
SELECT instance_name,
       ROUND(db_time / 1000000, 2) db_time_secs,
       ROUND(db_time * 100 / SUM(db_time) OVER (), 2) db_time_pct,
       ROUND(cpu_time / 1000000, 2) cpu_time_secs,
       ROUND(cpu_time * 100 / SUM(cpu_time) OVER (), 2) cpu_time_pct
  FROM sys_time
  JOIN gv$instance USING (inst_id);

Instance       DB Time  Pct of      CPU Time   Pct of
Name            (secs) DB Time        (secs) CPU Time
-------- ------------- ------- ------------- --------
MELRAC3       3,705.30   24.48      1,119.99    17.03
MELRAC2       6,278.23   41.48      4,010.85    61.00
MELRAC1       5,150.96   34.03      1,444.06    21.96

In this example it is clear that MELRAC2 is being subjected to a disproportionate level of CPU load: if
this is not addressed, increasing cluster workload will almost certainly lead to performance
degradation as MELRAC2 becomes the bottleneck for the entire cluster.

Quest Software’s Spotlight on RAC – now available in the Toad DBA Suite RAC edition – probably has
the most advanced RAC balance monitoring. Spotlight on RAC displays cluster balance from a
number of perspectives and performs a statistical analysis to determine whether an imbalance is
systematic or due to short-term random fluctuations.

Figure 2 Spotlight on Oracle RAC cluster balance

There are a number of techniques that you can try to balance your cluster database. In particular,
Services allow you to allocate workloads to specific instances within a cluster, and can be either the
cause or the cure (or both) of cluster imbalances.

Services serve two main purposes in RAC:

• By partitioning certain types of workload to certain instances, we can reduce the amount of
Global Cache traffic, since similar workloads are most likely to utilize similar data blocks.
• Services can help you share a RAC cluster across multiple applications, some of which may
have different service level objectives. By allocating more instances in the cluster to a
specific service, we effectively allocate that service a bigger share of cluster resources.

Oracle also provides client-side, server-side and application-level load balancing facilities that you
can tweak to get a better balance of workload across the cluster.
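
To see whether your service-to-instance allocation matches the actual workload, you can compare the DB time consumed by each service on each instance. A minimal sketch using GV$SERVICE_STATS (available from 10g onward):

SELECT inst_id, service_name,
       ROUND(value / 1000000, 2) db_time_secs   -- microseconds to seconds
  FROM gv$service_stats
 WHERE stat_name = 'DB time'
 ORDER BY service_name, inst_id;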

Minimizing Global Cache traffic


As we saw earlier, Global Cache requests are integral to RAC and represent both the “cost” of the
RAC architecture and the basis of its scalability. Avoiding a disk read by fetching a needed block
from another instance prevents RAC databases from becoming IO bound. However, each Global
Cache request adds overhead: it’s far better to find the data you want in the local buffer cache than
to retrieve it from another instance.

To determine how often the database needs to make Global Cache requests, we can compare the
number of blocks fetched across the interconnect with the total number of blocks accessed (i.e., the
number of logical reads). The following query performs that calculation, as well as determining the
ratio of physical to logical reads (yes, the notorious Buffer Cache Hit Ratio):

WITH sysstats AS (
  SELECT inst_id,
         SUM(CASE WHEN name LIKE 'gc%received'
                  THEN value END) gc_blocks_received,
         SUM(CASE WHEN name = 'session logical reads'
                  THEN value END) logical_reads,
         SUM(CASE WHEN name = 'physical reads'
                  THEN value END) physical_reads
    FROM gv$sysstat
   GROUP BY inst_id)
SELECT instance_name, logical_reads, gc_blocks_received, physical_reads,
       ROUND(physical_reads * 100 / logical_reads, 2) phys_to_logical_pct,
       ROUND(gc_blocks_received * 100 / logical_reads, 2) gc_to_logical_pct
  FROM sysstats
  JOIN gv$instance USING (inst_id);

Instance        Logical    GC Blocks     Physical Phys/Logical GC/Logical
Name              Reads     Received        Reads          Pct        Pct
---------- ------------ ------------ ------------ ------------ ----------
MELRAC3      15,353,311    1,730,818       23,099          .15      11.27
MELRAC2     148,903,331    1,756,882      438,531          .29       1.18
MELRAC1      21,792,614    1,730,366       39,471          .18       7.94

Note how in the above example it’s the least busy instances (in terms of logical reads) that have the
highest Global Cache to logical read ratio: the less busy an instance is, the more likely it is that the
blocks it needs are in the memory of another, busier, instance. If every instance in the cluster is
fighting over the same set of “hot blocks”, then we will see very high Global Cache traffic.
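
Before applying any of the techniques below, it’s worth identifying which segments are generating the most Global Cache traffic. A sketch using GV$SEGMENT_STATISTICS (available from 10g onward) to list the top ten:

SELECT *
  FROM (SELECT owner, object_name,
               SUM(value) gc_blocks_received
          FROM gv$segment_statistics
         WHERE statistic_name IN ('gc cr blocks received',
                                  'gc current blocks received')
         GROUP BY owner, object_name
         ORDER BY SUM(value) DESC)
 WHERE ROWNUM <= 10;   -- top 10 segments by Global Cache blocks received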

We can attempt to reduce the amount of inter-instance traffic through one of the following
techniques:

• Isolating workloads to a particular instance or groups of instances. We can do this through
services configuration as discussed earlier.
• Isolating sessions that are likely to work on the same data. This is similar to isolating
workloads, but instead of isolating specific transaction types, we isolate sessions that are
likely to work on the same sets of data.
• Partitioning the segments with the highest levels of Global Cache activity. Hash partitioning
can split up the hot blocks, hopefully reducing Global Cache contention for those blocks.
Range or list partitioning the segments in conjunction with isolation of user populations can
also be considered.
• Rebuilding hot indexes as reverse key indexes, which can help relieve hot Global Cache index
leaf and branch blocks (see the sketch after this list).
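
As an illustration of the last two techniques, the DDL below hash partitions a hypothetical hot table and rebuilds a hypothetical index as a reverse key index; the table and index names are invented for the example:

-- Hash partitioning spreads the table's rows (and hence its hot blocks)
-- across eight separate partition segments:
CREATE TABLE sales_hist (
   sale_id   NUMBER,
   sale_date DATE,
   amount    NUMBER)
PARTITION BY HASH (sale_id) PARTITIONS 8;

-- Rebuilding an index REVERSE scatters sequential key values across
-- leaf blocks, relieving a hot right-hand leaf block:
ALTER INDEX sales_hist_pk REBUILD REVERSE;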

Summary
The most significant difference between a RAC database and a single-instance database is the use of
Global Cache requests to fetch blocks from other instances in the cluster rather than reading them
from disk. RAC will scale and perform well, providing that:

• Global Cache latency is much less than disk read latency. Achieving this involves both
optimizing the interconnect network and making sure that no instance gets too busy to
respond to Global Cache requests in a timely manner.
• The rate of Global Cache requests is reasonable. In particular, “hot” blocks that are in
constant contention across the cluster should be eliminated.
• The cluster is reasonably well balanced. In particular, no instance should be overloaded: an
overloaded instance is likely to cause performance problems both for itself and for other
instances in the cluster.

I hope you’ve found this article – and the top 10 tuning strategies series as a whole – useful. I’ve
generally only been able to scratch the surface of the myriad tuning issues and opportunities
presented by Oracle: more detail can be found in my book Oracle Performance Survival Guide or in
the Oracle documentation.

Quest products – such as those found in the Toad DBA Suite – are designed to help you maximize
your efficiency and effectiveness when working through Oracle performance issues. If you like the
approach we’ve taken in these articles, then you’ll probably like what we’ve provided for you in the
Toad suites. In particular, Spotlight on Oracle RAC is now available within the Toad DBA Suite for
Oracle RAC edition, and implements all of the RAC tuning ideas outlined in this article.

Figure 3 Spotlight on Oracle RAC

Guy Harrison is a Director of Research and Development at Quest Software, is an Oracle ACE, and has
over 20 years’ experience in application and database administration, performance tuning and
software development. Guy is the author of Oracle Performance Survival Guide (Prentice Hall, 2009)
and MySQL Stored Procedure Programming (O’Reilly, with Steven Feuerstein) as well as other books,
articles and presentations on database technology. Guy is the architect of Quest’s Spotlight® family
of diagnostic products and has contributed to the development of other Quest products, such as
Toad®. Guy can be found on the Internet at www.guyharrison.net, on email at
guy.harrison@quest.com and is @guyharrison on Twitter.
