
Understanding Instance Recovery in RAC Understanding Cache Fusion in RAC

Crash Recovery - all instances have failed


Instance Recovery - one instance has failed

In both cases the redo threads from the failed instances need to be merged. In an instance recovery SMON performs the recovery,
whereas in a crash recovery a foreground process performs it.
The main features (advantages) of cache fusion recovery are

Recovery cost is proportional to the number of failures, not the total number of nodes
It eliminates disk reads of blocks that are present in a surviving instance's cache

It prunes recovery set based on the global resource lock state

The cluster is available after an initial log scan, even before recovery reads are complete

In cache fusion the starting point for recovery of a block is its most current PI (past image) version. This could be located on any
of the surviving instances, and multiple PI blocks of a particular buffer can exist.
Remastering is the operation whereby a node attempting recovery tries to own or master the resource(s) that were mastered
by another instance prior to the failure. When one instance leaves the cluster, the GRD (Global Resource Directory) portion of that
instance needs to be redistributed to the surviving nodes. RAC uses an algorithm called lazy remastering to remaster only a
minimal number of resources during a reconfiguration. The entire Parallel Cache Management (PCM) lock space remains invalid
while the DLM and SMON complete the steps below
1. The IDLM master node discards locks that are held by dead instances; the space reclaimed by this operation is used to
remaster locks that are held by the surviving instances and for which a dead instance was the master
2. SMON issues a message saying that it has acquired the necessary buffer locks to perform recovery

Let's look at an example of what happens during a remastering. Let's presume the following

Instance A masters resources 1, 3, 5 and 7


Instance B masters resources 2, 4, 6, and 8

Instance C masters resources 9, 10, 11 and 12

Instance B is removed from the cluster. Only the resources mastered by instance B are evenly remastered across the surviving nodes
(no resources on instances A and C are affected), which reduces the amount of work RAC has to perform. Likewise, when an instance
joins a cluster only a minimum amount of resources are remastered to the new instance.
Before Remastering

After Remastering
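
The example above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oracle's algorithm: the function name `lazy_remaster` is hypothetical, and the round-robin distribution is an assumption standing in for "evenly remastered". It shows that only the failed instance's resources move, while the resources already mastered by the survivors stay where they are.

```python
def lazy_remaster(masters, failed):
    """Redistribute only the failed instance's resources (toy model).

    masters: dict mapping instance name -> list of resource ids it masters.
    failed:  name of the instance that left the cluster.
    Round-robin assignment is an assumption made for this sketch.
    """
    survivors = sorted(name for name in masters if name != failed)
    if not survivors:
        raise ValueError("no surviving instance to remaster to")

    # Survivors keep everything they already master, untouched.
    new_masters = {name: list(masters[name]) for name in survivors}

    # Hand out the orphaned resources one at a time across the survivors.
    for n, resource in enumerate(sorted(masters[failed])):
        target = survivors[n % len(survivors)]
        new_masters[target].append(resource)
    return new_masters


before = {
    "A": [1, 3, 5, 7],
    "B": [2, 4, 6, 8],
    "C": [9, 10, 11, 12],
}
after = lazy_remaster(before, "B")
for name, resources in after.items():
    print(name, resources)
```

With the three instances from the example, instance A ends up with its original four resources plus two of B's, and instance C likewise.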

You can control the remastering process with a number of parameters


_gcs_fast_config - enables fast reconfiguration for gcs locks (true|false)

_lm_master_weight - controls which instance will hold or (re)master more resources than others

_gcs_resources - controls the number of resources an instance will master at a time

You can also force a dynamic remastering (DRM) of an object using oradebug

## Obtain the OBJECT_ID from the below view
SQL> select * from v$gcspfmaster_info;

## Determine who masters it
SQL> oradebug setmypid
SQL> oradebug lkdebug -a <OBJECT_ID>

## Now remaster the resource
SQL> oradebug setmypid
SQL> oradebug lkdebug -m pkey <OBJECT_ID>

The steps of a GRD reconfiguration are as follows

Instance death is detected by the cluster manager

Requests for PCM locks are frozen

Enqueues are reconfigured and made available

DLM recovery

GCS (PCM lock) is remastered

Pending writes and notifications are processed

I Pass recovery

The instance recovery (IR) lock is acquired by SMON

The recovery set is prepared and built, memory space is allocated in the SMON PGA

SMON acquires locks on buffers that need recovery

II Pass recovery

II pass recovery is initiated, database is partially available

Blocks are made available as they are recovered

The IR lock is released by SMON, recovery is then complete

The system is available
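
The two recovery passes above can be pictured with a toy model. The following is a minimal sketch, not Oracle's implementation: the `Redo` record, the `first_pass` and `second_pass` helpers, and the in-memory `disk` and `surviving_pis` dictionaries are all hypothetical names used for illustration. It shows the first pass building the recovery set from the failed thread's redo, and the second pass starting each block from a PI held by a surviving instance when one exists, so that only the remaining blocks cost a disk read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redo:
    """One change record from the failed instance's redo thread (toy model)."""
    block: int
    scn: int
    value: str


def first_pass(failed_redo):
    """Scan the redo once and return the set of blocks that need recovery."""
    return sorted({record.block for record in failed_redo})


def second_pass(recovery_set, failed_redo, disk, surviving_pis):
    """Recover each block, preferring a surviving PI over a disk read.

    disk and surviving_pis map block id -> (scn, value).
    Returns the recovered blocks and the number of disk reads performed.
    """
    recovered, disk_reads = {}, 0
    for block in recovery_set:
        if block in surviving_pis:
            scn, value = surviving_pis[block]   # start from the cached PI
        else:
            scn, value = disk[block]            # fall back to the on-disk copy
            disk_reads += 1
        changes = sorted((r for r in failed_redo if r.block == block),
                         key=lambda r: r.scn)
        for record in changes:
            if record.scn > scn:                # apply only newer changes
                scn, value = record.scn, record.value
        recovered[block] = (scn, value)
    return recovered, disk_reads


redo = [Redo(1, 105, "b1@105"), Redo(2, 110, "b2@110"), Redo(1, 120, "b1@120")]
disk = {1: (100, "b1@100"), 2: (100, "b2@100")}
pis = {1: (105, "b1@105")}                      # block 1 has a PI on a survivor

blocks = first_pass(redo)
print(blocks)
print(second_pass(blocks, redo, disk, pis))
```

In this run block 1 is rebuilt from the surviving PI and block 2 from disk, so the recovery needs a single disk read.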

Graphically it looks like below

Cache Fusion in Operation


A quick recap of GCS: a GCS resource can be local or global. If it is local it can be acted upon without consulting other instances;
if it is global it cannot be acted upon without consulting or informing remote instances. GCS is used as a messaging agent to
coordinate manipulation of a global resource. By default all resources are in NULL mode (remember, null mode is used to convert
from one type to another (share or exclusive)).
The table below denotes the different states of a resource
Mode/Role       Local   Global
Null (N)        NL      NG
Shared (S)      SL      SG
Exclusive (X)   XL      XG

States

SL - it can serve a copy of the block to other instances and it can read the block from disk; since the block is not modified there
is no need to write to disk

XL - it has sole ownership of and interest in that resource and has the exclusive right to modify the block. All changes to the block
are in the local buffer cache and it can write the block to disk. If another instance wants the block it has to come via the GCS

NL - used to protect consistent read blocks. If an instance wants the block in X mode, the current instance sends the block to the
requesting instance and downgrades its role to NL

SG - a block is present in one or more instances; an instance can read the block from disk and serve it to other instances

XG - a block can have one or more PIs. The instance with the XG role has the latest copy of the block and is the most likely
candidate to write the block to disk. GCS can ask the instance to write the block and serve it to other instances

NG - after discarding PIs when instructed to by GCS, the block is kept in the buffer cache with the NG role; this serves only as the
CR copy of the block.

Below are a number of common scenarios to help understand the following

reading from disk


reading from cache

getting the block from cache for update

performing an update on a block

performing an update on the same block

reading a block that was globally dirty

performing a rollback on a previously updated block

reading the block after commit

We will assume the following

Four-instance RAC environment (Instances A, B, C and D)

Instance D is the master of the lock resource for the data block BL

We will only use one block and it will reside at SCN 987654

We will use a three-letter code for the lock states

first letter will indicate the lock mode - N = Null, S = Shared and X = Exclusive

second letter will indicate the lock role - G = Global, L = Local

third letter will indicate the PIs - 0 = no PIs, 1 = a PI of the block

for example, a code of SL0 means a local shared lock with no past images (PIs)
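
The three-letter code described above can be decoded mechanically. The helper below is a small illustrative sketch; the name `decode_lock_state` and the tuple it returns are hypothetical and exist only to make the notation concrete.

```python
MODES = {"N": "Null", "S": "Shared", "X": "Exclusive"}
ROLES = {"L": "Local", "G": "Global"}


def decode_lock_state(code):
    """Split a code such as 'SL0' into (mode, role, has_past_image)."""
    if len(code) != 3:
        raise ValueError(f"expected a three-letter code, got {code!r}")
    mode, role, pi = code
    if mode not in MODES or role not in ROLES or pi not in "01":
        raise ValueError(f"unknown lock state {code!r}")
    return MODES[mode], ROLES[role], pi == "1"


for code in ("SL0", "XL0", "NG1"):
    print(code, decode_lock_state(code))
```

Decoding `SL0` yields a shared mode, a local role and no past image, which matches the first lock granted in the scenarios that follow.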

Reading a block from disk
Instance C wants to read the block; it will request a lock in share mode from the master instance
1. Instance C requests the block by sending a shared lock request to master D
2. The block has never been read into the buffer cache of any instance and it is not locked. Master D grants the lock to
instance C. The lock granted is SL0 (see above to work out the three-letter code)
3. Instance C reads the block from the shared disk into its buffer cache
4. Instance C has the block in shared mode; the lock manager updates the resource directory.

Reading a block from the cache
Carrying on from the above example, instance B wants to read the same block that is cached in instance C's buffer.
1. Instance B sends a shared lock request to master instance D
2. The lock master knows that the block may be available at instance C and sends a ping message to instance C
3. Instance C sends the block to instance B via the interconnect. Along with the block, instance C indicates that
instance B should take the current lock mode and role from instance C. Instance C keeps a copy of the block
4. Instance B sends a message to instance D that it has assumed the SL lock for the block. This message is not
critical for the lock manager, thus the message is sent asynchronously

Getting a (Cached) clean block for update
Carrying on from the above example, instance A wants to modify the same block that is already cached in instances B and C
(the block at SCN 987654)
1. Instance A sends an exclusive lock request to master D
2. The lock master knows that the block may be available at instance B in SCUR mode and at instance C in CR mode. It
also sends a ping message to the shared lock holders. The most recent access was at instance B, so instance D sends
a BAST message to instance B
3. Instance B sends the block to instance A via the interconnect and closes its shared lock. The block may still be in its
buffer to be used as CR, but all locks are released
4. Instance A now has the exclusive lock on the block and sends an assume message to instance D; the lock is in XL0
5. Instance A modifies the block in its buffer cache. The changes are not committed and the block has not been written
to disk, thus the SCN remains at 987654

Getting a (Cached) modified block for update and commit
Carrying on from the above example, instance C now wants to modify the block. If it tries to modify the same row it will have
to wait until instance A either commits or rolls back. However, in this case instance C wants to modify a different row in the
same block.
1. Instance C sends an exclusive lock request to master D
2. The lock master knows that instance A holds an exclusive lock on the block and hence sends a ping message to
instance A
3. Instance A sends the dirty buffer to instance C via the interconnect. It downgrades the lock from XCUR to NULL,
keeps a PI version of the block and disowns any lock on that buffer. Before shipping the block, instance A has to create a
PI image and flush any pending redo for the block change. The block mode on instance A is now NG1
4. Instance C sends a message to instance D indicating it has the block in exclusive mode. The block role G indicates that
the block is in global mode and if it needs to write the block to disk it must coordinate it with other instances that have
past images (PIs) of that block. Instance C modifies the block and issues a commit; the SCN is now 987660.
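
The state change in this scenario can be written down as a tiny transition. This is a hedged sketch rather than a description of GCS internals: the function name `ship_dirty_block` is hypothetical, and the requester's resulting code `XG0` is an assumption (exclusive mode, global role, no PI of its own); only the holder's `NG1` state is stated in the text above.

```python
def ship_dirty_block(states, holder, requester):
    """Model an exclusive holder shipping a dirty block to another instance.

    states maps instance name -> three-letter lock code.
    The holder keeps a past image and drops to null mode (NG1); the
    requester is assumed to end up in exclusive global mode (XG0).
    """
    if states.get(holder, "")[:1] != "X":
        raise ValueError(f"{holder} does not hold the block exclusively")
    new_states = dict(states)          # leave the caller's mapping untouched
    new_states[holder] = "NG1"         # past image kept, lock disowned
    new_states[requester] = "XG0"      # assumption: exclusive, global, no PI
    return new_states


print(ship_dirty_block({"A": "XL0"}, holder="A", requester="C"))
```

Starting from instance A holding `XL0`, the call leaves A at `NG1` and gives instance C the exclusive lock in the global role.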

Commit the previously modified block and select the data
Carrying on from the above example, instance A now issues a commit to release the row-level locks held by the transaction
and flush the redo information to the redo logs
1. Instance A wants to commit the changes; commit operations do not require any synchronous modifications to the block
2. The lock status remains the same as the previous state and change vectors for the commits are written to the redo logs.

Write the dirty buffers to disk due to a checkpoint
Carrying on from the above example, instance B writes the dirty blocks from the buffer cache due to a checkpoint (this is where
it gets interesting and very clever)
1. Instance B sends a write request to master D with the necessary SCN
2. The master knows that the most recent copy of the block may be available at instance C and hence sends a message
to instance C asking it to write
3. Instance C initiates a disk write and writes a BWR into the redo log file
4. Instance C gets the notification that the write is complete
5. Instance C notifies the master that the write is completed
6. On receipt of the notification, instance D tells all PI holders to discard their PIs, and the lock at instance C writes the
modified block to the disk
7. All instances that have previously modified this block will also have to write a BWR. The write request by instance C has
now been satisfied and instance C can now proceed with its checkpoint as usual

Master instance crashes
Carrying on from the above example
1. The master instance D crashes
2. The Global Resource Directory is frozen momentarily and the resources held by master instance D will be equally
distributed among the surviving nodes, also known as remastering (see remastering for more details).

Select the rows from Instance A
Carrying on from the above example, instance A now queries the rows from that table to get the most recent data
1. Instance A sends a shared lock request to the new master, instance C
2. Master C knows the most recent copy of the block may be in instance C and asks the holder to ship the CR block to
instance A
3. Instance C ships the CR block to instance A via the interconnect

The above sequence of events can be seen in the table below

Example  Node  Operation                   Buffer Status
1        C     read block from disk        SCUR
2        B     read the block from cache   CR, SCUR
3        A     update the block            XCUR, CR, CR
4        C     update the same block       PI, CR, XCUR
5        A     commit the changes          PI, CR, XCUR
6        B     trigger checkpoint          CR, XCUR
7        D     instance crash
8        A     select the rows             CR, XCUR
