
g:\prints\rac\adding a node.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Current scenario:
I have two nodes in the cluster presently.
Host names :
-node01.example.com
-node02.example.com
Node to be added :
-node03.example.com

step.1

Procedure:
Prepare the machine for the third node:
- Set kernel parameters
- Install required RPMs
- Create users/groups
- Configure oracleasm
[root@node03 ~]# oracleasm configure -i
[root@node03 ~]# oracleasm exit
[root@node03 ~]# oracleasm init
[root@node03 ~]# oracleasm scandisks
[root@node03 ~]# oracleasm listdisks
All ASM disks will be listed.

step.2

Configure ssh connectivity for the grid user among all 3 nodes:


On node03, as the grid user:
[grid@node03 ~]$ ssh-keygen -t rsa
[grid@node03 ~]$ ssh-keygen -t dsa

[grid@node03 ~]$ cd /home/grid/.ssh
[grid@node03 .ssh]$ cat *.pub > node03

[grid@node03 .ssh]$ scp node03 node01:/home/grid/.ssh/


[grid@node03 .ssh]$ ssh node01
(enter password)
[grid@node01 ~]$cd /home/grid/.ssh

cat node03 >> authorized_keys

scp authorized_keys node02:/home/grid/.ssh/


scp authorized_keys node03:/home/grid/.ssh/
Test ssh connectivity on all 3 nodes as the grid user.
Run the following on all 3 nodes, twice, as the grid user:
echo ssh node01 hostname >> a.sh
echo ssh node02 hostname >> a.sh
echo ssh node03 hostname >> a.sh
echo ssh node01-priv hostname >> a.sh
echo ssh node02-priv hostname >> a.sh
echo ssh node03-priv hostname >> a.sh

chmod +x a.sh
./a.sh
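The six echo lines above can also be generated with a loop; a sketch assuming the same host names:

```shell
# Loop-based variant of building a.sh (host names assumed to resolve from
# every node). Run the resulting script twice, so that all first-time
# host-key prompts are answered on the first pass.
nodes="node01 node02 node03"
> a.sh
for n in $nodes; do
  echo "ssh $n hostname"      >> a.sh
  echo "ssh $n-priv hostname" >> a.sh
done
chmod +x a.sh
```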

Run cluster verify to check that node03 can be added as a node:


[grid@node01 ~]$ cluvfy stage -pre crsinst -n node03 -verbose
- If there is a time synchronization problem, restart the ntpd service on each node
- The error "grid is not a member of dba group" can be ignored
[grid@node01 ~]$ . oraenv    (enter +ASM1)
[grid@node01 ~]$ cd /u01/app/11.2.0/grid/oui/bin
Add the node:

[grid@node01 bin]$ ./addNode.sh -silent "CLUSTER_NEW_NODES={node03}" \
"CLUSTER_NEW_VIRTUAL_HOSTNAMES={node03-vip}"
Execute orainstRoot.sh and root.sh on node03 as root:
[root@node03 ~]# /u01/app/oraInventory/orainstRoot.sh
[root@node03 ~]# /u01/app/11.2.0/grid/root.sh
Check from node01 that the node has been added:
[grid@node01 ~]$ crsctl stat res -t
Start any resources that are not already up:
[grid@node01 ~]$ crsctl start resource <resource name>
After you have added a node, the database home can be cloned onto that node;
that procedure is covered separately.

g:\prints\rac\MEMORY_TARGET not supported on this system in rac environment.txt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ORA-00845: MEMORY_TARGET not supported on this system in rac environment

Error:

[root@rac01 ~]# crsctl start cluster -all

CRS-2672: Attempting to start 'ora.crf' on 'rac01'

CRS-2672: Attempting to start 'ora.crf' on 'rac02'

CRS-2672: Attempting to start 'ora.asm' on 'rac02'

CRS-2672: Attempting to start 'ora.asm' on 'rac01'

CRS-2676: Start of 'ora.crf' on 'rac02' succeeded

CRS-2676: Start of 'ora.crf' on 'rac01' succeeded

CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00845: MEMORY_TARGET not supported on this system
. For details refer to "(:CLSN00107:)" in
"/u01/app/grid/diag/crs/rac02/crs/trace/ohasd_oraagent_grid.trc".
CRS-2674: Start of 'ora.asm' on 'rac02' failed
CRS-2679: Attempting to clean 'ora.asm' on 'rac02'

Reason:
/dev/shm is also known as tmpfs, a temporary file system that keeps its files
in virtual memory for fast access. Oracle uses it to back Automatic Memory
Management, so ORA-00845 is raised when MEMORY_TARGET is larger than the
available /dev/shm.

Solution:

To increase the size on the fly (not persistent across a reboot):

# mount -o remount,size=3G /dev/shm
Verify the size
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 7.6G 4.4G 2.9G 61% /
tmpfs 3G 1007M 2.1G 33% /dev/shm
/dev/sda1 194M 25M 160M 14% /boot

To make the change permanent, update /etc/fstab:


# vi /etc/fstab
tmpfs /dev/shm tmpfs defaults,size=3G 0 0

Apply the new fstab entries:


# mount -a
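Whether a remount is needed at all comes down to comparing MEMORY_TARGET with the tmpfs size; a small sketch (the helper name and the 3G figures are illustrative):

```shell
# Hypothetical helper: succeeds when a tmpfs of shm_kb kilobytes (as shown by
# `df -k /dev/shm`) can hold target_bytes of MEMORY_TARGET.
shm_fits() {
  target_bytes=$1
  shm_kb=$2
  [ $(( shm_kb * 1024 )) -ge "$target_bytes" ]
}

# A 3G MEMORY_TARGET against a 3G tmpfs fits exactly:
if shm_fits $((3 * 1024 * 1024 * 1024)) $((3 * 1024 * 1024)); then
  echo "MEMORY_TARGET fits in /dev/shm"
else
  echo "remount needed: mount -o remount,size=3G /dev/shm"
fi
```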

g:\prints\rac\Node Eviction in Oracle RAC.txt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Node Eviction in Oracle RAC
+++++++++++++++++++++++++++:

Node reboot is performed by CRS to maintain consistency in the cluster
environment by removing a node that is facing some critical issue.
A critical problem could be a node not responding via a network heartbeat, a
node not responding via a disk heartbeat, or a hung node or ocssd.bin process.
Whenever a Database Administrator faces a node reboot issue, the first things
to look at are /var/log/messages and the OS Watcher logs of the database node
that was rebooted.
/var/log/messages gives an actual picture of the reboot: the exact time of the
restart, the status of resources like swap and RAM, and so on.

==> High Load on Database Server :

One common scenario: under high load the RAM and swap space of the DB node get
exhausted, the system stops responding, and the node finally reboots.
So every time you see a node eviction, start the investigation with
/var/log/messages and analyze the OS Watcher logs.

How to avoid Node Reboot due to High Load ?

The simplest way to avoid this is to use Oracle Database Resource Manager
(DBRM). DBRM helps by giving the database more control over hardware
resources and their allocation.
The DBA should set up resource consumer groups and a resource plan and use
them as per requirements. On Exadata systems the DBA can additionally use
IORM to set up I/O resource allocation among multiple database instances.
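As an illustration of the DBRM setup, the calls below create a consumer group and a plan with DBMS_RESOURCE_MANAGER. The names REPORTING_GRP and DAYTIME_PLAN and the 20% cap are made-up examples, and the PL/SQL is only printed for review here rather than piped to sqlplus:

```shell
# Sketch only: example DBMS_RESOURCE_MANAGER calls (group/plan names and the
# mgmt_p1 percentage are illustrative). Printed, not executed.
dbrm_sql="BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('REPORTING_GRP', 'reporting sessions');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('DAYTIME_PLAN', 'limit reporting load');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    plan             => 'DAYTIME_PLAN',
    group_or_subplan => 'REPORTING_GRP',
    comment          => 'cap reporting CPU at 20%',
    mgmt_p1          => 20);
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/"
printf '%s\n' "$dbrm_sql"
```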

==> Voting Disk not Reachable :


Another reason for a node reboot is that the clusterware is not able to access
a minimum number of the voting files. When the node aborts for this reason,
the node alert log will show the CRS-1606 error.

Here are a few general approaches for the DBA to follow.


1. Use command "crsctl query css votedisk" on a node where clusterware is up to get
a list of all the voting files.
2. Check that each node can access the devices underlying each voting file.
3. Check for permissions to each voting file/disk have not been changed.
4. Check OS, SAN, and storage logs for any errors from the time of the incident.
5. Apply the fix for bug 13869978 if only one voting disk is in use. This is
fixed in the 11.2.0.3.4 patch set and above, and in 11.2.0.4 and above.
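For step 2, the voting-file paths can be scraped out of the crsctl output and then read with dd; a sketch, with the sample line imitating the 11.2 output format:

```shell
# Hypothetical parser: pull the file-name field (in parentheses in 11.2-style
# output) from `crsctl query css votedisk`.
votedisk_paths() {
  sed -n 's/.*(\(.*\)).*/\1/p'
}

sample=' 1. ONLINE   a3751063aec14f8e (/dev/oracleasm/disks/VOTE1) [DATA]'
echo "$sample" | votedisk_paths
# each extracted path can then be checked for access, e.g.:
#   dd if=/dev/oracleasm/disks/VOTE1 of=/dev/null bs=1M count=1
```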

==> Missed Network Connection between Nodes :


In technical terms this is called a Missed Network Heartbeat (NHB). Whenever
there is a communication gap, or no communication at all, between nodes on
the private network (interconnect) due to a network outage or some other
reason, a node aborts itself to avoid a "split brain" situation. The most
common (but not exclusive) cause of missed NHBs is a network problem
communicating over the private interconnect.

Suggestions for troubleshooting a Missed Network Heartbeat:


1. Check OS statistics from the evicted node from the time of the eviction.
The DBA can use OS Watcher to look at OS stats at the time of the issue;
check oswnetstat and oswprvtnet for network-related issues.
2. Validate the interconnect network setup with the help of the network
administrator.
3. Check communication over the private network.
4. Check that the OS network settings are correct by running the RACcheck
tool.

==> Database Or ASM Instance Hang :


Sometimes a database or ASM instance hang can cause a node reboot. In these
cases the instance hangs and is terminated afterwards, which causes either a
cluster reboot or a node eviction. The DBA should check the alert logs of the
database and ASM instances for any hang situation that might cause this
issue.
In a few cases, bugs could be the reason for the node reboot; the bug may be
at the database level, the ASM level, or the Real Application Clusters level.
Here, after initial investigation, the DBA should open an SR with Oracle
Support.

To ensure cluster and data integrity unhealthy nodes should be forcefully evicted
from a cluster.
A node eviction will be initiated if:
- Cluster member cannot communicate via network heartbeat or Network disruption or
latency
==> The misscount setting is the allowed network latency (the time taken for
a data packet to travel from node to node over the interconnect), in seconds
==> Check the setting using: crsctl get css misscount
==> The default timeout is 30 seconds on Linux/Unix
==> To change the setting, shut down CRS on all nodes, then run: crsctl set
css misscount <seconds>

- Slow interconnect or failures


- Corrupted network packets on the network may also cause CSS reboots on certain
platforms
- Delayed or missed disk heartbeats by a cluster member using the majority of
voting files
==> Check the setting in your cluster using: crsctl get css disktimeout
==> The disktimeout setting is the allowed disk latency, in seconds, from a
node to the voting disk.
==> The default value is 200 seconds (disk I/O)
- server hung, CPU starvation
- Known Oracle Clusterware bugs
- problems with core Oracle Clusterware processes (e.g.: OCSSD, css, cssdagent,
cssdmonitor).
- No space left on the device for the GI or /var filesystem
- Sudden death or hang of CSSD processes
- ORAAGENT/ORAROOTAGENT excessive resource (CPU, Memory, Swap) consumption
resulting in node eviction on specific OS platforms
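Both CSS settings mentioned above are read back with crsctl get css misscount / disktimeout, and the numeric value can be scraped from the message text. A sketch — the CRS-4678 message wording is assumed from 11.2 and may differ by release:

```shell
# Hypothetical scrape of the value from `crsctl get css <setting>` output;
# the message format is an assumption based on 11.2.
get_css_value() {
  sed -n 's/.*get [a-z]* \([0-9][0-9]*\) for.*/\1/p'
}

sample='CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.'
echo "$sample" | get_css_value
```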
g:\prints\rac\non asm to asm.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Steps To Migrate a Database From Non-ASM to ASM
================================================:

Prerequisite - the ASM instance should be created and up and running. Please
refer to my previous article on creating an ASM instance.

step.1

check the status of database, datafiles and tempfiles

SQL> select name from v$database;


SQL> select file_name from dba_data_files;
SQL> select file_name from dba_temp_files;

step.2

check the spfile and controlfile location and change accordingly

SQL> show parameter spfile;


SQL> show parameter control_file;

SQL> alter system set control_files='+DATA' scope=spfile;


System altered.
SQL> alter system set db_create_file_dest='+DATA' scope=spfile;
System altered.

step.3

shut down the database and start it in nomount stage


SQL> shutdown immediate
SQL> startup nomount;

step.4

connect rman and restore controlfile and mount the database


[oracle@dev ~]$ rman target /
RMAN> restore controlfile from '/u01/app/oracle/oradata/DEV/control01.ctl';
RMAN> alter database mount;

step.5 run the RMAN copy command, switch the database to the copy, and open the database

RMAN> backup as copy database format '+DATA' ;


RMAN> switch database to copy;
RMAN> alter database open;
database opened

step.6 check the datafile status

SQL> select file_name from dba_data_files;

step.7

drop the old tempfile and create new

step.8

add the new redolog groups and drop the old one
Now your database is migrated to ASM

g:\prints\rac\ntp in rac.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Network Time Protocol (NTP)

Network Time Protocol (NTP) is a protocol for synchronizing the clocks of
computers in a network. If you set up a RAC environment, one of the
requirements is to synchronize the clock time of all your RAC nodes to avoid
unnecessary node evictions. A time difference of more than 15 minutes among
nodes may cause node evictions. Trace file analysis and GV$ view analysis may
also be inaccurate if time is not synchronized among the nodes.

Configuring NTP:

NTP configuration involves 3 files in the /etc folder:

ntp.conf
ntp.drift
ntp.trace
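On Linux, Oracle Grid Infrastructure additionally expects ntpd to slew the clock rather than step it; the -x flag does that. A fragment of /etc/sysconfig/ntpd (the path and the other options may vary by distribution):

```shell
# /etc/sysconfig/ntpd -- the -x flag makes ntpd slew rather than step the
# clock, which is what the Grid Infrastructure prerequisite checks look for.
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
```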

g:\prints\rac\physical and logical standbys.txt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The standby database is a transactionally consistent copy of the production
database. It is created initially from a backup of the production database.
Once created, Data Guard automatically synchronizes the standby.

There are two types of standby databases:

1. Physical Standby
2. Logical Standby

A physical standby uses media recovery to synchronize the standby database,
whereas a logical standby uses the SQL Apply method to synchronize the two
databases.

Physical Standby

- Identical to the primary database, including the physical organization on
disk
- DG uses Redo Apply technology, which applies redo data using standard
recovery techniques.
- Can be used for backups
- All data types are supported
- Can be opened "read only", but cannot apply logs while open. From 11g it
can be opened "read write" (snapshot standby)
- No additional objects can be created

Logical Standby

- Same logical information, but the physical organization and structure of
the data are different.
- DG uses SQL Apply, which first transforms the redo data into SQL statements
and then executes the statements.
- Can be opened for reporting.
- Not all data types are supported; e.g. LONG, NCLOB, LONG RAW, BFILE, and
XML types are not supported.
- Can be open in normal mode while simultaneously applying the logs.
- Additional indexes and materialized views can be created

Usually organizations use logical standby databases mainly for reporting
purposes, not for failover/switchover operations; for failover and switchover
they use a physical standby database. The reason is that maintaining a
logical standby is almost a full-time job, it needs extensive tuning of the
log apply services, and with the constant stream of patches the logical
standby is usually 3 to 5 hours behind the live database, making it
impractical for failover/switchover.

g:\prints\rac\rac 1 node advntges.txt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
RAC 1 NODE ADVANTAGES
======================:

- Online migration of the sessions from active node to the other


- Easy conversion from RAC One Node to complete RAC and vice-versa
- Integrated with the features like Instance Caging to provide a better resource
management
- Supported on Exadata
- Supported on OVM (Oracle VM)
- Support for RAC rolling patches, providing the same interface on RAC One
Node
- Easy creation of a One Node database using DBCA (from 11.2.0.2)
- Supported on all those platforms where Oracle RAC is supported

g:\prints\rac\RAC administration.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Use SRVCTL to manage Oracle-supplied resources such as


Listener,
Instances,
Disk groups,
Networks.

Use CRSCTL for managing Oracle Clusterware and its resources.

Oracle strongly discourages directly manipulating Oracle-supplied resources
(resources whose names begin with "ora.") using CRSCTL, as this could
adversely impact the cluster configuration.

If a resource name begins with "ora.", use SRVCTL instead.

How to check the status of CRS on specific node?


$crsctl check crs

Note: The crsctl command is located in the Grid Infrastructure home.

How to check the status of services on all nodes:


$ srvctl status nodeapps

How to check the status of complete clusterware stack on all nodes:


$ crsctl status resource -t

How to check the status of clusterware servers:


$ crsctl status server -f

How to stop crs on all nodes of clusterware:


# ./crsctl stop cluster -all

How to stop crs on specific node:


# ./crsctl stop crs

How to start crs on specific node:


# ./crsctl start crs

How to check the status of specific Instance across cluster nodes:


$ srvctl status instance -d racdb -i racdb1

How to check the status of all instances across the cluster nodes:
$ srvctl status instance -d racdb -i racdb1,racdb2,racdb3,racdb4

How to start specific Instance across cluster nodes:


$ srvctl start instance -d racdb -i racdb4

How to check the status of database across cluster nodes:


$ srvctl status database -d racdb

How to stop cluster database:


$ srvctl stop database -d racdb

-- Start/ Stop database / instance services status:


From oracle user
1) Verify instance status

$ srvctl status instance -d EHISKOL -i EHISKOL2


Instance EHISKOL2 is running on node ekrac2

2) To Stop instance
$ srvctl stop instance -d EHISKOL -i EHISKOL2

3) To start instance
$ srvctl start instance -d EHISKOL -i EHISKOL2

-- If a message like the below is found, then the instance services are already running.

PRCC-1015 : EHISKOL was already running on ekrac2


PRCR-1004 : Resource ora.ehiskol.db is already running

4) To stop all instance services across the nodes:


$ srvctl stop database -d EHISKOL

-- Start/ Stop SCAN listeners in RAC databases:

1) Verify SCAN listener status:

$ srvctl status scan_listener


SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node ekrac2
SCAN Listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is running on node ekrac1
SCAN Listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is running on node ekrac1

$ srvctl status listener


Listener LISTENER is enabled
Listener LISTENER is running on node(s): ekrac2,ekrac1

Note: Assume 3 scan IP configured with DNS.

When single / default IP configured with SCAN listener, then status will be shown
like below:

$ srvctl status scan_listener


SCAN Listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is running on node host1

2) To stop the scan listener:


$ srvctl stop scan_listener

3) To start the scan listener


$ srvctl start scan_listener

1) Verify Cluster services / CRS status


(for a single node)

# ./crsctl check crs


CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

(for all nodes)

# crsctl check cluster -all


**************************************************************
ekrac1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

2) To stop Cluster services / CRS (single node)

./crsctl stop crs

START SEQUENCE

COMMAND DESCRIPTION
-----------------------------------------------------------------------------------
--------
/etc/init.d/init.crs start START CRS PROCESS (AS ROOT USER)
srvctl start asm -n NODE1 START ASM INSTANCE on node 1
srvctl start asm -n NODE2 START ASM INSTANCE on node 2
srvctl start database -d ORCL START DATABASE
srvctl start instance -d ORCL -i ORCL1 START first INSTANCE (skip it if
running 'start database' as that will start both instances)
srvctl start instance -d ORCL -i ORCL2 START second INSTANCE (skip it if
running 'start database', as that will start both instances)
srvctl start nodeapps -n NODE1 START NODEAPPS on NODE1
srvctl start nodeapps -n NODE2 START NODEAPPS ON NODE2

STOP SEQUENCE

COMMAND DESCRIPTION
-----------------------------------------------------------------------------------
-------
srvctl stop database -d ORCL STOP DATABASE
srvctl stop instance -d ORCL -i ORCL1 STOP first INSTANCE (skip it if running
'stop database' as that will stop both instances)
srvctl stop instance -d ORCL -i ORCL2 STOP second INSTANCE (skip it if running
'stop database' as that will stop both instances)
srvctl stop asm -n NODE1 STOP ASM INSTANCES on NODE 1 (In 11G , we
have OCR on ASM so we cannot stop ASM, but if you have OCR in NON-ASM you should
stop it)
srvctl stop asm -n NODE2 STOP ASM INSTANCES on NODE 2 (In 11G , we
have OCR on ASM so we cannot stop ASM, but if you have OCR in NON-ASM you should
stop it)
srvctl stop nodeapps -n NODE1 STOP NODEAPPS on NODE 1
srvctl stop nodeapps -n NODE2 STOP NODEAPPS on NODE 2
/etc/init.d/init.crs stop STOP CRS PROCESSES (AS ROOT USER)

OTHER USEFUL COMMANDS

COMMAND DESCRIPTION
---------------------------------------------------------------------
crsctl status resource -t Clusterware Resource Status Check
srvctl status database -d ORCL STATUS OF DATABASE
srvctl stop listener -l LISTENER_NAME STOP A LISTENER
srvctl start listener -l LISTENER_NAME START A LISTENER
crsctl stop has stop all the clusterware services/
resources on specific node (including DB and listener) (run as root)
crsctl start has start all the clusterware services/
resources on specific node (including DB and listener) (run as root)
crsctl stop cluster -all to stop crs services on all nodes of
clusterware (run as root)
crsctl start cluster -all to start crs services on all nodes of
clusterware (run as root)
crsctl check has to check if ohasd is running/ stopped
(run as root)
crsctl enable has enable Oracle High Availability
Services autostart (run as root)
crsctl disable has disable Oracle High Availability
Services autostart (run as root)
crsctl config has check if Oracle High Availability
Services autostart is enabled/ disabled (run as root)
srvctl status nodeapps to check the status of services on all
nodes
crsctl stop crs stop all the clusterware services/
resources ON THAT NODE! (run as root)
crsctl start crs start all the clusterware services/
resources ON THAT NODE! (run as root)
cluvfy comp scan -verbose Verifying scan status scan_listener

srvctl config scan_listener Verifying scan port


srvctl relocate scan -i 1 -n NODE1 Relocate scan listener 1 to the
mentioned node
g:\prints\rac\rac logfiles.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Clusterware Log Files

All clusterware log files are stored under $ORA_CRS_HOME/log/ directory.

1. alert<nodename>.log : Important clusterware alerts are stored in this log file.


It is stored as $ORA_CRS_HOME/log/<hostname>/alert<hostname>.log.

2. crsd.log : CRS logs are stored in $ORA_CRS_HOME/log/<hostname>/crsd/ directory.


The crsd.log file is archived every 10MB as crsd.101, crsd.102 ...

3. cssd.log : CSS logs are stored in $ORA_CRS_HOME/log/<hostname>/cssd/ directory.


The cssd.log file is archived every 20MB as cssd.101, cssd.102....

4. evmd.log : EVM logs are stored in $ORA_CRS_HOME/log/<hostname>/evmd/ directory.

5. OCR logs : OCR logs (ocrdump, ocrconfig, ocrcheck) log files are stored in
$ORA_CRS_HOME/log/<hostname>/client/ directory.

6. SRVCTL logs: srvctl logs are stored in two locations,


$ORA_CRS_HOME/log/<hostname>/client/ and in $ORACLE_HOME/log/<hostname>/client/
directories.

7. RACG logs : The high availability trace files are stored in two locations
$ORA_CRS_HOME/log/<hostname>/racg/ and in $ORACLE_HOME/log/<hostname>/racg/
directories.

RACG contains log files for node applications such as VIP, ONS etc.
Each RACG executable has a sub directory assigned exclusively for that executable.

racgeut : $ORA_CRS_HOME/log/<hostname>/racg/racgeut/
racgevtf : $ORA_CRS_HOME/log/<hostname>/racg/racgevtf/
racgmain : $ORA_CRS_HOME/log/<hostname>/racg/racgmain/

racgeut : $ORACLE_HOME/log/<hostname>/racg/racgeut/
racgmain: $ORACLE_HOME/log/<hostname>/racg/racgmain/
racgmdb : $ORACLE_HOME/log/<hostname>/racg/racgmdb/
racgimon: $ORACLE_HOME/log/<hostname>/racg/racgimon

g:\prints\rac\rac to non rac cloning.txt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
step.1

Take the full backup of the source database using Rman.

step.2

Once backup is done, transfer the backupsets to target server where we want to
restore/clone

step.3
On the target standalone server, ensure that the server (UAT) meets the
Oracle software requirements.

a. OS must be certified
b. kernel parameters must be set
c. mandatory OS packages should be installed
d. OS limits should be set
e. create the group/user (dba/oracle)
f. set the environment variables (ORACLE_HOME, ORACLE_BASE, ORACLE_SID, PATH,
LD_LIBRARY_PATH, etc.)
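For item d (OS limits), here is an /etc/security/limits.conf fragment with the commonly documented 11gR2 minimums — verify the exact values against the install guide for your release. Kept in a variable and printed rather than written to the real file:

```shell
# Commonly documented 11gR2 shell-limit minimums for the oracle user
# (illustrative; confirm against your install guide). Printed, not applied.
limits='oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 65536'
printf '%s\n' "$limits"
```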

step.4

Create password file on the standalone server as below.

step.5

Create a pfile on the source, copy it to the target, and remove the
RAC-related and other unneeded parameters

ex:

*.audit_file_dest='/oracle/app/oracle/admin/qaclone/adump'
*.cluster_database=FALSE
*.control_files='+DATA/qaclone/control01.ctl','+DATA/qaclone/control02.ctl'
*.db_file_name_convert='+PRODDATA/prod','+DATA/qaclone'
*.db_name='qaclone'
*.diagnostic_dest='/oracle/app/oracle/admin/qaclone'
*.log_file_name_convert='+PRODDATA/prod','+DATA/qaclone'

step.6

Create required directories on Target.


e.g.
$ mkdir -p /u01/app/oracle/oradata/DB11G
$ mkdir -p /u01/app/oracle/fast_recovery_area/DB11G
$ mkdir -p /u01/app/oracle/admin/DB11G/adump

step.7

Copy all the RMAN backups to Target server

step.8

Connect to Target Instance in nomount using modified pfile.


$ ORACLE_SID=DB11G; export ORACLE_SID
$ sqlplus / as sysdba
STARTUP NOMOUNT;

step.9

Check that you can access the source database by copying the TNS entries from
the source node to the target node's tnsnames.ora file.

step.10
On the target node, connect to the RMAN using the following command, where PROD is
your source database, and CLONE is the target database:

$ rman target sys/sys@PROD auxiliary /

RMAN>

duplicate target database to CLONE;

step.11

Disable and drop closed online redolog group

SQL> select THREAD#, STATUS, ENABLED from v$thread;

THREAD# STATUS ENABLED
---------- ------ --------
1 OPEN PUBLIC
2 CLOSED PUBLIC

SQL> select group# from V$log where THREAD#=2;

GROUP#
----------
3
4

SQL> alter database disable thread 2;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter database drop logfile group 4;

Database altered.

SQL> select THREAD#, STATUS, ENABLED from v$thread;

THREAD# STATUS ENABLED
---------- ------ --------
1 OPEN PUBLIC

step.12

Delete the UNDO tablespace related to Node 2.

- Now you can remove the undo tablespaces of other instances.

SQL> show parameter undo;


NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 900
undo_tablespace string UNDOTBS1
SQL> select tablespace_name from dba_tablespaces where contents='UNDO';
TABLESPACE_NAME
------------------------------
UNDOTBS1
UNDOTBS2
SQL> drop tablespace UNDOTBS2 including contents and datafiles;
Tablespace dropped.

optional:

change the following database initialization parameters in the init parameter
file:

_no_recovery_through_resetlogs=TRUE
undo_management=MANUAL
undo_tablespace=UNDOTBS1

ORA-38856: cannot mark instance UNNAMED_INSTANCE_2 (redo thread 2) as enabled

This seems to be bug 4355382 and is expected while doing a RAC
restore/recovery.

##########################
# Solution
##########################

Add the _no_recovery_through_resetlogs parameter and set it to TRUE.

I added the _no_recovery_through_resetlogs=TRUE parameter to our PFILE and
brought the database up to the mount stage.
I then opened the database with the RESETLOGS option and it worked. This
parameter tells Oracle not to do any recovery during the resetlogs operation.
After opening the database, the parameter can be removed from the pfile.

http://sandeepmagdum.blogspot.in/p/rac-to-nonrac-clonning.html
http://oracledba102.blogspot.in/2012/04/how-to-clone-oracle-home-rac-to-non-rac.html

rconfig:
http://sateeshv-dbainfo.blogspot.in/2015/09/rconfig-frequently-asked-question.html

https://zakkiahmed.wordpress.com/2010/08/04/convert-11gr2-non-rac-database-to-rac-database-using-rconfig/

g:\prints\rac\RAC useful commands.txt


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
RAC 11g useful commands

STOP SEQUENCE

COMMAND DESCRIPTION
-----------------------------------------------------------------------------------
-------
srvctl stop database -d ORCL STOP DATABASE
srvctl stop instance -d ORCL -i ORCL1 STOP first INSTANCE (skip it if running
'stop database' as that will stop both instances)
srvctl stop instance -d ORCL -i ORCL2 STOP second INSTANCE (skip it if running
'stop database' as that will stop both instances)
srvctl stop asm -n NODE1 STOP ASM INSTANCES on NODE 1 (In 11G , we
have OCR on ASM so we cannot stop ASM, but if you have OCR in NON-ASM you should
stop it)
srvctl stop asm -n NODE2 STOP ASM INSTANCES on NODE 2 (In 11G , we
have OCR on ASM so we cannot stop ASM, but if you have OCR in NON-ASM you should
stop it)
srvctl stop nodeapps -n NODE1 STOP NODEAPPS on NODE 1
srvctl stop nodeapps -n NODE2 STOP NODEAPPS on NODE 2
/etc/init.d/init.crs stop STOP CRS PROCESSES (AS ROOT USER)

START SEQUENCE

COMMAND DESCRIPTION
-----------------------------------------------------------------------------------
--------
/etc/init.d/init.crs start START CRS PROCESS (AS ROOT USER)
srvctl start asm -n NODE1 START ASM INSTANCE on node 1
srvctl start asm -n NODE2 START ASM INSTANCE on node 2
srvctl start database -d ORCL START DATABASE
srvctl start instance -d ORCL -i ORCL1 START first INSTANCE (skip it if
running 'start database' as that will start both instances)
srvctl start instance -d ORCL -i ORCL2 START second INSTANCE (skip it if
running 'start database', as that will start both instances)
srvctl start nodeapps -n NODE1 START NODEAPPS on NODE1
srvctl start nodeapps -n NODE2 START NODEAPPS ON NODE2

OTHER USEFUL COMMANDS


COMMAND DESCRIPTION
---------------------------------------------------------------------
crsctl status resource -t Clusterware Resource Status Check
srvctl status database -d ORCL STATUS OF DATABASE
srvctl stop listener -l LISTENER_NAME STOP A LISTENER
srvctl start listener -l LISTENER_NAME START A LISTENER
crsctl stop has stop all the clusterware services/
resources on specific node (including DB and listener) (run as root)
crsctl start has start all the clusterware services/
resources on specific node (including DB and listener) (run as root)
crsctl stop cluster -all to stop crs services on all nodes of
clusterware (run as root)
crsctl start cluster -all to start crs services on all nodes of
clusterware (run as root)
crsctl check has to check if ohasd is running/ stopped
(run as root)
crsctl enable has enable Oracle High Availability
Services autostart (run as root)
crsctl disable has disable Oracle High Availability
Services autostart (run as root)
crsctl config has check if Oracle High Availability
Services autostart is enabled/ disabled (run as root)
srvctl status nodeapps to check the status of services on all
nodes
crsctl stop crs stop all the clusterware services/
resources ON THAT NODE! (run as root)
crsctl start crs start all the clusterware services/
resources ON THAT NODE! (run as root)
cluvfy comp scan -verbose Verifying scan status scan_listener

srvctl config scan_listener Verifying scan port


srvctl relocate scan -i 1 -n NODE1 Relocate scan listener 1 to the
mentioned node

g:\prints\rac\rac_installation.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rac installation:
++++++++++++++++

create os groups as root user:

groupadd -g 501 oinstall


groupadd -g 502 dba
groupadd -g 503 oper
groupadd -g 504 backupdba
groupadd -g 505 dgdba
groupadd -g 506 kmdba
groupadd -g 507 asmdba
groupadd -g 508 asmoper
groupadd -g 509 asmadmin

g:\prints\rac\RAC-machin-IPs.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CLUSTER= 1 ======>> RAC 1 => IP = 147.43.0.1
PRIV= 192.168.2.1
VIP = 147.43.0.101

RAC 2 => IP = 147.43.0.2


PRIV= 192.168.2.2
VIP = 147.43.0.102
SCAN IP=> 147.43.0.151

CLUSTER= 2 ======>> RAC 3 => IP = 147.43.0.3


PRIV= 192.168.2.3
VIP = 147.43.0.103

RAC 4 => IP = 147.43.0.4


PRIV= 192.168.2.4
VIP = 147.43.0.104
SCAN IP=> 147.43.0.152

CLUSTER= 3 ======>> RAC 5 => IP = 147.43.0.5


PRIV= 192.168.2.5
VIP = 147.43.0.105

RAC 6 => IP = 147.43.0.6


PRIV= 192.168.2.6
VIP = 147.43.0.106
SCAN IP=> 147.43.0.153

CLUSTER= 4 ======>> RAC 7 => IP = 147.43.0.7


PRIV= 192.168.2.7
VIP = 147.43.0.107

RAC 8 => IP = 147.43.0.8


PRIV= 192.168.2.8
VIP = 147.43.0.108
SCAN IP=> 147.43.0.154

CLUSTER= 5 ======>> RAC 9 => IP = 147.43.0.9


PRIV= 192.168.2.9
VIP = 147.43.0.109

RAC 10=> IP = 147.43.0.10


PRIV= 192.168.2.10
VIP = 147.43.0.110
SCAN IP=> 147.43.0.155

g:\prints\rac\rconfig.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
RCONFIG : Frequently Asked Questions (Doc ID 387046.1)
PURPOSE
This FAQ explains the "rconfig" tool introduced in Oracle Database Version 10.2.
This is also used extensively when Oracle E-Business Suite 11i Customers convert
their Single Instance Database to RAC.

QUESTIONS AND ANSWERS

What is rconfig ?
rconfig is a command-line tool introduced in Oracle Database 10g R2 to
convert a single-instance 10g R2 database to RAC (Real Application Clusters).
The other option is to use the Convert to RAC option on the single-instance
database target of Oracle Enterprise Manager Grid Control.

How does rconfig work?


rconfig is located in $ORACLE_HOME/bin/. rconfig takes an XML input file and
converts the single-instance database whose information is provided in the
XML. The tool is documented in the RAC Admin Guide, and a sample XML can be
found at $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC.xml.
rconfig performs the following steps:
Migrate the database to ASM storage (Only if ASM is specified as storage option in
the configuration xml file above)
Create Database Instances on all nodes in the cluster
Configure Listener and NetService entries
Configure and register CRS resources
Start the instances on all nodes in the cluster.

What are the prerequisites before we use rconfig?

Before you convert a single-instance database to a RAC database using rconfig,
ensure that the following conditions are met on each cluster node that you intend
to make a RAC database node:
- Oracle Clusterware 10g Release 2 (10.2) is installed, configured, and running.
- Oracle Real Application Clusters 10g Release 2 (10.2) software is installed.
- The Oracle binary is enabled for RAC.
- Shared storage, either Oracle Cluster File System or ASM, is available and
accessible from all nodes.
- User equivalence exists for the oracle user.

How to test rconfig before converting to RAC?

The Oracle 10g R2 install provides a sample rconfig input XML file called
ConvertToRAC.xml in the directory $ORACLE_HOME/assistants/rconfig/sampleXMLs. Make
a copy of the sample XML file and customise it with your instance-specific details;
the sample includes comments explaining how to edit each variable.
For testing purposes, set the Convert verify="ONLY" option in the XML file, then
run the tool:
$ ./rconfig convert.xml
The Convert verify option in the ConvertToRAC.xml file has three settings:
- Convert verify="YES": rconfig performs checks to ensure that the prerequisites
for single-instance to RAC conversion have been met before it starts the conversion.
- Convert verify="NO": rconfig does not perform prerequisite checks, and starts the
conversion.
- Convert verify="ONLY": rconfig only performs prerequisite checks; it does not
start the conversion after completing them.
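
As a sketch, the verify attribute sits on the Convert element of the input XML. This is a hypothetical, abbreviated fragment, not a complete document; the element names inside the comment are placeholders taken from the sample file:

```xml
<!-- Hypothetical fragment of a customised ConvertToRAC.xml (dry-run mode).
     Change verify to "YES" to run the checked conversion, or "NO" to skip
     the prerequisite checks. -->
<n:Convert verify="ONLY">
  <!-- SourceDBHome, TargetDBHome, SourceDBInfo, NodeList, SharedStorage ... -->
</n:Convert>
```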

Where are the rconfig log files located?

rconfig log files are located at $ORACLE_HOME/cfgtoollogs/rconfig.
One important thing to note is that rconfig rewrites the log file every time you
run the tool, so make sure you keep a copy if you want to refer to the results of
an earlier run.

How to restart rconfig?

If, after running rconfig in Convert verify="YES" mode, you hit a fatal error
(e.g. disk space not available, an issue with 10g parameters, shared storage
problems) that makes the tool exit and stops the conversion, you can simply rerun
the command " $ ./rconfig convert.xml ". This performs a clean-up of the partially
converted instance, deleting the files created by the earlier run, and then starts
the conversion again from the beginning, i.e. from the RMAN backup of the single
instance. After the conversion, check the filesystem to verify that nothing is
left over from the previous failed run.

ConvertToRAC_AdminManaged.xml
ConvertToRAC_PolicyManaged.xml
==================================================================

Convert 11gR2 non-RAC database to RAC database using rconfig

Oracle provides 3 methods to convert a non-RAC single instance database to a RAC
database:

1. DBCA
2. rconfig
3. Enterprise Manager

All three have their own benefits and can be used to suit one's needs. My recent
work involved converting a non-RAC single instance database to a RAC database
using rconfig; I tested all three methods but settled on rconfig.

Pre-requisites:

1. Configure shared storage: ASM, NFS (NAS), or clustered storage.

For ASM, refer to the Oracle Database Storage Administrator's Guide 11g Release 2 (11.2).
For configuring shared storage, refer to Configuring Storage for Grid
Infrastructure for a Cluster and Oracle Real Application Clusters (Oracle RAC)
(Linux).
See also the Oracle Database Administrator's Guide 11g Release 2 (11.2).

2. A clustered Grid Infrastructure install with at least one SCAN listener address.
See the Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) (Linux).

3. rconfig imposes a restriction on the choice of listener: it must be the default
listener, and it must run from the Grid Infrastructure home.

srvctl add listener -p 1522

After conversion, you can reconfigure the listener as required.

4. Install the clustered Oracle Database software as per the documentation; this
can be done by choosing the right configuration option. Refer to:

http://download.oracle.com/docs/cd/E11882_01/install.112/e10813/racinstl.htm#BABJGBHB

I've installed the new 11gR2 clustered ORACLE_HOME at

/u01/app/oracle/product/11.2.0/db_2

on both nodes, orarac01 and orarac02.

Converting the Single Instance Database using rconfig

1. As the 'oracle' OS user, navigate to

$ORACLE_HOME/assistants/rconfig/sampleXMLs

2. Open the sample file ConvertToRAC_AdminManaged.xml using a text editor such as
vi. This sample XML file contains comment lines that provide instructions on how
to edit the file to suit your site's specific needs.

3. Ensure you edit the XML with Convert verify="ONLY".

The following are the sample entries:

<!--Specify current OracleHome of non-rac database for SourceDBHome -->


/u01/app/oracle/product/11.2.0/db_1
<!--Specify OracleHome where the rac database should be configured. It can be same
as SourceDBHome -->
/u01/app/oracle/product/11.2.0/db_2
<!--Specify SID of non-rac database and credential. User with sysdba role is
required to perform conversion -->

...

<!--Specify the list of nodes that should have rac instances running for the Admin
Managed Cluster Database. LocalNode should be the first node in this nodelist. -->

...

<!--Specify Database Area Location to be configured for rac database. If this field
is left empty, current storage will be used for rac database. For CFS, this field
will have directory path. -->
+DATA
4. Move the spfile to the shared location. In this case the Single Instance
database was hosted on a file system; as part of this process we will move the
datafiles from file system storage to ASM.

So create the spfile in the shared disk location:

SQL> create spfile='+DATA/TEST/spfiletest.ora' from pfile;

You can check that the file was created through 'asmcmd'.

5. Take a backup of the existing $SOURCE_ORACLE_HOME/dbs/initTEST.ora, and create
a new $SOURCE_ORACLE_HOME/dbs/initTEST.ora with the following parameter:

spfile='+DATA/TEST/spfiletest.ora'
6. Restart the database.

7. Now let's test whether rconfig is ready for the conversion. Navigate to
$ORACLE_HOME/bin and issue the following command:

$ ./rconfig $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC_AdminManaged.xml

Because we set verify="ONLY", the above command only validates that rconfig is
ready for conversion. If the output throws any error, diagnose and troubleshoot to
fix the issue. Refer to the following output for a successful validation:

Operation Succeeded

There is no return value for this step
..
8. Now we are ready for the conversion. Edit the XML file
ConvertToRAC_AdminManaged.xml and change

from:

<n:Convert verify="ONLY">

to:

<n:Convert verify="YES">
9. Perform the conversion:

$ ./rconfig $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC_AdminManaged.xml

The conversion will take some time to complete. Progress can be monitored from the
logs located at $ORACLE_BASE/cfgtoollogs/rconfig.

10. Once the conversion is complete, you'll get a success message similar to the
one in step 7.

11. Perform sanity checks and tweak the listener to suit your needs.

That sums up the procedure to convert a Single Instance Oracle database to a RAC
database. Please do share your thoughts and comments.

http://msutic.blogspot.in/2014/05/convert-12cr1-non-rac-database-to-rac.html
http://samiora.blogspot.in/2013/06/

g:\prints\rac\Removing Nodes Steps.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

1. Delete the instance on the node to be removed:

$> $ORACLE_HOME/bin/dbca -silent -deleteInstance -nodeList <node to remove> -gdbName <db name> -instanceName <instance to remove> -sysDBAUserName sys -sysDBAPassword <sys password>

2. Disable the listener on the node to be deleted:

$> $ORACLE_HOME/bin/srvctl disable listener -l <listener_name> -n <name_of_node_to_delete>

3. Stop the listener on that node:

$> $ORACLE_HOME/bin/srvctl stop listener -l <listener_name> -n <name_of_node_to_delete>

4. On the node being removed, update the inventory for the database home:

$> $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={<node to remove>}" -local

5. Deinstall the database home from the node:

$> $ORACLE_HOME/deinstall/deinstall -local

6. From a remaining node, update the inventory with the remaining nodes:

$> $ORACLE_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={<remaining nodes>}"

7. Check cluster node status (root user):

#> $GI_HOME/bin/olsnodes -s -t

8. Disable Clusterware applications and daemons on the node to be removed. Use the
"-lastnode" option when running on the last node in the cluster to be removed
(root user):

#> $GI_HOME/crs/install/rootcrs.pl -deconfig -force [-lastnode]

9. From any node not being removed, delete Clusterware from the node (root user):

#> $GI_HOME/bin/crsctl delete node -n <node to remove>

10. As the grid user, from the node being removed:

$> $GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GI_HOME "CLUSTER_NODES={<node to remove>}" CRS=TRUE -local

11. Deinstall Clusterware software from the node:

$> $GI_HOME/deinstall/deinstall -local

12. From any node that remains, update the Clusterware inventory with the
remaining nodes:

$> $GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GI_HOME "CLUSTER_NODES={<remaining nodes>}" CRS=TRUE

13. Verify the node has been removed and the remaining nodes are valid:

$GI_HOME/bin/cluvfy stage -post nodedel -n <node to remove> -verbose

14. Remove OCM host/configuration from the MOS interface

g:\prints\rac\voting disk.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Voting Disk

What does it contain, who updates it, how is it used, where is it stored and so on?
Voting disks manage information about node membership. Each voting disk must be
accessible by all nodes in the cluster. If a node does not pass its heartbeat to
the other nodes or to the voting disk, that node is evicted from the cluster;
hence the voting disk is a key component of the clusterware, and its failure can
lead to inoperability of the cluster.

In a RAC cluster, the clusterware must know at any point in time which nodes are
members of the cluster so that:

- it can perform load balancing

- if a node fails, it can fail over resources as defined in the resource profiles

- if a node joins, it can start resources on it as defined in OCR/OLR

- if a node joins, it can assign a VIP to it in case GNS is in use

- if a node fails, it can execute callouts if defined, and so on

Hence, there must be a way for the clusterware to find out about node membership.

That is where the voting disk comes into the picture: it is the place where nodes
mark their attendance. Consider an analogy where a manager wants to find out which
of his subordinates are present; he can just check the attendance register and
assign them their tasks accordingly. Similarly, the CSSD process on every node
makes entries in the voting disk to ascertain the membership of that node. The
voting disk records node membership information, and if the voting disk(s) are
ever lost, the entire clustered environment for Oracle 11g RAC will be adversely
affected and an outage may result. Also, in a cluster, communication between the
various nodes is of paramount importance, and nodes which can't communicate with
the other nodes should be evicted from the cluster. While marking their own
presence, all the nodes also register information about their connectivity to the
other nodes in the voting disk; this is called the network heartbeat. The CSSD
process on each RAC node also maintains a disk heartbeat: it writes to its own
block (one OS block in size) of the voting disk at a specific offset, and the
written block has a header area with the node name.

The heartbeat counter increments on every write call, once per second. Thus the
heartbeats of the various nodes are recorded at different offsets in the voting
disk. In addition to maintaining its own disk block, each CSSD process also
monitors the disk blocks maintained by the CSSD processes running on the other
cluster nodes. Healthy nodes have continuous network and disk heartbeats exchanged
between them; a break in the heartbeat indicates a possible error scenario. If a
node's disk block is not updated within a short timeout period, that node is
considered unhealthy and may be rebooted to protect the database information. In
this case, a message to this effect is written to the kill block of the node. Each
node reads its kill block once per second; if the kill block has been overwritten,
the node evicts itself (commits suicide).

During a reconfiguration (a node joining or leaving), CSSD monitors all nodes and
determines whether each node has a disk heartbeat, including nodes with no network
heartbeat. If no disk heartbeat is detected either, the node is declared dead.
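
The disk-heartbeat mechanism described above can be sketched in a few lines of Python. This is purely illustrative, not Oracle code; the block layout, tick units, and timeout value are invented for the example:

```python
# Illustrative sketch (not Oracle code): disk-heartbeat detection.
# Each node owns a "block" holding an incrementing counter; a node whose
# counter stops advancing within the timeout window is flagged for eviction.

TIMEOUT = 3  # ticks without an update before a node is declared unhealthy

class VotingDisk:
    def __init__(self, nodes):
        # one block per node: [counter, tick of the last write]
        self.blocks = {n: [0, 0] for n in nodes}

    def heartbeat(self, node, tick):
        block = self.blocks[node]
        block[0] += 1          # counter increments on every write call
        block[1] = tick        # remember when it last advanced

    def unhealthy(self, tick):
        # nodes whose block has not been updated within TIMEOUT ticks
        return [n for n, (_, last) in self.blocks.items()
                if tick - last > TIMEOUT]

disk = VotingDisk(["node01", "node02", "node03"])
for tick in range(1, 10):
    disk.heartbeat("node01", tick)
    disk.heartbeat("node02", tick)
    if tick <= 4:              # node03 stops writing after tick 4
        disk.heartbeat("node03", tick)

print(disk.unhealthy(tick=9))  # -> ['node03']
```

The real CSSD also cross-checks the network heartbeat before deciding on eviction; this sketch models only the disk side.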

What is stored in voting disk?


Voting disks contain static and dynamic data.

Static data : Info about nodes in the cluster

Dynamic data : Disk heartbeat logging

The voting disk maintains important details about cluster node membership, such as:

- which nodes are part of the cluster,

- which node is joining the cluster, and

- which node is leaving the cluster.

Why is the voting disk needed?

The voting disk files are used by Oracle Clusterware as a health check:

- by CSS to determine which nodes are currently members of the cluster;

- in concert with other cluster components such as CRS to shut down, fence, or
reboot single or multiple nodes whenever network communication is lost between any
nodes within the cluster, in order to prevent the dreaded split-brain condition in
which two or more instances attempt to control the RAC database. It thus protects
the database information;

- by the CSS daemon to arbitrate with peers that it cannot see over the private
interconnect in the event of an outage, allowing it to salvage the largest fully
connected subcluster for further operation. CSS checks the voting disk to
determine whether there is a failure on any other node in the cluster. During this
operation, NM makes an entry in the voting disk to record its vote on
availability; similar operations are performed by the other instances in the
cluster. With three voting disks configured, the disks also provide a method to
determine who in the cluster should survive: for example, if eviction of one of
the nodes is necessitated by an unresponsive action, then the node that holds two
of the voting disks will evict the other node. NM alternates between the heartbeat
and the voting disk to determine the availability of the other nodes in the
cluster.
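
The "largest fully connected subcluster survives" idea can be sketched in Python. This is illustrative only, not Oracle's actual algorithm; in particular, the tie-break rule used here (the set containing the lowest node number wins) is an assumption included solely to make the example deterministic:

```python
# Illustrative sketch (not Oracle code): choosing which subcluster
# survives a split-brain. Survivors are the largest fully connected set
# of nodes; ties are broken in favour of the set containing the lowest
# node number (an assumption for this example).

def surviving_subcluster(partitions):
    """partitions: list of sets of node numbers that can still talk."""
    return max(partitions, key=lambda p: (len(p), -min(p)))

# A 3-node cluster splits: nodes 1 and 2 can talk, node 3 is isolated.
print(surviving_subcluster([{1, 2}, {3}]))      # -> {1, 2}
# A 4-node cluster splits evenly: tie broken by lowest node number.
print(surviving_subcluster([{2, 4}, {1, 3}]))   # -> {1, 3}
```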

The voting disk is the key communication mechanism within Oracle Clusterware: all
nodes in the cluster read and write heartbeat information there. The CSSD
processes (Cluster Services Synchronization Daemon) monitor the health of the RAC
nodes using two distinct heartbeats: the network heartbeat and the disk heartbeat.
Healthy nodes have continuous network and disk heartbeats exchanged between them;
a break in the heartbeat indicates a possible error scenario.

There are a few different scenarios possible with missing heartbeats:

1. The network heartbeat is successful, but the disk heartbeat is missed.

2. The disk heartbeat is successful, but the network heartbeat is missed.

3. Both heartbeats failed.

In addition, with numerous nodes, other scenarios are possible too. A few of them:

1. The nodes have split into N sets, each communicating within its own set but not
with members of the other sets.

2. Just one node is unhealthy.

Nodes with quorum maintain active membership of the cluster; the other node(s) are
fenced/rebooted.

A node must be able to access more than half of the voting disks at any time. For
example, take a two node cluster with an even number of voting disks, say 2.
Suppose Node1 can access only voting disk1 and Node2 only voting disk2; then there
is no common file where the clusterware can check the heartbeat of both nodes.
Hence, if we have 2 voting disks, every node in the cluster must be able to access
both of them. If we have 3 voting disks and both nodes can access more than half,
i.e. 2, of them, there will be at least one disk accessible by both nodes, and the
clusterware can use that disk to check the heartbeat of both nodes. Hence each
node should be able to access more than half the number of voting disks, and a
node unable to do so has to be evicted from the cluster to maintain the integrity
of the cluster. After the cause of the failure has been corrected and access to
the voting disks has been restored, you can instruct Oracle Clusterware to recover
the failed node and restore it to the cluster.

Loss of more than half of your voting disks will cause the entire cluster to fail!
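
The "more than half" rule above can be written down as a one-line check. A sketch in Python (the function name is ours, not Oracle's):

```python
# The strict-majority rule: a node stays in the cluster only if it can
# access more than half of the configured voting disks.

def has_quorum(accessible: int, total: int) -> bool:
    """True if the node sees a strict majority of the voting disks."""
    return accessible > total / 2

# Two voting disks: seeing just one is NOT a majority -> eviction.
print(has_quorum(1, 2))   # -> False
# Three voting disks: seeing two is a majority -> the node survives.
print(has_quorum(2, 3))   # -> True
# Losing 2 of 3 disks leaves no majority -> the cluster fails.
print(has_quorum(1, 3))   # -> False
```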

Where is the voting disk stored?

The voting disk is a shared disk that is accessed by all member nodes of the
cluster during operation; hence, the voting disks must be on shared, accessible
storage.

- You should plan on allocating 280 MB for each voting disk file.

- Prior to 11g R2 RAC, it could be placed on a raw device or on a clustered
filesystem supported by Oracle RAC, such as OCFS, Sun Cluster, or Veritas Cluster
File System.

- As of 11g R2 RAC, it can be placed on ASM disks. This simplifies management and
improves performance, but it also raised a puzzle: for a node to join the cluster
it must be able to access the voting disk, yet the voting disk is in ASM, and ASM
can't be up until the node is up. To resolve this, Oracle ASM reserves several
blocks at a fixed location on every ASM disk used for storing a voting disk. As a
result, Oracle Clusterware can access the voting disks in ASM even if the ASM
instance is down, and CSS can continue to maintain the cluster even if the ASM
instance has failed. The physical location of the voting files on the ASM disks is
fixed, i.e. the cluster stack does not rely on a running ASM instance to access
the files. The location of the file is visible in the ASM disk header (dumping the
file out of ASM with dd is quite easy).
- The voting disk is not striped but put as a whole on an ASM disk.

- In the event that the disk containing the voting disk fails, Oracle ASM will
choose another disk on which to store this data.

- It eliminates the need for a third-party cluster volume manager.

- You can reduce the complexity of managing disk partitions for voting disks
during Oracle Clusterware installations.

- The voting disk needs to be mirrored; should it become unavailable, the cluster
will come down. Hence, you should maintain multiple copies of the voting disks on
separate disk LUNs so that you eliminate a single point of failure (SPOF) in your
Oracle 11g RAC configuration.

- If the voting disk is stored in ASM, the multiplexing level of the voting disk
is decided by the redundancy of the diskgroup:

- If the voting disk is on a diskgroup with external redundancy, one copy of the
voting file is stored on one disk in the diskgroup.

- If we store the voting disk on a diskgroup with normal redundancy, we should be
able to tolerate the loss of one disk, i.e. even if we lose one disk, we should
still have enough voting disks for the clusterware to continue. If the diskgroup
has 2 disks (the minimum required for normal redundancy), we can store 2 copies of
the voting disk on it. If we lose one disk, only one copy of the voting disk is
left and the clusterware won't be able to continue, because to continue it must
access more than half the number of voting disks, i.e. more than 2 * 1/2 = 1,
i.e. 2. Hence, to be able to tolerate the loss of one disk, we should have 3
copies of the voting disk; so a normal redundancy diskgroup holding the voting
disk should have a minimum of 3 disks in it.

- Similarly, if we store the voting disk on a diskgroup with high redundancy,
5 voting files are placed, each on one ASM disk, i.e. a high redundancy diskgroup
should have at least 5 disks, so that even if we lose 2 disks the clusterware can
continue.

- Ensure that all the nodes participating in the cluster have read/write
permissions on the disks.

- You can have up to a maximum of 15 voting disks; however, Oracle recommends that
you do not go beyond five.

Backing up the voting disk:

In previous versions of Oracle Clusterware you needed to back up the voting disks
with the dd command. Starting with Oracle Clusterware 11g Release 2 you no longer
need to do this: the voting disks are automatically backed up as part of the OCR.
In fact, Oracle explicitly states that you should not use a backup tool like dd to
back up or restore voting disks; doing so can lead to the loss of the voting disk.

Although the voting disk contents do not change frequently, you will need to back
up the voting disk file every time

- you add or remove a node from the cluster, or

- immediately after you configure or upgrade a cluster.

A node in the cluster must be able to access more than half of the voting disks at
any time, and the cluster can only tolerate voting disk failures that leave such a
majority intact. Therefore, it is strongly recommended that you configure an odd
number of voting disks, such as 3, 5, and so on.
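
The arithmetic behind the odd-number recommendation can be checked directly. A short Python sketch (the helper is our own, assuming survivors must be a strict majority of the original disk count):

```python
# Why odd counts are recommended: the number of voting-disk failures a
# cluster can tolerate, given that the surviving disks must still be a
# strict majority of the ORIGINAL count (n - f > n / 2).

def max_disk_failures(n: int) -> int:
    """Largest f such that losing f of n voting disks leaves a majority."""
    return max(f for f in range(n) if n - f > n / 2)

for n in range(1, 6):
    print(n, "disks tolerate", max_disk_failures(n), "failure(s)")
# 3 disks tolerate 1 failure; a 4th disk adds no tolerance (still 1);
# only going to 5 raises tolerance to 2 - hence odd numbers.
```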

- Check the location of the voting disk:

grid@host01$ crsctl query css votedisk

##  STATE    File Universal Id                  File Name          Disk group
--  -------  ---------------------------------  -----------------  ----------
 1. ONLINE   243ec3b2a3cf4fbbbfed6f20a1ef4319   (ORCL:ASMDISK01)   [DATA]

Located 1 voting disk(s).

We can see that only one copy of the voting disk is on the DATA diskgroup, which
has external redundancy. As mentioned earlier, Oracle writes the voting devices to
the underlying disks at pre-designated locations so that it can get the contents
of these files when the cluster starts up.
