++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Current scenario:
I have two nodes in the cluster presently.
Host names :
-node01.example.com
-node02.example.com
Node to be added :
-node03.example.com
step.1
Procedure:
------------------------------------
Prepare the machine for the third node
------------------------------------
- Set kernel parameters
- Install required RPMs
- Create users/groups
- Configure oracleasm
root@node03# oracleasm configure -i
root@node03# oracleasm exit
root@node03# oracleasm init
root@node03# oracleasm scandisks
root@node03# oracleasm listdisks
All ASM disks will be listed.
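If listdisks comes back empty on the new node, a couple of quick checks can help; this is only a sketch, assuming the shared disks were already labeled from one of the existing nodes and the default ASMLib device path is in use:
root@node03# oracleasm status                 # is the driver loaded and /dev/oracleasm mounted?
root@node03# ls -l /dev/oracleasm/disks/      # labeled disks should appear here, owned by the ASM owner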
step.2
ssh-keygen -t dsa
cd /home/grid/.ssh
cat *.pub > node03
chmod +x a.sh
./a.sh
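The contents of a.sh are not captured in these notes. As a rough sketch, assuming the goal is passwordless SSH for the grid user among all three nodes, the key distribution could look like the following (host names come from the scenario above; file names and paths are illustrative):
# on each node, as the grid user
$ ssh-keygen -t dsa                                          # accept defaults, empty passphrase
# collect every node's public key into one authorized_keys file and push it to all nodes
$ cd /home/grid/.ssh
$ cat node01.pub node02.pub node03.pub >> authorized_keys    # file names are illustrative
$ scp authorized_keys node02:/home/grid/.ssh/
$ scp authorized_keys node03:/home/grid/.ssh/
$ chmod 600 /home/grid/.ssh/authorized_keys
# verify: this should not prompt for a password
$ ssh node03 date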
Error:
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00845: MEMORY_TARGET not supported on this system
. For details refer to "(:CLSN00107:)" in
"/u01/app/grid/diag/crs/rac02/crs/trace/ohasd_oraagent_grid.trc".
CRS-2674: Start of 'ora.asm' on 'rac02' failed
CRS-2679: Attempting to clean 'ora.asm' on 'rac02'
Reason:
/dev/shm is also known as tmpfs, i.e. a temporary file system that keeps its files in virtual memory to speed up several processes. ORA-00845 is raised when MEMORY_TARGET is larger than the space available in /dev/shm.
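To confirm this on the affected node, compare the size of the tmpfs mount with the configured MEMORY_TARGET:
# df -h /dev/shm                       # size of the tmpfs mount
SQL> show parameter memory_target      # compare with the configured MEMORY_TARGET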
Solution:
The simple and best way to avoid this is to use Oracle Database Resource Manager (DBRM). DBRM helps resolve this by giving the database more control over how hardware resources are allocated.
The DBA should set up resource consumer groups and a resource plan and use them as per requirements. On an Exadata system, the DBA can additionally use IORM to set up resource allocation among multiple database instances.
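As a rough sketch of what is described above, a consumer group and plan can be created with the DBMS_RESOURCE_MANAGER package; the group/plan names and the mgmt_p1 percentages below are purely illustrative:
SQL> BEGIN
       DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA;
       DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
         consumer_group => 'REPORTING_GRP', comment => 'reporting sessions');
       DBMS_RESOURCE_MANAGER.CREATE_PLAN(
         plan => 'DAYTIME_PLAN', comment => 'daytime resource plan');
       DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
         plan => 'DAYTIME_PLAN', group_or_subplan => 'REPORTING_GRP',
         comment => 'cap reporting work', mgmt_p1 => 20);
       DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
         plan => 'DAYTIME_PLAN', group_or_subplan => 'OTHER_GROUPS',
         comment => 'everything else', mgmt_p1 => 80);
       DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
       DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA;
     END;
     /
SQL> ALTER SYSTEM SET resource_manager_plan = 'DAYTIME_PLAN';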
To ensure cluster and data integrity unhealthy nodes should be forcefully evicted
from a cluster.
A node eviction will be initiated if:
- A cluster member cannot communicate via the network heartbeat, or there is network disruption or latency.
==> The misscount setting is the maximum allowed network latency (the time taken for a heartbeat packet to travel from node to node over the interconnect), in seconds.
==> Check the setting using: crsctl get css misscount
==> The default timeout is 30 seconds for Linux/Unix.
==> To change the setting, shut down CRS on the nodes and run: crsctl set css misscount <value>
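For illustration only (the value 60 is arbitrary; the misscount default should normally be left alone unless Oracle Support advises otherwise):
# as root, on a node where clusterware is running
# crsctl get css misscount        # shows the current value (default 30 on Linux/Unix)
# crsctl set css misscount 60     # 60 is only an example value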
Prerequisite - An ASM instance should be created and be up and running. Please refer to my previous article on creating an ASM instance.
step.1
step.2
step.3
step.4
step.5 run the RMAN commands to copy the database to ASM, switch the database to the copy, and open the database (see the sketch after this list)
step.6
step.7
add the new redo log groups on ASM and drop the old ones
Now your database is migrated to ASM
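A rough sketch of the RMAN portion described in step.5, assuming the spfile and control files have already been moved to ASM in the preceding steps and the target disk group is +DATA; these commands are illustrative, not the author's exact procedure:
$ rman target /
RMAN> # make image copies of all datafiles in the +DATA disk group
RMAN> BACKUP AS COPY DATABASE FORMAT '+DATA';
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> # repoint the control file entries at the ASM copies, recover and open
RMAN> SWITCH DATABASE TO COPY;
RMAN> RECOVER DATABASE;
RMAN> ALTER DATABASE OPEN;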
g:\prints\rac\ntp in rac.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Network Time Protocol (NTP)
Network Time Protocol (NTP) is a protocol for synchronizing the clocks of computers in a network. If you set up a RAC environment, one of the requirements is to synchronize the clock time of all your RAC nodes to avoid unnecessary node evictions. A time difference of more than 15 minutes among nodes may cause node evictions. Trace file analysis and GV$ view analysis may also not be accurate if time is not synchronized among the nodes.
Configuring NTP:
ntp.conf
ntp.drift
ntp.trace
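On Linux, Oracle Clusterware expects ntpd to run with the slewing option (-x) so the clock is never stepped backwards; a minimal sketch of the relevant pieces, assuming the usual Linux file locations (the server name is illustrative):
# /etc/sysconfig/ntpd - add -x so time is slewed rather than stepped
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
# /etc/ntp.conf - example server and driftfile entries
server ntp1.example.com
driftfile /var/lib/ntp/drift
# restart and verify on every node
# service ntpd restart
# ntpq -p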
1. Physical Standby
2. Logical Standby
A physical standby uses the recovery (redo apply) technique to synchronize the standby database, whereas a logical standby uses the SQL Apply method to synchronize the two databases.
Physical Standby
- Identical to the primary database, including the physical organization on disk.
- DG uses Redo Apply technology, which applies redo data using standard recovery techniques.
- Can be used for backups.
- All data types are supported.
- Can be opened 'read only' but cannot apply logs while open. From 11g it can be opened 'read write'.
- No additional objects can be created.
Logical Standby
- Contains the same logical information, but the physical organization and structure of the data can be different.
- DG uses SQL Apply, which first transforms the redo data into SQL statements and then executes the statements.
- Can be opened for reporting.
- Not all data types are supported, e.g. LONG, NCLOB, LONG RAW, BFILE and XML types are not supported.
- Can be open in normal mode and simultaneously apply the logs.
- Additional indexes and materialized views can be created.
Organizations usually use logical standby databases mainly for reporting purposes and not for failover/switchover operations; for failover and switchover they use a physical standby database. The reason is that maintaining a logical standby is almost a full-time job, it needs extensive tuning of the log apply services, and after hundreds of patches the logical standby is usually 3 to 5 hours behind the live database, making it impractical for failover/switchover.
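For reference, the apply services mentioned above are started differently on the two standby types; a minimal sketch, run on the standby database (syntax as commonly used in 10g/11g):
-- physical standby: start Redo Apply in the background
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
-- logical standby: start SQL Apply
SQL> ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;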
g:\prints\rac\RAC administration.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1) To check the status of all instances across the cluster nodes:
$ srvctl status instance -d racdb -i racdb1,racdb2,racdb3,racdb4
2) To stop an instance:
$ srvctl stop instance -d EHISKOL -i EHISKOL2
3) To start an instance:
$ srvctl start instance -d EHISKOL -i EHISKOL2
When a single/default IP is configured with the SCAN listener, the status will be shown like below:
START SEQUENCE
COMMAND                                   DESCRIPTION
------------------------------------------------------------------------------------------
/etc/init.d/init.crs start                START CRS PROCESS (AS ROOT USER)
srvctl start asm -n NODE1                 START ASM INSTANCE on node 1
srvctl start asm -n NODE2                 START ASM INSTANCE on node 2
srvctl start database -d ORCL             START DATABASE
srvctl start instance -d ORCL -i ORCL1    START first INSTANCE (skip it if running 'start database', as that will start both instances)
srvctl start instance -d ORCL -i ORCL2    START second INSTANCE (skip it if running 'start database', as that will start both instances)
srvctl start nodeapps -n NODE1            START NODEAPPS on NODE1
srvctl start nodeapps -n NODE2            START NODEAPPS on NODE2
STOP SEQUENCE
COMMAND                                   DESCRIPTION
------------------------------------------------------------------------------------------
srvctl stop database -d ORCL              STOP DATABASE
srvctl stop instance -d ORCL -i ORCL1     STOP first INSTANCE (skip it if running 'stop database', as that will stop both instances)
srvctl stop instance -d ORCL -i ORCL2     STOP second INSTANCE (skip it if running 'stop database', as that will stop both instances)
srvctl stop asm -n NODE1                  STOP ASM INSTANCE on NODE 1 (in 11g we have OCR on ASM so we cannot stop ASM, but if you have OCR on non-ASM storage you should stop it)
srvctl stop asm -n NODE2                  STOP ASM INSTANCE on NODE 2 (same note as NODE 1)
srvctl stop nodeapps -n NODE1             STOP NODEAPPS on NODE 1
srvctl stop nodeapps -n NODE2             STOP NODEAPPS on NODE 2
/etc/init.d/init.crs stop                 STOP CRS PROCESSES (AS ROOT USER)
COMMAND                                   DESCRIPTION
---------------------------------------------------------------------
crsctl status resource -t                 Clusterware resource status check
srvctl status database -d ORCL            STATUS OF DATABASE
srvctl stop listener -l LISTENER_NAME     STOP A LISTENER
srvctl start listener -l LISTENER_NAME    START A LISTENER
crsctl stop has                           Stop all clusterware services/resources on a specific node (including DB and listener) (run as root)
crsctl start has                          Start all clusterware services/resources on a specific node (including DB and listener) (run as root)
crsctl stop cluster -all                  Stop CRS services on all nodes of the clusterware (run as root)
crsctl start cluster -all                 Start CRS services on all nodes of the clusterware (run as root)
crsctl check has                          Check if ohasd is running/stopped (run as root)
crsctl enable has                         Enable Oracle High Availability Services autostart (run as root)
crsctl disable has                        Disable Oracle High Availability Services autostart (run as root)
crsctl config has                         Check if Oracle High Availability Services autostart is enabled/disabled (run as root)
srvctl status nodeapps                    Check the status of services on all nodes
crsctl stop crs                           Stop all clusterware services/resources ON THAT NODE (run as root)
crsctl start crs                          Start all clusterware services/resources ON THAT NODE (run as root)
cluvfy comp scan -verbose                 Verify SCAN and SCAN listener status
5. OCR logs : The OCR tool (ocrdump, ocrconfig, ocrcheck) log files are stored in the $ORA_CRS_HOME/log/<hostname>/client/ directory.
7. RACG logs : The high availability trace files are stored in two locations
$ORA_CRS_HOME/log/<hostname>/racg/ and in $ORACLE_HOME/log/<hostname>/racg/
directories.
RACG contains log files for node applications such as VIP, ONS etc.
Each RACG executable has a sub directory assigned exclusively for that executable.
racgeut : $ORA_CRS_HOME/log/<hostname>/racg/racgeut/
racgevtf : $ORA_CRS_HOME/log/<hostname>/racg/racgevtf/
racgmain : $ORA_CRS_HOME/log/<hostname>/racg/racgmain/
racgeut : $ORACLE_HOME/log/<hostname>/racg/racgeut/
racgmain: $ORACLE_HOME/log/<hostname>/racg/racgmain/
racgmdb : $ORACLE_HOME/log/<hostname>/racg/racgmdb/
racgimon: $ORACLE_HOME/log/<hostname>/racg/racgimon
step.2
Once the backup is done, transfer the backup sets to the target server where we want to restore/clone.
step.3
On the target standalone server, ensure that the server (UAT) meets the Oracle software requirements:
a. OS must be certified
b. kernel parameters must be set
c. mandatory OS packages must be installed
d. OS limits must be set
e. create the group/user (dba/oracle)
f. set the environment variables (ORACLE_HOME, ORACLE_BASE, ORACLE_SID, PATH, LD_LIBRARY_PATH, etc.) - a sample is sketched below
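A minimal sketch of the environment settings for item f; the paths and SID follow the pfile example further below and are only illustrative:
# append to the oracle user's ~/.bash_profile on the target server
export ORACLE_BASE=/oracle/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1   # illustrative path
export ORACLE_SID=qaclone
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH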
step.4
step.5
Create a pfile on the source, copy it to the target, and remove the RAC-related and other node-specific parameters.
ex:
*.audit_file_dest='/oracle/app/oracle/admin/qaclone/adump'
*.cluster_database=FALSE
*.control_files='+DATA/qaclone/control01.ctl','+DATA/qaclone/control02.ctl'
*.db_file_name_convert='+PRODDATA/prod','+DATA/qaclone'
*.db_name='qaclone'
*.diagnostic_dest='/oracle/app/oracle/admin/qaclone'
*.log_file_name_convert='+PRODDATA/prod','+DATA/qaclone'
step.6
step.7
step.8
step.9
Check that you can access the source database by copying the TNS entries from the source node into the target node's tnsnames.ora file.
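For illustration, a TNS entry on the target pointing back at the source could look like this (host, port and service name are placeholders):
PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = prod-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = prod)
    )
  )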
step.10
On the target node, connect to RMAN using the following command, where PROD is your source database and CLONE is the target database:
RMAN>
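The exact connect command is not captured in these notes; a typical invocation, assuming a password file on the source and the TNS entry shown above, would look something like:
$ export ORACLE_SID=CLONE
$ rman target sys@PROD auxiliary /     # source = PROD (via TNS), auxiliary = local CLONE instance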
step.11
GROUP#
----------
3
4
Database altered.
Database altered.
Database altered.
step.12
optional:
_no_recovery_through_resetlogs=TRUE
undo_management=MANUAL
undo_tablespace=UNDOTBS1
This seems to be Bug 4355382 and it is expected while doing a RAC restore/recovery.
##########################
# Solution
##########################
Now open the database with the RESETLOGS option and it works. The _no_recovery_through_resetlogs parameter tells Oracle not to do any recovery while performing the resetlogs operation. After opening the database, the parameter can be removed from the pfile.
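A short sketch of this last step (the pfile path is illustrative):
$ sqlplus / as sysdba
SQL> STARTUP MOUNT PFILE='/home/oracle/initqaclone.ora';
SQL> ALTER DATABASE OPEN RESETLOGS;
-- once the database is open, remove _no_recovery_through_resetlogs from the pfile and restart normally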
http://sandeepmagdum.blogspot.in/p/rac-to-nonrac-clonning.html
http://oracledba102.blogspot.in/2012/04/how-to-clone-oracle-home-rac-to-non-rac.html
rconfig:
http://sateeshv-dbainfo.blogspot.in/2015/09/rconfig-frequently-asked-question.html
https://zakkiahmed.wordpress.com/2010/08/04/convert-11gr2-non-rac-database-to-rac-database-using-rconfig/
g:\prints\rac\rac_installation.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
rac installation:
++++++++++++++++
g:\prints\rac\RAC-machin-IPs.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CLUSTER= 1 ======>> RAC 1 => IP = 147.43.0.1
PRIV= 192.168.2.1
VIP = 147.43.0.101
g:\prints\rac\rconfig.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
RCONFIG : Frequently Asked Questions (Doc ID 387046.1)
PURPOSE
This FAQ explains the "rconfig" tool introduced in Oracle Database Version 10.2.
This is also used extensively when Oracle E-Business Suite 11i Customers convert
their Single Instance Database to RAC.
What is rconfig?
rconfig is a command-line tool introduced in Oracle Database 10g R2 to convert a single-instance 10g R2 database to RAC (Real Application Clusters). The other option is to use the Convert to RAC option on the single-instance database target of Oracle Enterprise Manager Grid Control.
After running rconfig in Convert verify="YES" mode, you may get a fatal error (e.g. disk space not available, an issue with 10g parameters, shared storage issues) that exits the rconfig tool and stops the conversion. You can restart the rconfig tool by running the command "$ ./rconfig convert.xml" again. This will perform a clean-up of the converted instance, i.e. it will delete the files created by the earlier run, and it will start the conversion process again from the beginning, i.e. taking the RMAN backup of the single instance. After the conversion you can check the filesystem to verify whether anything is left over from the previous failed run.
ConvertToRAC_AdminManaged.xml
ConvertToRAC_PolicyManaged.xml
==================================================================
1. DBCA
2. rconfig
3. Enterprise Manager
All 3 have their own benefits and can be used to suit one's needs. My recent work involved the conversion of a non-RAC single-instance database to a RAC database using rconfig; although I've tested all 3 methods, I settled on rconfig.
Pre-requisites:
For ASM, refer to Oracle Database Storage Administrator's Guide 11g Release 2 (11.2)
For configuring shared storage, refer to Configuring Storage for Grid
Infrastructure for a Cluster and Oracle Real Application Clusters (Oracle RAC)
(Linux)
See also Oracle Database Administrator's Guide 11g Release 2 (11.2)
2. A clustered Grid Infrastructure install with at least one SCAN listener address. See the installation guide link below.
srvctl add listener -p 1522
After conversion, you can reconfigure the listener as required.
http://download.oracle.com/docs/cd/E11882_01/install.112/e10813/racinstl.htm#BABJGBHB
/u01/app/oracle/product/11.2.0/db_2
$ORACLE_HOME/assistants/rconfig/sampleXMLs
...
<!--Specify the list of nodes that should have rac instances running for the Admin
Managed Cluster Database. LocalNode should be the first node in this nodelist. -->
...
<!--Specify Database Area Location to be configured for rac database. If this field is left empty, current storage will be used for rac database. For CFS, this field will have directory path. -->
+DATA
4. Move the spfile to the shared location. In this case the single-instance database was hosted on a file system; in this process we will move the datafiles from file-system storage to ASM.
SQL> create spfile='+DATA/TEST/spfiletest.ora' from pfile;
You can check that the file has been created through 'asmcmd'.
5. Edit the local pfile so that it points to the new spfile:
spfile='+DATA/TEST/spfiletest.ora'
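As mentioned, the new spfile can be verified from the grid/ASM environment with asmcmd; the directory names follow the example above:
$ asmcmd
ASMCMD> ls +DATA/TEST/        # spfiletest.ora should be listed
ASMCMD> exit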
6. Restart the Database
7. Now let's test if 'rconfig' is ready for conversion. Navigate to $ORACLE_HOME/bin and issue the following command:
$./rconfig $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC_AdminManaged.xml
The above command validates (as we've set verify="ONLY") whether rconfig is ready for conversion. If the output throws any error, diagnose and troubleshoot to fix the issue. Refer to the following output for a successful validation:
Operation Succeeded
..
8. Now we are ready for conversion. Edit the xml file 'ConvertToRAC_AdminManaged.xml' and change:
from:
..
<n:Convert verify="ONLY">
..
to:
..
<n:Convert verify="YES">
..
9. Perform the conversion:
$./rconfig $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC_AdminManaged.xml
The conversion will take some time to complete. The progress can be monitored from
the logs located at $ORACLE_BASE/cfgtoollogs/rconfig
10. Once the conversion is complete, you'll get a message similar to the one in step 7.
11. Perform sanity checks and tweak the listener to suit your needs.
That sums up the procedure to convert Single Instance Oracle Database to RAC
database. Please do share your thoughts and comments.
http://msutic.blogspot.in/2014/05/convert-12cr1-non-rac-database-to-rac.html
http://samiora.blogspot.in/2013/06/
#> $GI_HOME/bin/olsnodes -s -t
9. From any node not being removed delete Clusterware from the node (root user):
12. From any existing node to remain, update the Clusterware with existing nodes:
13. Verify the node has been removed and the remaining nodes are valid:
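The commands for steps 9, 12 and 13 are not captured above; as a rough sketch based on the standard 11gR2 node-deletion flow (node names and the GI home variable are illustrative):
# step 9: from a node that is NOT being removed, as root
# crsctl delete node -n node03
# step 12: from an existing node that will remain, as the grid owner, update the inventory with the surviving node list
$ $GI_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=$GI_HOME "CLUSTER_NODES={node01,node02}" CRS=TRUE -silent
# step 13: verify the node removal
$ cluvfy stage -post nodedel -n node03 -verbose
$ $GI_HOME/bin/olsnodes -s -t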
g:\prints\rac\voting disk.txt
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Voting Disk
What does it contain, who updates it, how is it used, where is it stored and so on?
The voting disk manages information about node membership. Each voting disk must be accessible by all nodes in the cluster. If any node is not passing its heartbeat across to the other nodes or to the voting disk, then that node will be evicted via the voting disk; hence the voting disk is a key component of clusterware and its failure can lead to inoperability of the cluster.
In a RAC cluster, at any point in time the clusterware must know which nodes are members of the cluster so that it can coordinate work across them. Hence, there must be a way by which the clusterware can find out about node membership.
That is where the voting disk comes into the picture. It is the place where nodes mark their attendance. Consider an analogy where a manager wants to find out which of his subordinates are present: he can just check the attendance register and assign them their tasks accordingly. Similarly, the CSSD process on every node makes entries in the voting disk to ascertain the membership of that node. The voting disk records node membership information. If it ever fails, the entire clustered environment for Oracle 11g RAC will be adversely affected and a possible outage may result if the voting disk(s) are lost. Also, in a cluster, communication between the various nodes is of paramount importance; nodes which can't communicate with other nodes should be evicted from the cluster. While marking their own presence, all the nodes also register information about their communicability with other nodes in the voting disk. This is called the network heartbeat. The CSSD process on each RAC node maintains its heartbeat in a block of one OS block in size, in the hot block of the voting disk, at a specific offset. The written block has a header area with the node name.
The heartbeat counter increments on every write call, once per second. Thus the heartbeats of the various nodes are recorded at different offsets in the voting disk. In addition to maintaining its own disk block, each CSSD process also monitors the disk blocks maintained by the CSSD processes running on the other cluster nodes. Healthy nodes will have continuous network and disk heartbeats exchanged between the nodes; a break in the heartbeat indicates a possible error scenario. If the disk block is not updated within a short timeout period, that node is considered unhealthy and may be rebooted to protect the database information. In this case, a message to this effect is written into the kill block of the node. Each node reads its kill block once per second; if the kill block has been overwritten, the node commits suicide (reboots itself).
During a reconfiguration (a node joining or leaving), CSSD monitors all nodes and determines whether a node has a disk heartbeat, including nodes with no network heartbeat. If no disk heartbeat is detected, the node is declared dead.
It maintains important details about cluster node membership, such as which nodes are part of the cluster and which node is joining or leaving the cluster.
The voting disk files are used by Oracle Clusterware by way of a health check:
- in concert with other cluster components such as CRS, to shut down, fence, or reboot either a single node or multiple nodes whenever network communication is lost between any nodes within the cluster, in order to prevent the dreaded split-brain condition in which two or more instances attempt to control the RAC database. It thus protects the database information.
- it is used by the CSS daemon to arbitrate with peers that it cannot see over the private interconnect in the event of an outage, allowing it to salvage the largest fully connected subcluster for further operation. It checks the voting disk to determine if there is a failure on any other node in the cluster. During this operation, NM (the node monitor) makes an entry in the voting disk to register its vote on availability. Similar operations are performed by the other instances in the cluster.
The three voting disks configured also provide a method to determine who in the
cluster should survive. For example, if eviction of one of the nodes is
necessitated by an unresponsive action, then the node that has two voting disks
will start evicting the other node. NM alternates its action between the heartbeat
and the voting disk to determine the availability of other nodes in the cluster.
The Voting disk is the key communication mechanism within the Oracle Clusterware
where all nodes in the cluster read and write heartbeat information. CSSD processes
(Cluster Synchronization Services Daemon) monitor the health of RAC nodes employing two distinct heartbeats: the network heartbeat and the disk heartbeat. Healthy nodes will have continuous network and disk heartbeats exchanged between the nodes; a break in the heartbeat indicates a possible error scenario.
There are a few different scenarios possible with missing heartbeats. In addition, with numerous nodes there are other possible scenarios too. A few possible scenarios:
1. The nodes have split into N sets of nodes, communicating within each set but not with members of the other sets. The nodes with quorum will maintain active membership of the cluster, and the other node(s) will be fenced/rebooted.
A node must be able to access more than half of the voting disks at any time. For example, take a two-node cluster with an even number of voting disks, say 2. Suppose Node1 can access only voting disk 1 and Node2 can access only voting disk 2. This means there is no common file where the clusterware can check the heartbeat of both nodes.
Hence, if we have 2 voting disks, all the nodes in the cluster should be able to access both voting disks. If we have 3 voting disks and both nodes are able to access more than half of them, i.e. 2 voting disks, there will be at least one disk accessible by both nodes, and the clusterware can use that disk to check the heartbeat of both nodes. Hence, each node should be able to access more than half the number of voting disks. A node that cannot do so will have to be evicted from the cluster to maintain the integrity of the cluster. After the cause of the failure has been corrected and access to the voting disks has been restored, you can instruct Oracle Clusterware to recover the failed node and restore it to the cluster.
Loss of more than half of your voting disks will cause the entire cluster to fail!!
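To see how many voting disks are configured and where they live, the standard check is the following (output format varies by version, so only the command is shown):
# as root or the grid owner, on any cluster node
$ crsctl query css votedisk     # lists each voting disk and, on ASM, its diskgroup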
The voting disk is a shared disk that will be accessed by all member nodes in the cluster during an operation. Hence, the voting disks must be on shared, accessible storage.
- You should plan on allocating 280 MB for each voting disk file.
Cluster filesystem
- In the event that the disk containing the voting disk fails, Oracle ASM will choose another disk on which to store this data.
- Using ASM, you can reduce the complexity of managing disk partitions for voting disks during Oracle Clusterware installations.
Hence, you should maintain multiple copies of the voting disks on separate disk
LUNs so that you eliminate a Single Point of Failure (SPOF) in your Oracle 11g RAC
configuration.
With 3 voting disks, a node must still be able to access more than half of them, i.e. more than 1, i.e. at least 2. Hence, to be able to tolerate the loss of one disk, we should have 3 copies of the voting disk on a diskgroup with normal redundancy. So, a normal-redundancy diskgroup holding the voting disks should have a minimum of 3 disks in it.
- Ensure that all the nodes participating in the cluster have read/write permissions on the disks.
- You can have up to a maximum of 15 voting disks. However, Oracle recommends that you do not go beyond five voting disks.
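A few of the commonly used commands for managing voting disks in 11gR2; the diskgroup and device paths are illustrative, and on ASM the whole set is replaced rather than individual copies being added:
# as root
# crsctl query css votedisk                   # list current voting disks
# crsctl replace votedisk +DATA               # recreate the voting disks in an ASM diskgroup
# crsctl add css votedisk /dev/raw/raw3       # add a copy (non-ASM storage only)
# crsctl delete css votedisk /dev/raw/raw3    # remove a copy (non-ASM storage only)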
In previous versions of Oracle Clusterware you needed to back up the voting disks with the dd command. Starting with Oracle Clusterware 11g Release 2 you no longer need to back up the voting disks: they are automatically backed up as part of the OCR. In fact, Oracle explicitly indicates that you should not use a backup tool like dd to back up or restore voting disks; doing so can lead to the loss of the voting disk. Although the voting disk contents are not changed frequently, in earlier releases you needed to back up the voting disk file every time you added or removed a node from the cluster.
A node in the cluster must be able to access more than half of the voting disks at any time; to tolerate the failure of n voting disks you need at least 2n+1 of them configured. It is therefore strongly recommended that you configure an odd number of voting disks, such as 3, 5, and so on.
- We can see that only one copy of the voting disk is there on the DATA diskgroup, which has external redundancy. As I mentioned earlier, Oracle writes the voting devices to the underlying disks at pre-designated locations so that it can get the contents of these files when the cluster starts up.