
Subject: RAC: Frequently Asked Questions

Doc ID: Note:220970.1 Type: FAQ


Last Revision Date: 16-MAY-2007 Status: PUBLISHED

RAC Frequently Asked Questions


General RAC

• Why does netca always create the listener which listens to public IP and not VIP
only?
• Customer wants to use rconfig to convert a single instance to RAC but is using raw
devices in RAC. Does rconfig support raw devices?
• Can we designate the place of archive logs on both ASM disk and regular file
system, when we use SE RAC?
• WARNING: No cluster interconnect has been specified. I get this error starting
my RAC database, what do I do?
• Is it supported to install CRS and RAC as different users?
• I have changed my spfile with alter system set <parameter_name> =....
scope=spfile. The spfile is on ASM storage and the database will not start.
• Is it difficult to transition from Single Instance to RAC?
• What are the dependencies between OCFS and ASM in Oracle Database 10g ?
• What is Cache Fusion and how does this affect applications?
• Do we have to have Oracle RDBMS on all nodes?
• What software is necessary for RAC? Does it have a separate installation CD to
order?
• What kind of HW components do you recommend for the interconnect?
• Is rcp and/or rsh required for normal RAC operation ?
• Are there any suggested roadmaps for implementing a new RAC installation?
• Are there any issues for the interconnect when sharing the same switch as the
public network by using VLAN to separate the network?
• Can my customer use Veritas Agents to manage their RAC database on Unix with
SFRAC installed?
• How do I check for network problems on my interconnect?
• Is there a need to renice LMS processes in Oracle RAC 10g Release 2?
• I had a 3 node RAC. One of the nodes had to be completely rebuilt as a result of a
problem. As there are no backups, What is the proper procedure to remove the 3rd
node from the cluster so it can be added back in?
• Where can I find a list of supported solutions to ensure NIC availability (for the
interconnect) per platform?
• What combinations of Oracle Clusterware, RAC and ASM versions can I use?
• Is relink required for CRS_HOME after OS upgrade?
• Does Oracle Clusterware or Real Application Clusters support heterogeneous
platforms?
• Is Infiniband supported for the RAC interconnect?
• Can I run more than one clustered database on a single RAC cluster?
• What is Standard Edition RAC?
• Can I run 9i RAC and RAC 10g in the same cluster?
• I could not get the user equivalence check to work on my Solaris 10 server when
trying to install 10.2.0.1 Oracle Clusterware. The install ran fine without issue. <<
Message: Result: User equivalence check failed for user "oracle". >>
• Does changing uid or gid of the Oracle User affect Oracle Clusterware?
• How many NICs do I need to implement RAC?
• Can we output the backupset onto regular file system directly (not onto flash
recovery area) using RMAN command, when we use SE RAC?
• Should the SCSI-3 reservation bit be set for our Oracle Clusterware only
installation?
• A client is a new RAC user and is using it in conjunction with BEA WebLogic.
Can they use Connection Load Balancing and Services? What about FCF, FAN,
RCLB?
• Why is validateUserEquiv failing during install (or cluvfy run)?
• How can a NAS storage vendor certify their storage solution for RAC ?
• What are the restrictions on the SID with a RAC database? Is it limited to 5
characters?
• What storage is supported with Standard Edition RAC?
• Can I use iSCSI storage with my RAC cluster?
• What would you recommend to a customer, Oracle Clusterware or Vendor
Clusterware (I.E. MC Service Guard, HACMP, Sun Cluster, Veritas etc.) with
Oracle Database 10g Real Application Clusters?
• Can I use RAC in a distributed transaction processing environment?
• Is it a good idea to add anti-virus software to my RAC cluster?
• When configuring the NIC cards and switch for a GigE Interconnect should it be
set to FULL or Half duplex in RAC?

RAC Assistance

• How do I use DBCA in silent mode to set up RAC and ASM?


• My customer has an XA Application with a RAC Database, can I do Load
Balancing across the RAC instances?

High Availability

• How does OCR mirror work? What happens if my OCR is lost/corrupt?


• In Solaris 10, do we need Sun Clusterware to provide redundancy for the
interconnect and multiple switches?
• Why do we have a Virtual IP (VIP) in 10g? Why does it just return a dead
connection when its primary node fails?
• If I use Services with Oracle Database 10g, do I still need to set up Load
Balancing ?
• Can RMAN backup Real Application Cluster databases?
• I am receiving an ORA-29740 error. What should I do?
• I am using shared servers with the following set in init.ora (SQL> show
parameters dispatchers): (protocol=TCP)(listener=listeners_nl01)(con=500)
(serv=oltp). I stopped my service with srvctl stop service but it is still registered
with the listener and accepting connections. Is this expected?
• How do I configure FCF with BPEL so I can use RAC 10g in the backend?
• The client gets this error message in Production in the ons.log file every minute or
so: 06/11/10 10:11:14 [2] Connection 0,129.86.186.58,6200 SSL handshake
failed 06/11/10 10:11:14 [2] Handshake for 0,129.86.186.58,6200: nz error =
29049 interval = 0 (180 max)
• Is it possible to use SRVCTL start database with a user account other than oracle
(that is, other than the owner of the Oracle software)?
• After executing DBMS_SERVICE.START_SERVICE, the service resource
remains OFFLINE status when confirming it with crs_stat. Is that expected
behavior ?
• What are my options for load balancing with RAC? Why do I get an uneven
number of connections on my instances?
• With three primary load balancing options (client-side connect-time LB, server-
side connect-time LB, and the runtime connection load balancing) Is it fair to say
Runtime Connection Load Balancing is the only option to leverage FAN up/down
events?
• How can a customer mask the change in their clustered database configuration
from their client or application? (I.E. So I do not have to change the connection
string when I add a node to the RAC database)
• What is Server-side Transparent Application Failover (TAF) and how do I use it?
• What is CLB_GOAL and how should I set it?
• Can our 10g VIP fail over from NIC to NIC as well as from node to node ?
• What does the Virtual IP service do? I understand it is for failover but do we need
a separate network card? Can we use the existing private/public cards? What
would happen if we used the public ip?
• What do the VIP resources do once they detect a node has failed/gone down? Are
the VIPs automatically acquired, and published, or is manual intervention
required? Are VIPs mandatory?

High Availability -- FAN/FCF

• Can I use the 10.2 JDBC driver with 10.1 database for FCF?
• What clients provide integration with FAN through FCF?
• Can I use TAF and FAN/FCF?
• How do the datasource properties initialLimit, minLimit, and maxLimit affect
Fast Connection Failover processing with JDBC?
• Do I need to install the ONS on all my mid-tier servers in order to enable JDBC
Fast Connection Failover (FCF)?
• Will FAN/OCI work with Instant Client?
• What type of callbacks are supported with OCI when using FAN/FCF?
• Does FCF for OCI react to FAN HA UP events?
• Can I use FAN/OCI with Pro*C?
• Do I have to link my OCI application with a thread library? Why?

Scalability

• I am seeing the wait events 'ges remote message', 'gcs remote message', and/or
'gcs for action'. What should I do about these?
• What are the changes in memory requirements from moving from single instance
to RAC?
• How can I validate the scalability of my shared storage? (Tightly related to RAC /
Application scalability)
• How many nodes are supported in a RAC Database?
• How do I measure the bandwidth utilization of my NIC or my interconnect?
• Does Database blocksize or tablespace blocksize affect how the data is passed
across the interconnect?
• What are my options for setting the Load Balancing Advisory GOAL on a
Service?
• What is the Load Balancing Advisory?
• What is Runtime Connection Load Balancing?
• How do I enable the load balancing advisory?

Manageability

• I found in 10.2 that the EM "Convert to Cluster Database" wizard would always
fall over on the last step where it runs emca and needs to log into the new cluster
database as dbsnmp to create the cluster database targets etc. I changed the
password for the dbsnmp account to be dbsnmp (same as username) and it worked
OK. Is this a known issue?
• What storage option should I use for RAC 10g on Linux? ASM / OCFS / Raw
Devices / Block Devices / Ext3 ?
• How do I stop the GSD?
• What is the purpose of the gsd service in Oracle 9i RAC?
• How should I deal with space management? Do I need to set free lists and free list
groups?
• I was installing RAC and my Oracle files did not get copied to the remote node(s).
What went wrong?
• I have 2 clusters named "crs" (the default), how do I get Grid Control to recognize
them as targets?
• If I am using Vendor Clusterware such as Veritas, IBM, Sun or HP, do I still need
Oracle Clusterware to run Oracle RAC 10g?
• If using plsql native code, the plsql_native_library_dir needs to be defined. In
RAC environment, must the directory be in the shared storage?
• How do I determine whether or not an OneOff patch is "rolling upgradeable"?
• Does RAC work with NTP (Network Time Protocol)?
• What is the Cluster Verification Utility (cluvfy)?
• What versions of the database can I use the cluster verification utility (cluvfy)
with?
• What are the implications of using srvctl disable for an instance in my RAC
cluster? I want to have it available to start if I need it but at this time do not want
to run this extra instance for this database.

Platform Specific

• How many nodes can be had in an HP-UX/Solaris/AIX/Windows/Linux cluster?


• Is crossover cable supported as an interconnect with RAC on any platform ?
• What is Oracle's position with respect to supporting RAC on Polyserve CFS?
• Is it possible to run RAC on logical partitions (i.e. LPARs) or virtual separate
servers?
• Can the Oracle Database Configuration Assistant (DBCA) be used to create a
database with Veritas DBE / AC 3.5?
• How do I check RAC certification?
• Where I can find information about how to setup / install RAC on different
platforms ?
• Is Veritas Storage Foundation supported with RAC?
• Is RAC on VMWare supported?

Platform Specific -- Linux

• Is 3rd Party Clusterware supported on Linux such as Veritas or Redhat?


• Can you have multiple RAC $ORACLE_HOME's on Linux?
• After installing patchset 9013 and patch_2313680 on Linux, the startup was very
slow
• Is CFS Available for Linux?
• Oracle Clusterware fails to start after a reboot due to permissions on raw devices
reverting to default values, How to fix?
• How do I configure my RAC Cluster to use the RDS Infiniband?
• Can RAC 10g and 9i RAC be installed and run on the same physical Linux
cluster?
• Is the hangcheck timer still needed with Oracle RAC 10g?
• My customer is about to install 10.2.0.2 Clusterware on new Linux machines. He is
getting a "No ORACM running" error when running rootpre.sh, which then exits. Should he
worry about this message?
• Customer did not load the hangcheck-timer before installing RAC, Can the
customer just load the hangcheck-timer ?
• Is OCFS2 certified with RAC 10g?
• A customer installed 10g R2 on Linux RH4 Update 2, 2.6.9-22.ELsmp #1 SMP
x86_64 GNU/Linux, and got the error Error in invoking target 'all_no_orcl'.
Customer ignored the error and the install succeeded without any other errors and
Oracle apparently worked fine. What should they do?
Because of compatibility with their storage array (EMC DMX with Powerpath
4.5) they must use update 2. Oracle install guide states that RH4 64 bits update 1
"or higher" should be used for 10g R2.
• Where can I find more information about hangcheck-timer module on Linux ?
And how do we configure hangcheck-timer module ?
• How to configure bonding on Suse SLES8.
• How to configure bonding on Suse SLES9.

Platform Specific -- Solaris

• Client is running Veritas cluster on a SunOS 2.9. When we ran the 10.2.0
installer it did not discover the nodes but with 9i it was able to discover both the
nodes. Is there anything specific to be done for 10.2.0 db install?
• Can I configure IPMP in Active/Active to increase bandwidth of my interconnect?
• Does Oracle Support RAC with Solaris 10 Containers (aka Zones)?
• Does Sun Solaris have a multipathing solution ?

Platform Specific -- HP-UX

• Is HMP supported with 10g on all HP platforms ?

Platform Specific -- Windows

• Does the Oracle Cluster File System (OCFS) support network access through
NFS or Windows Network Shares?
• My customer has a failsafe cluster installed, what are the benefits of moving their
system to RAC?
• When I try to login to the +ASM2 on node2 with asmcmd (after setting
ORACLE_HOME and ORACLE_SID correctly) I get: ORA-01031: insufficient
privileges (DBD ERROR: OCI SessionBegin). When I try to login to +ASM2
using sqlplus (connect / as sysdba) I get the same ORA-01031: insufficient
privileges. When I try to login to +ASM2 using sqlplus (connect sys/passwd as
sysdba) I get connected successfully.
• Can I run my 9i RAC and RAC 10g on the same Windows cluster?
• My customer wants to understand what type of disk caching they can use with
their Windows RAC Cluster, the install guide tells them to disable disk caching?

Platform Specific -- IBM AIX

• Is HACMP needed for RAC on AIX 5.2 using GPFS file system?
• Do I need HACMP/GPFS to store my OCR/Voting file on a shared device.
• Is VIO supported with RAC on IBM AIX?

Platform Specific -- IBM-z/OS (Mainframe)

• Can I run Oracle RAC 10g on my IBM Mainframe Sysplex environment (z/OS)?

Applications & RAC


• Is Oracle Application Server integrated with FAN and FCF?
• Can I use Oracle Clusterware for failover of the SAP Enqueue and VIP services
when running SAP in a RAC environment?
• Are Oracle Applications certified with RAC?

Diagnosability

• How do I gather all relevant Oracle and OS log/trace files in a RAC cluster to
provide to Support?
• What are the cdmp directories in the background_dump_dest used for?

EBusiness Suite with RAC

• What is the optimal migration path to be used while migrating the E-Business
suite to RAC?
• Is the Oracle E-Business Suite (Oracle Applications) certified against RAC?
• Can I use TAF with e-Business in a RAC environment?
• How to configure concurrent manager in a RAC environment?
• Should functional partitioning be used with Oracle Applications?
• Which e-Business version is preferable?
• Can I use Automatic Undo Management with Oracle Applications?

Clustered File Systems

• Can I use OCFS with SE RAC?


• What is the maximum number of nodes under OCFS on Linux?
• Where can I find documentation on OCFS ?
• What files can I put on Linux OCFS?
• What is the maximum number of nodes I can have in my cluster if I am using
OCFS2?
• Is Red Hat GFS (Global File System) certified by Oracle for use with Real
Application Clusters?
• Is Sun QFS supported with RAC? What about Sun GFS?
• Is Linux OCFS2 (OCFS version 2) supported with RAC?

Oracle Clusterware

• Customer is hitting bug 4462367 with an error message saying low open file
descriptor. How do I work around this until the fix is released with the Oracle
Clusterware Bundle for 10.2.0.3 or 10.2.0.4?
• In the course of failure testing in an extended RAC environment we find entries in
the cssd logfile which indicate actions like 'diskShortTimeout set to (value)' and
'diskLongTimeout set to (value)'.
Can anyone please explain the meaning of these two timeouts in addition to
diskTimeout?
• Can I run a 10.1.0.x database with Oracle Clusterware 10.2 ?
• Is it supported to rerun root.sh from the Oracle Clusterware installation ?
• My customer has noticed tons of log files generated under $CRS_HOME/log/
/client. Is there any automated way we can set up through Oracle Clusterware
to prevent/minimize/remove those aggressively generated files?
• Can I change the public hostname in my Oracle Database 10g Cluster using
Oracle Clusterware?
• Can I set up failover of the VIP to another card in the same machine or what do I
do if I have different network interfaces on different nodes in my cluster (I.E. eth0
on node1,2 and eth1 on node 3,4)?
• Is it possible to use ASM for the OCR and voting disk?
• Is it supported to allow 3rd Party Clusterware to manage Oracle resources
(instances, listeners, etc) and turn off Oracle Clusterware management of these?
• What is the High Availability API?
• How to move the OCR location ?
• During Oracle Clusterware installation, I am asked to define a private node name,
and then on the next screen asked to define which interfaces should be used as
private and public interfaces. What information is required to answer these
questions?
• Can I change the name of my cluster after I have created it when I am using
Oracle Clusterware?
• Which processes access the OCR ?
• What happens if I lose my voting disk(s)?
• Why does Oracle Clusterware use an additional 'heartbeat' via the voting disk,
when other cluster software products do not?
• Why does Oracle still use the voting disks when other cluster software is present?
• How do I identify the voting file location ?
• How much I/O activity should the voting disk have?
• What is the voting disk used for?
• How do I use multiple network interfaces to provide High Availability and/or
Load Balancing for my interconnect with Oracle Clusterware?
• Does Oracle Clusterware have to be the same or higher release than all instances
running on the cluster?
• Can I use Oracle Clusterware to monitor my EM Agent?
• Can the Network Interface Card (NIC) device names be different on the nodes in
a cluster, for both public and private?
• Can I configure HP's Autoport aggregation for NIC Bonding after the install? (i.e.
not present beforehand)
• When the customer runs the command 'onsctl start', they receive the message "Unable to
open libhasgen10.so". Any idea why?
• What are the IP requirements for the private interconnect?
• How to Restore a Lost Voting Disk used by Oracle Clusterware 10g
• How can I register the listener with Oracle Clusterware in RAC 10g Release 2?
• How is the voting disk used by Oracle Clusterware?
• Does Oracle Clusterware support application vips?
• Why is the home for Oracle Clusterware not recommended to be subdirectory of
the Oracle base directory?
• Can I use Oracle Clusterware to provide cold failover of my 9i or 10g single
instance Oracle Databases?
• How do I put my application under the control of Oracle Clusterware to achieve
higher availability?
• How do I protect the OCR and Voting in case of media failure?
• With Oracle Clusterware 10g, how do you backup the OCR?
• Does the hostname have to match the public name or can it be anything else?
• Is it a requirement to have the public interface linked to ETH0 or does it only
need to be on a ETH lower than the private interface?: - public on ETH1 - private
on ETH2
• How do I restore OCR from a backup? On Windows, can I use ocopy?
• What should the permissions be set to for the voting disk and ocr when doing a
RAC Install?

Stretched/Extended RAC -- No Sub Category

• Can I use ASM to mirror Oracle data in an extended RAC environment?


• How should voting disks be implemented in an extended cluster environment?
Can I use standard NFS for the third site voting disk?
• What are the network requirements for an extended RAC cluster?
• Can I use ASM as mechanism to mirror the data in an Extended RAC cluster?
• Can a customer use SE RAC to implement an "Extended RAC Cluster" ?

Cluster Verification Utility (CVU) -- No Sub Category

• What are the default values for the command line arguments?
• How do I check the Oracle Clusterware stack and other sub-components of it?
• Is there a way to verify that the Oracle Clusterware is working properly before
proceeding with RAC install?
• At what point is cluvfy usable? Can I use cluvfy before installing Oracle
Clusterware?
• What is CVU? What are its objectives and features?
• What is a stage?
• What is a component?
• What is nodelist?
• Do I have to be root to use CVU?
• What about discovery? Does CVU discover installed components?
• How do I report a bug (or tons of bugs)?
• What are the requirements for CVU?
• How do I install 'cvuqdisk' package?
• How do I know about cluvfy commands? The usage text of cluvfy does not show
individual commands.
• Do I have to type the nodelist every time for the CVU commands? Is there any
shortcut?
• How do I get detail output of a check?
• How do I check network or node connectivity related issues?
• Can I check if the storage is shared among the nodes?
• How do I check whether OCFS is properly configured?
• How do I check user accounts and administrative permissions related issues?
• How do I check minimal system requirements on the nodes?
• Is there a way to compare nodes?
• Why does the peer comparison with -refnode say passed when the group or user does
not exist?
• How do I turn on tracing?
• Where can I find the CVU trace files?
• Why does cluvfy report "unknown" on a particular node?
• What are the known issues with this release?
• When I run 10.2 CLUVFY on a system where RAC 10g Release 1 is running I get
the following output:
Package existence check failed for "SUNWscucm:3.1".
Package existence check failed for "SUNWudlmr:3.1".
Package existence check failed for "SUNWudlm:3.1".
Package existence check failed for "ORCLudlm:Dev_Release_06/11/04,_64bit_3.3.4.8_reentrant".
Package existence check failed for "SUNWscr:3.1".
Package existence check failed for "SUNWscu:3.1".
Checking this Solaris system I don't see those packages installed. Can I continue
my install?
• What is 'cvuqdisk' rpm? Why should I install this rpm?

Answers
I have changed my spfile with alter system set <parameter_name> =....
scope=spfile. The spfile is on ASM storage and the database will not
start.

How to recover:

In $ORACLE_HOME/dbs:

. oraenv <instance_name>
sqlplus "/ as sysdba"
startup nomount
create pfile='recoversp' from spfile
/
shutdown immediate
quit

Now edit the newly created pfile to change the parameter to something
sensible.

Then:

sqlplus "/ as sysdba"
startup pfile='recoversp' (or whatever you called it in step one).
create spfile='+DATA/GASM/spfileGASM.ora' from pfile='recoversp'
/

N.B. The name of the spfile is in your original init<instance_name>.ora,
so adjust to suit.

shutdown immediate
startup
quit
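
The "edit the newly created pfile" step above can be scripted. The following is a minimal sketch, assuming the pfile uses the usual `<sid>.<parameter>=<value>` line format (with `*` meaning all instances). The helper name `set_pfile_parameter`, the parameter `sga_target`, and the values shown are hypothetical and only illustrate the idea.

```python
import re


def set_pfile_parameter(pfile_text, name, value, sid="*"):
    """Return pfile_text with `<sid>.<name>` set to `value`.

    An existing entry for the same sid/name is replaced in place;
    otherwise a new line is appended. The pfile format assumed here
    is one `<sid>.<name>=<value>` entry per line.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(sid) + r"\." + re.escape(name) + r"[ \t]*=.*$",
        re.IGNORECASE | re.MULTILINE,
    )
    new_line = f"{sid}.{name}={value}"
    if pattern.search(pfile_text):
        # A function replacement avoids backslash processing of the value.
        return pattern.sub(lambda match: new_line, pfile_text)
    separator = "" if (not pfile_text or pfile_text.endswith("\n")) else "\n"
    return pfile_text + separator + new_line + "\n"


# Hypothetical pfile content with an unusable value for sga_target.
pfile = "*.db_name='GASM'\n*.sga_target=999999999999\n"
fixed = set_pfile_parameter(pfile, "sga_target", "2G")
```

Write the result back to the pfile (here `recoversp`) before running the `startup pfile=...` step.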

Is it supported to install CRS and RAC as different users?

Yes, CRS and RAC can be installed as different users. The CRS user and the RAC user
must both have "oinstall" as their primary group, and the RAC user should be a member
of the OSDBA group.
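
The group rules above can be expressed as a small check. This is only a sketch: the user records are plain dictionaries supplied by the caller, the function name `check_install_users` is hypothetical, and the OSDBA group name `dba` is an assumption (it is the conventional default, but a site may use another name).

```python
def check_install_users(crs_user, rac_user,
                        inventory_group="oinstall", osdba_group="dba"):
    """Return a list of problems with the CRS and RAC software owners.

    Each user is a dict with the keys "name", "primary_group" and
    "groups" (a set of all groups the user belongs to).
    """
    problems = []
    for label, user in (("CRS", crs_user), ("RAC", rac_user)):
        if user["primary_group"] != inventory_group:
            problems.append(
                f"{label} user {user['name']} must have "
                f"{inventory_group} as primary group"
            )
    if osdba_group not in rac_user["groups"]:
        problems.append(
            f"RAC user {rac_user['name']} should be a member of {osdba_group}"
        )
    return problems


# Hypothetical accounts used only for illustration.
crs = {"name": "crs", "primary_group": "oinstall", "groups": {"oinstall"}}
rac = {"name": "oracle", "primary_group": "oinstall",
       "groups": {"oinstall", "dba"}}
problems = check_install_users(crs, rac)
```

An empty list means both accounts satisfy the rules stated in the answer.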

WARNING: No cluster interconnect has been specified. I get this error
starting my RAC database, what do I do?

It simply means that the cluster_interconnects parameter is not set and nothing was set
in the OCR, so the database picks the private interconnect at random, hence the
warning.
You can either set the cluster_interconnects parameter in the init.ora to the private
interconnect IP, or use oifcfg getif and oifcfg setif (run oifcfg without arguments for the
help message).

$ oifcfg getif
eth0 138.2.236.0 global public
eth2 138.2.238.0 global cluster_interconnect

What does your output look like?

Note that if the hardware is not identical you will have to provide each node with its own
correct value; if the hardware is identical you can use the -global switch.

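
To check quickly whether an interconnect is registered, the `oifcfg getif` output can be parsed. The sketch below assumes the four-column layout shown in the sample above (interface, subnet, scope, type). The function names are hypothetical and the sample text is copied from this answer.

```python
def parse_oifcfg_getif(output):
    """Parse `oifcfg getif` text into a list of dicts, one per interface."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        name, subnet, scope, if_type = fields
        rows.append(
            {"name": name, "subnet": subnet, "scope": scope, "type": if_type}
        )
    return rows


def interconnect_interfaces(output):
    """Return the names of interfaces registered as cluster_interconnect."""
    return [row["name"] for row in parse_oifcfg_getif(output)
            if row["type"] == "cluster_interconnect"]


SAMPLE = """eth0 138.2.236.0 global public
eth2 138.2.238.0 global cluster_interconnect
"""
private_nics = interconnect_interfaces(SAMPLE)
```

An empty result would correspond to the situation described in the warning: no interconnect has been specified in the OCR.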
Customer wants to use rconfig to convert a single instance to RAC but is using
raw devices in RAC. Does rconfig support raw devices?

No. rconfig supports ASM and shared file system only.

Can we designate the place of archive logs on both ASM disk and regular
file system, when we use SE RAC?

Yes. Customers may want to create a standby database for their SE RAC database, so
placing the archive logs additionally outside ASM is OK.

Why does netca always create the listener which listens to public IP and
not VIP only?

This is for backward compatibility with existing clients: consider a pre-10g to 10g server
upgrade. If the upgraded listener listened only on the VIP, then clients that were not
upgraded would no longer be able to reach this listener.

Do we have to have Oracle RDBMS on all nodes?

Each node of a cluster that is being used for a clustered database will typically have the
RDBMS and RAC software loaded on it, but not actual datafiles (these need to be
available via shared disk). For example, if you wish to run RAC on 2 nodes of a 4-node
cluster, you would need to install the clusterware on all nodes, RAC on 2 nodes and it
would only need to be licensed on the two nodes running the RAC database. Note that
using a clustered file system, or NAS storage can provide a configuration that does not
necessarily require the Oracle binaries to be installed on all nodes.

What kind of HW components do you recommend for the interconnect?

The general recommendation for the interconnect is to provide the highest bandwidth
interconnect, together with the lowest latency protocol that is available for a given
platform. In practice, Gigabit Ethernet with UDP has proven sufficient in every case it
has been implemented, and tends to be the lowest common denominator across platforms.

Are there any suggested roadmaps for implementing a new RAC
installation?

Yes, Oracle Support recommends the following best practices roadmap to successfully
implement RAC:

A Smooth Transition to Real Application Clusters

The Purpose of this document is to provide a best practices road map to successfully
implement Real Application Clusters.

What is Cache Fusion and how does this affect applications?

Cache Fusion is a new parallel database architecture for exploiting clustered computers to
achieve scalability of all types of applications. Cache Fusion is a shared cache
architecture that uses high speed low latency interconnects available today on clustered
systems to maintain database cache coherency. Database blocks are shipped across the
interconnect to the node where access to the data is needed. This is accomplished
transparently to the application and users of the system. Because Cache Fusion uses at most a
3-point protocol, it scales easily to clusters with a large number of nodes.

For more information about Cache Fusion, see:

Understanding 9i Real Application Clusters Cache Fusion

Cache Fusion in the Oracle Documentation

Is it difficult to transition from Single Instance to RAC?

If the cluster and the cluster software are not present, these components must be installed
and configured. The RAC option must be added using the Oracle Universal Installer,
which requires that the existing DB instance be shut down. There are no changes
necessary on the user data within the database. However, a shortage of freelists and
freelist groups can cause contention with header blocks of tables and indexes as multiple
instances vie for the same block. This may cause a performance problem and require
data partitioning. However, the need for these changes should be rare.

Recommendation: use automatic segment space management (ASSM), which makes these changes
automatically. ASSM replaces freelists and freelist groups
and is the better option. The database requires one redo thread and one undo tablespace for each
instance, which are easily added with SQL commands or with Enterprise Manager tools.

Datafiles will need to be moved to either a clustered file system (CFS) or raw devices so
that all nodes can access them. Also, the MAXINSTANCES parameter in the control file
must be greater than or equal to number of instances you will start in the cluster.

For more detailed information, please see Migrating from single-instance to RAC in the
Oracle Documentation

With Oracle Database 10g Release 2, the $ORACLE_HOME/bin/rconfig tool can be used to
convert a single-instance database to RAC. The tool takes an XML input file and converts
the single-instance database described in that file. You can run the
tool in "verify only" mode before performing the actual conversion. This is documented in
the RAC admin book, and a sample XML file can be found at
$ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC.xml. This tool only
supports databases using a clustered file system or ASM. You cannot use it with raw
devices. Grid Control 10g Release 2 provides an easy-to-use wizard to perform this
function.
Note: Please be aware that you may hit bug 4456047 (shutdown immediate hangs) as you
convert the database. The bug has been updated with a workaround, and the workaround is
release noted as well.

What are the dependencies between OCFS and ASM in Oracle Database
10g ?

In an Oracle RAC 10g environment, there is no dependency between Automatic Storage
Management (ASM) and Oracle Cluster File System (OCFS).
OCFS is not required if you are using Automatic Storage Management (ASM) for
database files. You can use OCFS on Windows (OCFS version 2 on Linux) for files that ASM
does not handle - binaries (shared oracle home), trace files, etc. Alternatively, you could
place these files on local file systems even though it's not as convenient given the
multiple locations.
If you do not want to use ASM for your database files, you can still use OCFS for
database files in Oracle Database 10g.
Please refer to ASM and OCFS Positioning

Is rcp and/or rsh required for normal RAC operation ?

"rcp" and "rsh" are not required for normal RAC operation. However, "rsh" and
"rcp" should be enabled for RAC and patchset installation. In future releases, ssh
will be used for these operations.

What software is necessary for RAC? Does it have a separate installation
CD to order?

Real Application Clusters is an option of Oracle Database and therefore part of the Oracle
Database CD. With Oracle 9i, RAC is part of Oracle9i Enterprise Edition. If you install 9i
EE onto a cluster, and the Oracle Universal Installer (OUI) recognizes the cluster, you
will be provided the option of installing RAC. Most UNIX platforms require an OSD
installation for the necessary clusterware. For Intel platforms (Linux and Windows),
Oracle provides the OSD software within the Oracle9i Enterprise Edition release.

With Oracle Database 10g, RAC is an option of EE and available as part of SE. Oracle
provides Oracle Clusterware on its own CD included in the database CD pack.

Please check the certification matrix (Note 184875.1) or with the appropriate platform
vendor for more information.

@ Sent by Karin Brandauer

What is Standard Edition RAC?

With Oracle Database 10g, a customer who has purchased Standard Edition is allowed to
use the RAC option within the limitations of Standard Edition(SE). For licensing
restrictions you should read the Oracle Database 10g License Doc. At a high level this
means that you can have a maximum of 4 CPUs in the cluster and you must use ASM for all
database files. Oracle Cluster File System (OCFS) is not supported for use with SE RAC.
NOTE: OCFS2 is not supported for any database-related files with SE RAC.

Can I use iSCSI storage with my RAC cluster?

For iSCSI, Oracle has made the statement that, as a block protocol, this technology does
not require validation for single instance database. There are many early adopter
customers of iSCSI running Oracle9i and Oracle Database 10g. As for RAC, Oracle has
chosen to validate the iSCSI technology (not each vendor's targets) for the 10g platforms.
This has been completed for Linux, Unix and Windows. For Windows we have tested up
to 4 nodes. Any Windows iSCSI products that are supported by the host and storage
device are supported by Oracle. We don't support NAS devices for Windows; however,
some NAS devices (e.g. NetApp) can also present themselves as iSCSI devices. If this is
the case then a customer can use this iSCSI device with Windows as long as the iSCSI
device vendor supports Windows as an initiator OS. No vendor-specific information will
be posted on Certify.

What would you recommend to a customer, Oracle Clusterware or Vendor
Clusterware (i.e. MC Service Guard, HACMP, Sun Cluster, Veritas
etc.) with Oracle Database 10g Real Application Clusters?

You will be installing and using Oracle Clusterware whether or not you use the Vendor
Clusterware. The question you need to ask is whether the Vendor Clusterware gives you
something that Oracle Clusterware does not. Is the RAC database on the same server as
the application server? Are there any other processes on the same server as the database
that you require Vendor Clusterware to fail over to another server in the cluster if the
server it is running on fails? If this is the case, you may want the vendor clusterware. If
not, why spend the extra money when Oracle Clusterware supplies everything you need
for the clustered database and is included with your RAC license? Note: With Oracle
Database 10g Release 2, Oracle Clusterware can be used to manage application processes
in the cluster (start, stop, check, relocate).

When configuring the NIC cards and switch for a GigE Interconnect
should it be set to FULL or Half duplex in RAC?

You have to use Full Duplex, not only for RAC but for all network
communication. Half Duplex means you can either send OR receive at any given time,
but not both.

Is it a good idea to add anti-virus software to my RAC cluster?

Customers who choose to run anti-virus (AV) software on their database servers
should be aware that AV software reduces disk IO bandwidth
slightly, as most AV software checks disk writes/reads. Also, as the AV software runs, it
will use CPU cycles that would normally be consumed by other server processes (e.g.
your database instance). As such, databases will have faster performance when not using
AV software. As some AV software is known to lock files while it scans them, it is a
good idea to exclude the Oracle datafiles/controlfiles/logfiles from a regular AV scan.

Can I use RAC in a distributed transaction processing environment?

YES. Best practice is to have all tightly coupled branches of a distributed transaction
running on a RAC database run on the same instance. Between transactions and
between services, transactions can be load balanced across all of the database instances.
You can use services to manage DTP environments. By defining the DTP property of a
service, the service is guaranteed to run on one instance at a time in a RAC database. All
global distributed transactions performed through the DTP service are ensured to have
their tightly-coupled branches running on a single RAC instance.

How can a NAS storage vendor certify their storage solution for RAC ?

They should obtain an OCE test kit and complete the required RAC tests. They can
submit the request for an OCE kit to ocesup_ie@oracle.com.

The list of certified NAS vendors/solutions is posted on OTN under the OSCP program

Can I run 9i RAC and RAC 10g in the same cluster?

YES. However Oracle Clusterware (CRS) will not support a 9i RAC database so you will
have to leave the current configuration in place. You can install Oracle Clusterware and
RAC 10g into the same cluster. On Windows and Linux, you must run the 9i Cluster
Manager for the 9i Database and the Oracle Clusterware for the 10g Database. When you
install Oracle Clusterware, your 9i srvconfig file will be converted to the OCR. Both 9i
RAC and 10g will use the OCR. Do not restart the 9i gsd after you have installed Oracle
Clusterware. Remember to check certify for details of what vendor clusterware can be
run with Oracle Clusterware.

For example on Solaris, your 9i RAC will be using Sun Cluster. You can install Oracle
Clusterware and RAC 10g in the same cluster that is running Sun Cluster and 9i RAC.

Is Infiniband supported for the RAC interconnect?

Today IP over IB is supported, and RDS on Linux is supported with 10.2.0.3 forward.
Qlogic (formerly SilverStorm) is the supported RDS vendor. Watch certify for updates.
As other platforms adopt RDS, we will expand support. There are no plans to support
uDAPL or ITAPI protocols.

What combinations of Oracle Clusterware, RAC and ASM versions can I
use?

See Note: 337737.1 for detailed support matrix. Basically the Clusterware version must
be at least the highest release of ASM or RAC. ASM must be at least 10.1.0.3 to work
with 10.2 database.
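
The two rules of thumb above can be written down as a small check. The following Python sketch is illustrative only: the function names are hypothetical, releases are compared at full patch-set level (an assumption), and it encodes just the two rules stated in this entry. Note 337737.1 remains the authoritative matrix.

```python
def parse_version(text):
    """Turn a dotted release string such as '10.2.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_stack(clusterware, asm, rac):
    """Return a list of problems found; an empty list means both rules pass.

    Only the two rules quoted in this FAQ entry are checked. This is not a
    substitute for the support matrix in Note 337737.1.
    """
    cw, asm_v, rac_v = (parse_version(v) for v in (clusterware, asm, rac))
    problems = []
    # Rule 1: Clusterware must be at least the highest release of ASM or RAC.
    if cw < max(asm_v, rac_v):
        problems.append("Clusterware is older than ASM or RAC")
    # Rule 2: a 10.2 database needs ASM 10.1.0.3 or later.
    if rac_v[:2] == (10, 2) and asm_v < (10, 1, 0, 3):
        problems.append("ASM must be at least 10.1.0.3 for a 10.2 database")
    return problems


print(check_stack("10.2.0.3", "10.1.0.3", "10.2.0.1"))
print(check_stack("10.1.0.4", "10.1.0.2", "10.2.0.1"))
```

The first call describes a combination that passes both rules; the second one violates both.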

What storage is supported with Standard Edition RAC?

As per the licensing documentation, you must use ASM for all database files with SE
RAC. There is no support for CFS or NFS.

From Oracle Database 10g Release 2 Licensing Doc:

Oracle Standard Edition and Real Application Clusters (RAC)
When used with Oracle
Real Application Clusters in a clustered server environment, Oracle Database Standard
Edition requires the use of Oracle Clusterware. Third-party clusterware management
solutions are not supported. In addition, Automatic Storage Management (ASM) must be
used to manage all database-related files, including datafiles, online logs, archive logs,
control file, spfiles, and the flash recovery area. Third-party volume managers and file
systems are not supported for this purpose.

Should the SCSI-3 reservation bit be set for our Oracle Clusterware only
installation?

If you are using only Oracle Clusterware (no Veritas CM), then you don't need to have
SCSI-3 PGR enabled, since Oracle Clusterware does not require it for IO fencing. If the
reservation is set, then you'll get inconsistent results, so ask your storage vendor to
disable the reservation. Veritas RAC requires that the storage array support SCSI-3 PGR,
since this is how Veritas handles IO fencing. This SCSI-3 PGR is set at the array level;
for example EMC hypervolume level.

What are the restrictions on the SID with a RAC database? Is it limited to
5 characters?

The SID prefix in 10g Release 1 and prior versions was restricted to five characters by
install/config tools so that an ORACLE_SID of up to a maximum of 5+3=8 characters can be
supported in a RAC environment. The SID prefix limit is relaxed to up to 8 characters in 10g
Release 2, see bug 4024251 for more information.

Does Oracle Clusterware or Real Application Clusters support
heterogeneous platforms?

Oracle Clusterware and Real Application Clusters do not support heterogeneous
platforms in the same cluster. Enterprise Manager Grid Control supports heterogeneous
platforms. We do support machines of different speeds and size in the same cluster. All
nodes must run the same operating system (i.e. they must be binary compatible). In an
active data-sharing environment, like RAC, we do not support machines having different
chip architectures.

Why is validateUserEquiv failing during install (or cluvfy run)?

SSH must be set up as per the pre-installation tasks. It is also necessary to have file
permissions set as described below for features such as public key authentication to
work. If your permissions are not correct, public key authentication will fail, and will
fall back to password authentication with no helpful message as to why. The following
server configuration files and/or directories must be owned by the account owner or by
root, and GROUP and WORLD WRITE permission must be disabled.

$HOME
$HOME/.rhosts
$HOME/.shosts
$HOME/.ssh
$HOME/.ssh/authorized_keys
$HOME/.ssh/authorized_keys2 #OpenSSH specific for ssh2 protocol.
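
A quick way to audit these paths is to script the ownership and write-permission rule. The Python sketch below is only an illustration under a few assumptions: it is run as the software owner (so "account owner" means the current user), it uses the standard OpenSSH file names authorized_keys and authorized_keys2, and the function name is hypothetical.

```python
import os
import stat


def insecure_paths(paths):
    """Return (path, reason) pairs for paths that break the rule above.

    A path is flagged when it is owned by neither the current user nor root,
    or when it is group or world writable. Missing or unreadable paths are
    skipped.
    """
    allowed_owners = {os.getuid(), 0}
    findings = []
    for path in paths:
        try:
            info = os.stat(path)
        except OSError:
            continue
        if info.st_uid not in allowed_owners:
            findings.append((path, "not owned by the account owner or root"))
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            findings.append((path, "group or world writable"))
    return findings


home = os.path.expanduser("~")
candidates = [home] + [
    os.path.join(home, name)
    for name in (".rhosts", ".shosts", ".ssh", ".ssh/authorized_keys",
                 ".ssh/authorized_keys2")
]
for path, reason in insecure_paths(candidates):
    print(f"{path}: {reason}")
```

No output means none of the existing paths violates the rule. Run it on every node, since user equivalence must work in both directions.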

SSH (from OUI) will also fail if you have not connected to each machine in your cluster
as per the note in the installation guide:

The first time you use SSH to connect to a node from a particular system, you may see a
message similar to the following:

The authenticity of host 'node1 (140.87.152.153)' can't be established. RSA key
fingerprint is 7z:ez:e7:f6:f4:f2:4f:8f:9z:79:85:62:20:90:92:z9.
Are you sure you want to continue connecting (yes/no)?

Enter yes at the prompt to continue. You should not see this message again when you
connect from this system to that node. Answering yes to this question causes an entry to
be added to the known_hosts file in the .ssh directory, which is why subsequent
connection requests do not ask again.
This is known to work on Solaris and Linux but may work on other platforms as well.

I had a 3 node RAC. One of the nodes had to be completely rebuilt as a
result of a problem. As there are no backups, what is the proper
procedure to remove the 3rd node from the cluster so it can be
added back in?

Follow the documentation for removing a node, but you can skip all the steps in the
node-removal doc that need to be run on the node being removed, like steps 4, 6 and 7 (see
Chapter 10 of RAC Admin and Deployment Guide). Make sure that you remove any
database instances that were configured on the failed node with srvctl, and listener
resources also, otherwise rootdeletenode.sh will have trouble removing the nodeapps.
Just running rootdeletenode.sh isn't really enough, because you need to update the
installer inventory as well, otherwise you won't be able to add back the node using
addNode.sh. And if you don't remove the instances and listeners you'll also have
problems adding the node and instance back again.
Probably a better alternative to the generic documentation (bug 5929611 filed) for
removing a node is Note 269320.1.

A client is a new RAC user and is using it in conjunction with BEA
WebLogic. Can they use Connection Load Balancing and Services?
What about FCF, FAN, RCLB?

The key item here is whether or not they are using XA. If they are using XA (Tuxedo for
example), then they should use the DTP service with 10g Release 2. Have the customer
review the Best Practices for using XA with RAC on OTN.
If it is not XA then services and Net Service Connection Load Balancing should work
fine. They can tune aspects of the recovery such as instance recovery time. Using BEA,
they do not get the advanced features such as Fast Connection Failover (FCF) and
Runtime Connection Load Balancing (RCLB). To understand services, FCF and RCLB, read the
RAC Admin and Deployment Guide for 10g Release 2 Chapter 6.

Is relink required for CRS_HOME after OS upgrade?

Oracle Clusterware binaries cannot be relinked on shiphomes. So to answer your question:
no, there is no need to relink Oracle Clusterware binaries.

How many NICs do I need to implement RAC?

At minimum you need 2: external (public), interconnect (private). When storage for RAC
is provided by Ethernet based networks (e.g. NAS/nfs or iSCSI), you will need a third
interface for I/O so a minimum of 3. Anything else will cause performance and stability
problems under load. From an HA perspective, you want these to be redundant, thus
needing a total of 6.
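
The arithmetic behind these numbers can be spelled out in a few lines. This Python sketch is only a rule-of-thumb calculator with a hypothetical function name. It assumes that redundancy simply doubles every interface, which matches the total of 6 quoted above for the Ethernet storage case.

```python
def nics_per_node(ethernet_storage, redundant):
    """Count the NICs per node using the rules of thumb from this answer."""
    count = 2  # public network + private interconnect
    if ethernet_storage:
        count += 1  # dedicated interface for NAS/NFS or iSCSI I/O
    if redundant:
        count *= 2  # assumption: one redundant partner for every interface
    return count


for storage in (False, True):
    for redundant in (False, True):
        print(f"ethernet_storage={storage} redundant={redundant}: "
              f"{nics_per_node(storage, redundant)} NICs")
```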

Can we output the backupset onto regular file system directly (not onto
flash recovery area) using RMAN command, when we use SE RAC?

Yes. Customers might want to back up their database to offline storage, so this is also
supported.

Does changing uid or gid of the Oracle User affect Oracle Clusterware?

There are a lot of files in the Oracle Clusterware home and outside of the Oracle
Clusterware home that are chgrp'ed to the appropriate groups for security and appropriate
access. The filesystem records the numeric uid and gid (not the user or group name), so if
you change the ids behind those names, the existing files are owned by the wrong user or group.

I could not get the user equivalence check to work on my Solaris 10 server
when trying to install 10.2.0.1 Oracle Clusterware. The install ran
fine without issue. << Message: Result: User equivalence check
failed for user "oracle". >>

Cluvfy tries to find ssh on Solaris at /usr/local/bin. The workaround is to create a softlink
from /usr/bin/ssh to /usr/local/bin. To resolve issues with cluvfy, it often helps to turn on
tracing: run cluvfy with the -verbose attribute and the SRVM_TRACE environment variable
set to TRUE:

$ script run.log
$ export SRVM_TRACE=TRUE
$ cluvfy -blah     ### your usual cluvfy arguments, with -verbose added
$ exit

Can my customer use Veritas Agents to manage their RAC database on
Unix with SFRAC installed?

For details on the support of SFRAC and Veritas Agents with RAC 10g, please see
Metalink Note 397460.1 and Metalink Note 332257.1

Can I run more than one clustered database on a single RAC cluster?

You can run multiple databases in a RAC cluster, either one instance per node (with
different databases having different subsets of nodes in a cluster), or multiple instances
per node (all databases running across all nodes) or some combination in between.
Running multiple instances per node does cause memory and resource fragmentation, but
this is no different from running multiple instances on a single node in a single instance
environment, which is quite common. It does provide the flexibility of being able to share
CPU on the node, but the Oracle Resource Manager will not currently limit resources
between multiple instances on one node. You will need to use an OS level resource
manager to do this.

Where can I find a list of supported solutions to ensure NIC availability
(for the interconnect) per platform?

IBM AIX - available solutions:

• Etherchannel (OS based)
• HACMP based network failover solution

More information: Metalink Note 296856.1

HP HP-UX - available solutions:

• APA - Auto Port Aggregation (OS based)
• MC/Serviceguard based network failover solution
• Combination of both solutions

More information: Metalink Note 296874.1 and Auto Port Aggregation (APA)
Support Guide

Sun Solaris - available solutions:

• Sun Trunking (OS based)
• Sun IPMP (OS based)
• Sun Cluster based network failover solution (IPMP based)

More information: Metalink Note 283107.1 - IPMP in general. When IPMP is
used for the interconnect: Metalink Note 368464.1
Related RAC FAQ entries: In Solaris 10, do we need Sun Clusterware to provide
redundancy for the interconnect and multiple switches?

Linux - available solutions:

• Bonding

More information: Metalink Note 298891.1
Related RAC FAQ entries: How do I use multiple network interfaces to provide
High Availability and/or Load Balancing for my interconnect with Oracle
Clusterware?

Windows - available solutions:

• Teaming

On Windows, teaming solutions used to ensure NIC availability are usually part of
the network card driver.
Thus, they depend on the network card used. Please contact the respective
hardware vendor for more information.

Is there a need to renice LMS processes in Oracle RAC 10g Release 2?

LMS processes should be running in RT by default since 10.2, so there's NO need to
renice them, or otherwise mess with them.
Check with ps -efl:
0 S spommere 31191 1 0 75 0 - 270857 - 10:01 ? 00:00:00 ora_lmon_appsu01
0 S spommere 31193 1 5 75 0 - 271403 - 10:01 ? 00:00:07 ora_lmd0_appsu01
0 S spommere 31195 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms0_appsu01
0 S spommere 31199 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms1_appsu01

The 7th column is the priority (PRI): 75 or 76 means Time Share, 58 means Real Time.
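
If you want to check this on many nodes, the same reading of the 7th column can be scripted. The Python sketch below parses the sample ps -efl lines shown above. The function names are hypothetical, and the priority values (58, 75, 76) are the ones from this Linux example, so treat them as assumptions on any other platform or kernel.

```python
SAMPLE = """\
0 S spommere 31191 1 0 75 0 - 270857 - 10:01 ? 00:00:00 ora_lmon_appsu01
0 S spommere 31193 1 5 75 0 - 271403 - 10:01 ? 00:00:07 ora_lmd0_appsu01
0 S spommere 31195 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms0_appsu01
0 S spommere 31199 1 0 58 - - 271396 - 10:01 ? 00:00:00 ora_lms1_appsu01
"""


def scheduling_class(priority):
    """Map the PRI value to the class named in this FAQ entry."""
    if priority == 58:
        return "real time"
    if priority in (75, 76):
        return "time share"
    return "unknown"


def classify(ps_output):
    """Return {process name: scheduling class} for each ps -efl data line."""
    result = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a full data line.
        if len(fields) < 15 or not fields[6].isdigit():
            continue
        result[fields[-1]] = scheduling_class(int(fields[6]))
    return result


for name, sched in classify(SAMPLE).items():
    print(f"{name}: {sched}")
```

On a live system the text passed to classify would come from the output of ps -efl filtered for the ora_lm processes.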

You can also use chrt to check:

LMS (Real Time):

$ chrt -p 31199
pid 31199's current scheduling policy: SCHED_RR
pid 31199's current scheduling priority: 1

LMD (Time Share):

$ chrt -p 31193
pid 31193's current scheduling policy: SCHED_OTHER
pid 31193's current scheduling priority: 0

How do I check for network problems on my interconnect?

1. Confirm that full duplex is set correctly for all interconnect links on all interfaces on
both ends. Do not rely on auto negotiation.
2. ifconfig -a will give you an indication of collisions/errors/overruns and dropped packets.
3. netstat -s will give you a listing of receive packet discards, fragmentation and
reassembly errors for IP and UDP.
4. Set the UDP buffers correctly.
5. Check your cabling.
Note: When diagnosing issues, keep in mind that RAC uses UDP as the protocol, while Oracle
Clusterware uses TCP/IP.

Are there any issues for the interconnect when sharing the same switch as
the public network by using VLAN to separate the network?

RAC and Clusterware deployment best practices suggest that the interconnect be
deployed on a stand-alone, physically separate, dedicated switch. Many customers have
consolidated these stand-alone switches into larger managed switches. A consequence of
this consolidation is a merging of IP networks on a single shared switch, segmented by
VLANs. There are caveats associated with such deployments. RAC cache fusion
exercises the IP network more rigorously than non-RAC Oracle databases. The latency
and bandwidth requirements as well as availability requirements of the RAC/Clusterware
interconnect IP network are more in-line with high performance computing. Deploying
the RAC/Clusterware interconnect on a shared switch, segmented VLAN may expose the
interconnect links to congestion and instability in the larger IP network topology. If
deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to
non-routable subnet and the VLAN should not span multiple VLANs (tagged) or multiple
switches. Deployment concerns in this environment include Spanning Tree loops when
the larger IP network topology changes, asymmetric routing that may cause packet
flooding, and lack of fine grained monitoring of the VLAN/port.

How do I use DBCA in silent mode to set up RAC and ASM?

If you already have an ASM instance/diskgroup, then the following creates a RAC database
on that diskgroup:

su oracle -c "$ORACLE_HOME/bin/dbca -silent -createDatabase -templateName \
General_Purpose.dbc -gdbName $SID -sid $SID -sysPassword $PASSWORD \
-systemPassword $PASSWORD -sysmanPassword $PASSWORD -dbsnmpPassword \
$PASSWORD -emConfiguration LOCAL -storageType ASM -diskGroupName \
$ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates \
-nodeinfo $NODE1,$NODE2 -characterset WE8ISO8859P1 -obfuscatedPasswords false \
-sampleSchema false -oratabLocation /etc/oratab"

The following will create an ASM instance and 1 diskgroup:

su oracle -c "$ORA_ASM_HOME/bin/dbca -silent -configureASM -gdbName NO -sid \
NO -emConfiguration NONE -diskList $ASM_DISKS -diskGroupName \
$ASMGROUPNAME -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates \
-nodeinfo $NODE1,$NODE2 -obfuscatedPasswords false -oratabLocation /etc/oratab \
-asmSysPassword $PASSWORD -redundancy $ASMREDUNDANCY"

where ASM_DISKS='/dev/sda1,/dev/sdb1' and ASMREDUNDANCY='NORMAL'

My customer has an XA Application with a RAC Database, can I do Load
Balancing across the RAC instances?

No, not in the traditional Oracle Net Services Load Balancing. We have written a
document that explains the best practices for 9i, 10g Release 1 and 10g Release 2.
With the 10g Services, life gets easier. To understand services, read the RAC Admin
and Deployment Guide for 10g Release 2 Chapter 6.

How does OCR mirror work? What happens if my OCR is lost/corrupt?

OCR is the Oracle Cluster Registry. It holds all the cluster related information such as
instances and services. The OCR file format is binary and starting with 10.2 it is possible to
mirror it. The location of the file(s) is recorded in /etc/oracle/ocr.loc in the ocrconfig_loc and
ocrmirrorconfig_loc variables.
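
To see at a glance whether a mirror is configured, the two variables can be read programmatically. The Python sketch below assumes that ocr.loc is a plain text file of key=value lines; the function name and the device paths in the example are made up for illustration.

```python
def parse_ocr_loc(text):
    """Parse key=value lines, ignoring blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical file content; on a real node read /etc/oracle/ocr.loc instead.
example = """\
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
"""

settings = parse_ocr_loc(example)
print("primary:", settings.get("ocrconfig_loc"))
print("mirror: ", settings.get("ocrmirrorconfig_loc", "(not configured)"))
```

This only reads the file. Changes to the OCR locations should go through the ocrconfig utility.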

Obviously if you only have one copy of the OCR and it is lost or corrupt then you must
restore a recent backup; see the ocrconfig utility for details, specifically the -showbackup and
-restore flags. Until a valid backup is restored, Oracle Clusterware will not start up due
to the corrupt/missing OCR file.

The interesting discussion is what happens if you have the OCR mirrored and one of the
copies gets corrupted. You would expect that everything will continue to work seamlessly.
Well, almost. The real answer depends on when the corruption takes place.

If the corruption happens while the Oracle Clusterware stack is up and running, then the
corruption will be tolerated and Oracle Clusterware will continue to function without
interruption, despite the corrupt copy. The DBA is advised to repair the hardware/software
problem that prevents OCR from accessing the device as soon as possible; alternatively,
the DBA can replace the failed device with another healthy device using the ocrconfig utility
with the -replace flag.

If however the corruption happens while the Oracle Clusterware stack is down, then it
will not be possible to start it up until the failed device comes online again or some
administrative action using the ocrconfig utility with the -overwrite flag is taken. When
Clusterware attempts to start you will see messages similar to:

total id sets (1), 1st set (1669906634,1958222370), 2nd set (0,0) my
votes (1), total votes (2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0
(/dev/raw/raw1) doesn't have enough votes (1,2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in
accessing physical storage [26]

This is because the software can't determine which OCR copy is the valid one. In the
above example one of the OCR mirrors was lost while the Oracle Clusterware was down.
There are 3 ways to fix this failure:

a) Fix whatever problem (hardware/software?) prevents OCR from accessing the
device.

b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command will
override the vote check built into OCR when it starts up. Basically, if the OCR device is
configured with a mirror, OCR assigns each device one vote. The rule is to have more
than 50% of the total votes (quorum) in order to safely make sure the available devices
contain the latest data. In 2-way mirroring, the total vote count is 2, so it requires 2 votes
to achieve the quorum. In the example above there aren't enough votes to start if only one
device with one vote is available. (In the earlier example, where the device went down
while OCR was running, OCR assigned 2 votes to the surviving device, which is why this
surviving device, now with two votes, can start after the cluster is down.) See warning below.

c) This method is not recommended to be performed by customers. It is possible to
manually modify ocr.loc to delete the failed device and restart the cluster. OCR won't do
the vote check if the mirror is not configured. See warning below.

EXTREME CAUTION should be exercised if choosing option b or c above since data
loss can occur if the wrong file is manipulated. Please contact Oracle Support for
assistance before proceeding.
Bug 5055145 was the basis for this FAQ, also thanks to Ken Lee for his valuable
feedback.
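
The vote rule described under option b comes down to a strict majority test. The Python sketch below is only a model of that rule with a hypothetical function name; it is not how OCR is implemented, and the total used in the second call is an assumption, since this entry only states how many votes the surviving device holds.

```python
def has_quorum(available_votes, total_votes):
    """True when strictly more than 50% of the total votes are available."""
    return 2 * available_votes > total_votes


# Two-way mirror, one device lost while the stack was down: 1 of 2 votes.
print(has_quorum(1, 2))  # False, so the stack refuses to start

# Mirror lost while the stack was running: the survivor was given 2 votes.
print(has_quorum(2, 2))  # True, so a later restart succeeds
```

The strict "more than half" comparison is why a 2-way mirror needs both votes: exactly half is not enough to prove which copy is current.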

Why do we have a Virtual IP (VIP) in 10g? Why does it just return a dead
connection when its primary node fails?

It's all about availability of the application.
When a node fails, the VIP associated with it is automatically failed over to some other
node. When this occurs, the following things happen.
(1) The VIP detects the public network failure, which generates a FAN event.
(2) The new node re-arps the world indicating a new MAC address for the IP.
(3) Old clients subscribing to FAN immediately receive an ORA-3113 error or equivalent.
Those not subscribing to FAN will eventually time out.
(4) New connection requests rapidly traverse the tnsnames.ora address list skipping over
the dead nodes, instead of having to wait on TCP/IP timeouts.
Without using VIPs or FAN, clients connected to a node that died will often wait a 10
minute TCP timeout period before getting an error.
As a result, you don't really have a good HA solution without using VIPs and FAN.

If I use Services with Oracle Database 10g, do I still need to set up Load
Balancing ?

Yes. Services allow granular definition of workload, and the DBA can dynamically
define which instances provide the service. Connection Load Balancing still needs to be
set up to allow the user connections to be balanced across all instances providing a
service.

In Solaris 10, do we need Sun Clusterware to provide redundancy for the
interconnect and multiple switches?

Link Aggregation (GLDv3) is bundled in the OS as of Solaris 10. IPMP is available for
Solaris 10 and Solaris 9. Neither requires Sun Cluster to be installed. For the interconnect
and switch redundancy, as a best practice, avoid VLAN trunking across the switches. For
ease of configuration (e.g. fewer IP address requirements), use IPMP with link mode
failure detection in primary/standby configuration. This will give you a single failover IP
which you will define in cluster_interconnects init.ora parameter. Remove any interfaces
for the interconnect from the OCR using `oifcfg delif`. AND TEST THIS
RIGOROUSLY. For now, as Link Aggregation (GLDv3) cannot span multiple switches
from a single host, you will need to configure the switch redundancy and the host NICs
with IPMP. When configuring IPMP for the interconnect with multiple switches
available, configure IPMP as active/standby and *not* active/active. This is to avoid
potential latencies in switch failure detection/failover which may impact the availability
of the rdbms. Note, IPMP spreads/load balances outbound packets on the bonded
interfaces, but inbound packets are received on a single interface. In an active/active
configuration this makes send/receive problems difficult to diagnose. Both Link
Aggregation (GLDv3) and IPMP are core OS packages SUNWcsu, SUNWcsr
respectively and do not require Sun Clusterware.

Can RMAN backup Real Application Cluster databases?

Absolutely. RMAN can be configured to connect to all nodes within the cluster to
parallelize the backup of the database files and archive logs. If files need to be restored,
using set AUTOLOCATE ON alerts RMAN to search for backed up files and archive
logs on all nodes.

RAC with RMAN in the Oracle Documentation

I am receiving an ORA-29740 error. What should I do?

This error can occur when problems are detected on the cluster:

Error: ORA-29740 (ORA-29740)
Text: evicted by member %s, group incarnation %s
---------------------------------------------------------------------------
Cause: This member was evicted from the group by another member of the
cluster database for one of several reasons, which may include a
communications error in the cluster, failure to issue a heartbeat
to the control file, etc.
Action: Check the trace files of other active instances in the cluster
group for indications of errors that caused a reconfiguration.

For more information on troubleshooting this error, see the following Metalink note:

Note 219361.1
Troubleshooting ORA-29740 in a RAC Environment

What does the Virtual IP service do? I understand it is for failover but do
we need a separate network card? Can we use the existing
private/public cards? What would happen if we used the public ip?

The 10g Virtual IP Address (VIP) exists on every RAC node for public network
communication. All client communication should use the VIPs in their TNS connection
descriptions. The TNS ADDRESS_LIST entry should direct clients to VIPs rather than
using hostnames. During normal runtime, the behaviour is the same as hostnames;
however, when the node goes down or is shut down the VIP is hosted elsewhere on the
cluster, and does not accept connection requests. This results in a silent TCP/IP error and
the client fails over immediately to the next TNS address. If the network interface fails within
the node, the VIP can be configured to use alternate interfaces in the same node. The VIP
must use the public interface cards. There is no requirement to purchase additional public
interface cards (unless you want to take advantage of within-node card failover).

What do the VIP resources do once they detect a node has failed/gone
down? Are the VIPs automatically acquired, and published, or is
manual intervention required? Are VIPs mandatory?

When a node fails, the VIP associated with the failed node is automatically failed over to
one of the other nodes in the cluster. When this occurs, two things happen:

1. The new node re-arps the world indicating a new MAC address for this IP
address. For directly connected clients, this usually causes them to see errors on
their connections to the old address;
2. Subsequent packets sent to the VIP go to the new node, which will send error
RST packets back to the clients. This results in the clients getting errors
immediately.

In the case of existing SQL connections, errors will typically be in the form of ORA-3113
errors, while a new connection using an address list will select the next entry in the list.
Without using VIPs, clients connected to a node that died will often wait for a TCP/IP
timeout period before getting an error. This can be as long as 10 minutes or more. As a
result, you don't really have a good HA solution without using VIPs.

What are my options for load balancing with RAC? Why do I get an
uneven number of connections on my instances?

All the types of load balancing available currently (9i-10g) occur at connect time.
This means that it is very important how one balances connections and what these
connections do on a long term basis.
Since establishing connections can be very expensive for your application, it is good
programming practice to connect once and stay connected. This means one needs to be
careful as to what option one uses. Oracle Net Services provides load balancing or you
can use external methods such as hardware based or clusterware solutions.
The following options exist prior to Oracle RAC 10g Release 2 (for 10g Release 2 see
Load Balancing Advisory):

Random
Either client side load balancing or hardware based methods will randomize the
connections to the instances.
On the negative side, this method is unaware of the load on the instances or even whether
they are up, meaning connections might wait on TCP/IP timeouts.

Load Based
Server side load balancing (by the listener) redirects connections by default depending on
the RunQ length of each of the instances. This is great for short lived connections but
terrible for persistent connections or login storms. Do not use this method for
connections from connection pools or application servers.

Session Based
Server side load balancing can also be used to balance the number of connections to each
instance. Session count balancing is the method used when you set the listener parameter
prefer_least_loaded_node_listener-name=off. Note that listener-name is the actual name of
the listener, which is different on each node in your cluster and by default is
listener_nodename.
Session based load balancing takes into account the number of sessions connected to each
node and then distributes the connections to balance the number of sessions across the
different nodes.
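
The effect of session based balancing can be pictured with a toy model. The Python sketch below is not the listener's actual algorithm; it simply assumes that each new connection goes to the instance with the fewest sessions, and the instance names and counts are invented.

```python
def pick_instance(session_counts):
    """Return the instance with the fewest sessions (ties broken by name)."""
    return min(sorted(session_counts), key=lambda name: session_counts[name])


def distribute(session_counts, new_connections):
    """Place new connections one at a time on the least loaded instance."""
    counts = dict(session_counts)
    for _ in range(new_connections):
        counts[pick_instance(counts)] += 1
    return counts


before = {"inst1": 10, "inst2": 4, "inst3": 7}
after = distribute(before, 9)
print(before)
print(after)
```

In this model the nine new connections go only to the two less loaded instances until all three carry the same number of sessions. It also shows why the balance is only as good as the moment of connect: sessions that later disconnect are not rebalanced.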

Can our 10g VIP fail over from NIC to NIC as well as from node to node ?

Yes, the 10g VIP implementation is capable of failing over within a node from NIC to
NIC, and back if the failed NIC comes online again, and it also fails over between nodes.
The NIC to NIC failover is fully redundant if redundant switches are installed.

What is CLB_GOAL and how should I set it?


CLB_GOAL is the connection load balancing goal for a service. There are 2 options,
CLB_GOAL_SHORT and CLB_GOAL_LONG (default).
Long is for applications that have long-lived connections. This is typical for connection
pools and SQL*Forms sessions. Long is the default connection load balancing goal.
Short is for applications that have short-lived connections.
The GOAL for a service can be set with EM or DBMS_SERVICE.
Note: You must still configure load balancing with Oracle Net Services

What is Server-side Transparent Application Failover (TAF) and how do I
use it?

Oracle Database 10g Release 2 introduces server-side TAF when using services. After
you create a service, you can use the dbms_service.modify_service PL/SQL procedure to
define the TAF policy for the service. Only the basic method is supported. Note that this is
different from the TAF policy (traditional client TAF) that is supported by srvctl and the EM
Services page. If your service has a server-side TAF policy defined, then you do not have
to encode TAF in the client connection string. If the instance where a client is connected
fails, then the connection will be failed over to another instance in the cluster that is
supporting the service. All restrictions of TAF still apply.
NOTE: both the client and server must be 10.2, and aq_ha_notifications must be set to
true for the service.
Sample code to modify the service:

execute dbms_service.modify_service (service_name => 'gl.us.oracle.com' -
, aq_ha_notifications => true -
, failover_method => dbms_service.failover_method_basic -
, failover_type => dbms_service.failover_type_select -
, failover_retries => 180 -
, failover_delay => 5 -
, clb_goal => dbms_service.clb_goal_long);

With three primary load balancing options (client-side connect-time LB,
server-side connect-time LB, and the runtime connection load
balancing), is it fair to say Runtime Connection Load Balancing is
the only option to leverage FAN up/down events?

No. The listener is a subscriber to all FAN events (both from the load balancing advisory
and the HA events). Therefore server-side connection load balancing leverages FAN HA
events as well as load balancing advisory events.
With the Oracle JDBC driver 10g Release 2, if you enable Fast Connection Failover, you
also enable Runtime Connection Load Balancing (one knob for both).

How can a customer mask the change in their clustered database
configuration from their client or application? (i.e. so that I do not have
to change the connection string when I add a node to the RAC
database)

The combination of server-side load balancing and Services allows you to easily mask
cluster database configuration changes. As long as all instances register with all listeners
(use the LOCAL_LISTENER and REMOTE_LISTENER parameters), server-side load
balancing will allow clients to connect to the service on currently available instances at
connect time.
The load balancing advisory (setting a goal on the service) will give advice as to how
many connections to send to each instance currently providing a service. When a service
is enabled on an instance, as long as the instance registers with the listeners, the clients
can start getting connections to the service and the load balancing advisory will include
that instance in its advice.

After executing DBMS_SERVICE.START_SERVICE, the service
resource remains in OFFLINE status when checked with crs_stat.
Is that expected behavior?

YES, this is expected behaviour. Unfortunately, DBMS_SERVICE.START_SERVICE does
not update the clusterware. You should use srvctl start service -d dbname; then you
should see it come online.

Is it possible to use SRVCTL start database with a user account other than
oracle (that is, other than the owner of the Oracle software)?

YES. When you create a RAC database as a user other than the home/software owner
(oracle), the Database Configuration Assistant sets the correct permissions/ACLs on the
CRS resources that control the database, instances, etc. This assumes that you have set up
group membership for this user in the dba group of the home (find it using
oracle_home/bin/osdbagrp) and in the CRS home owner's primary group (usually
oinstall), and that there is group write permission on the oracle_home.

The client gets this error message in Production in the ons.log file every
minute or so: 06/11/10 10:11:14 [2] Connection 0,129.86.186.58,6200
SSL handshake failed 06/11/10 10:11:14 [2] Handshake for
0,129.86.186.58,6200: nz error = 29049 interval = 0 (180 max)

These annoying messages in ons.log are telling you that you have a configuration
mismatch for ONS somewhere in the farm. RAC has its own ONS server for which SSL
is disabled by default. You must either enable SSL for RAC ONS, or disable it for OID
ONS(OPMN). You need to create a wallet for each RAC ONS server, or copy one of the
wallets from OPMN on the OID instances.
In ons.conf you need to specify the wallet file and password:
walletfile=
walletpassword=
ONS only uses SSL between servers, and so ONS clients will not be affected. You
specify the wallet password when you create the wallet. If you copy a wallet from an
OPMN instance, then use the same password configured in opmn.xml. If there is no
wallet password configured in opmn.xml, then you don't need to specify a wallet
password in ons.conf either.

How do I configure FCF with BPEL so I can use RAC 10g in the backend?

Note:372456.1 describes the procedure to set up BPEL with an Oracle RAC 10g
Release 1 database.
If you are using SSL, ensure the SSL enable attribute of ONS in the opmn.xml file has the same
value, either true or false, for all OPMN servers in the Farm. To troubleshoot OPMN at
the application server level, look at appendix A in the Oracle® Process Manager and
Notification Server Administrator's Guide.

I am using shared servers with the following set in init.ora: SQL> show
parameters dispatchers=(protocol=TCP)(listener=listeners_nl01)
(con=500)(serv=oltp). I stopped my service with srvctl stop service
but it is still registered with the listener and accepting connections.
Is this expected?

YES. This is by design of dispatchers, which are part of Oracle Net Services. If you
specify the service attribute of the dispatchers init.ora parameter, the service specified
cannot be managed by the DBA.

What clients provide integration with FAN through FCF?

With Oracle Database 10g Release 1, JDBC clients (both thick and thin driver) are
integrated with FAN by providing FCF. With Oracle Database 10g Release 2, we have
added ODP.NET and OCI. Other applications can integrate with FAN by using the API
to subscribe to the FAN events.
Note: If you are using a 3rd party application server, then you can only use FCF if you
use the Oracle driver and, except for OCI, its connection pool. If you are using the
connection pool of the 3rd party application server, then you do not get FCF. Your
customer can subscribe directly to FAN events; however, that is a development project for
the customer. See the white paper Workload Management with Oracle RAC 10g on OTN.

Can I use TAF and FAN/FCF?

With Oracle Database 10g Release 1, NO. With Oracle Database 10g Release 2, the
answer is YES for OCI and ODP.NET, and it is recommended. For JDBC, you should not use
TAF and FCF together, even with the Thick JDBC driver.

How do the datasource properties initialLimit, minLimit, and maxLimit
affect Fast Connection Failover processing with JDBC?

The initialLimit property on the Implicit Connection Cache is effective only when the
cache is first created. For example, if initialLimit is set to 10, you will have 10
connections pre-created and available when the connection cache is first created. Do not
confuse minLimit with initialLimit. The current behavior is that after a DOWN
event, once the affected connections have been cleaned up, the number of
connections in the cache can be lower than minLimit.

An UP event is processed both (a) when a new instance joins and (b) when an instance
comes back UP after being down. This has no relevance to initialLimit, or even minLimit.
When an UP event comes into the JDBC Implicit Connection Cache, we create some new
connections. Assuming you have your listener load balancing set up properly, those
connections should go to the instance that was just started. When your application gets a
connection from the pool, it is given an idle connection. If you are running 10.2 and
have the load balancing advisory turned on for the service, we allocate the session
based on the defined goal to provide the best service level.

MaxLimit, when set, defines the upper boundary limit for the connection cache. By
default, maxLimit is unbounded; your database sets the limit.

Do I need to install the ONS on all my mid-tier servers in order to enable
JDBC Fast Connection Failover (FCF)?

With 10g Release 1, the middle tier must have ONS running (started by the same user as the
application). ONS is not included on the Client CD; however, it is part of the Oracle
Database 10g CD.
With 10g Release 2, you do not need to install ONS on the middle tier. The JDBC
driver allows the use of remote ONS (i.e. it uses the ONS running in the RAC cluster). Just
use the datasource parameter
ods.setONSConfiguration("nodes=racnode1:4200,racnode2:4200");

Will FAN/OCI work with Instant Client?

Yes, FAN/OCI will work with Instant Client. Both client and server must be Oracle
Database 10g Release 2.

What type of callbacks are supported with OCI when using FAN/FCF?

There are two separate callbacks supported. The HA Events (FAN) callback is called
when an event occurs. When a down event occurs, for example, you can clean up a
custom connection pool, i.e. purge stale connections. When the failover occurs, the TAF
callback is invoked. At failover time you can customize the newly created database
session. Both FAN and TAF are client-side callbacks. FAN also has a separate server-side
callout that should not be confused with the OCI client callback.

Does FCF for OCI react to FAN HA UP events?

OCI does not perform any implicit actions on an up event; however, if an HA event
callback is present, it is invoked. You can take any required action at that time.

Can I use FAN/OCI with Pro*C?

Since Pro*C (sqllib) is built on top of OCI, it should support HA events. You need to
precompile the application with the option EVENTS=TRUE and make sure you link the
application with a thread library. The database connection must use a Service that has
been enabled for AQ events. Use dbms_service.modify_service to enable the service for
events (aq_ha_notifications => true) or use the EM Cluster Database Services page.

Do I have to link my OCI application with a thread library? Why?

YES, you must link the application to a threads library. This is required because the AQ
notifications occur asynchronously, over an implicitly spawned thread.

Can I use the 10.2 JDBC driver with 10.1 database for FCF?

Yes. With the patch for Bug 5657975 for 10.2.0.3, the 10.2 JDBC driver will work with a
10.1 database. The fix will be part of the 10.2.0.4 patchset. If you do not have the patch,
then to use FCF, use the 10.2 JDBC driver with a 10.2 database. If the database is 10.1, use
the 10.1 JDBC driver.

I am seeing the wait events 'ges remote message', 'gcs remote message',
and/or 'gcs for action'. What should I do about these?

These are idle wait events and can be safely ignored. The 'ges remote message' might
show up in a 9.0.1 Statspack report as one of the top wait events. To keep this wait event
from showing up, you can add it to the PERFSTAT.STATS$IDLE_EVENT table so
that it is not listed in Statspack reports.

What are the changes in memory requirements when moving from single
instance to RAC?

If you are keeping the workload requirements per instance the same, then about 10%
more buffer cache and 15% more shared pool is needed. The additional memory
requirement is due to data structures for coherency management. The values are heuristic
and are mostly upper bounds. Actual resource usage can be monitored by querying the
current and maximum columns for the gcs resource/locks and ges resource/locks entries
in V$RESOURCE_LIMIT.

In general, however, keep in mind that memory requirements per instance are
reduced when the same user population is distributed over multiple nodes. Assuming the
same user population, N nodes, and a buffer cache of size M for the single-instance
system, the buffer cache needed per instance is:

(M / N) + ((M / N) * 0.10) [ + extra memory to compensate for failed-over users ]

For example, with M = 2G, N = 2 and no extra memory for failed-over users:

= (2G / 2) + ((2G / 2) * 0.10)

= 1G + about 100M
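
The sizing rule above can be written as a small Python helper. This is only a sketch of the
heuristic as stated in this note; the function name and parameter names are made up for
illustration, and the 10% overhead is the upper-bound figure quoted above.

```python
def rac_buffer_cache_per_instance(single_instance_cache_mb, nodes,
                                  overhead=0.10, failover_extra_mb=0.0):
    """Estimate the buffer cache per RAC instance, in MB.

    Implements (M / N) + ((M / N) * overhead) + optional failover headroom,
    where M is the single-instance buffer cache and N the number of nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    share = single_instance_cache_mb / nodes
    return share + share * overhead + failover_extra_mb


# The worked example above: M = 2G (2048 MB), N = 2, no failover headroom.
print(rac_buffer_cache_per_instance(2048, 2))
```

With binary megabytes the result is 1126.4 MB, i.e. 1G plus roughly 100M, which matches the
rounded figure above.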

What is the Load Balancing Advisory?

To assist in the balancing of application workload across designated resources, Oracle
Database 10g Release 2 provides the Load Balancing Advisory. This Advisory monitors
the current workload activity across the cluster. For each instance where a service is
active, it provides a percentage value of how much of the total workload should be sent to
this instance, as well as a service quality flag. The feedback is provided as an entry in the
Automatic Workload Repository and a FAN event is published.

What is Runtime Connection Load Balancing?


Runtime connection load balancing enables the connection pool to route incoming work
requests to the available database connection that will provide the best service.
This provides the best service times globally, and routing responds quickly to changing
conditions in the system. Oracle has implemented runtime connection load balancing
with ODP.NET and JDBC connection pools. Runtime Connection Load Balancing is
tightly integrated with the automatic workload balancing features introduced with Oracle
Database 10g, i.e. Services, Automatic Workload Repository, and the new Load
Balancing Advisory.
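
To make the idea concrete, here is a small Python sketch of a pool choosing an instance in
proportion to advisory percentages. It is not the algorithm used by the Oracle connection
pools; the function name, the instance names and the percentages are hypothetical.

```python
import random


def pick_instance(advisory_percent, rng=random):
    """Pick an instance with probability proportional to its advisory percentage."""
    instances = list(advisory_percent)
    weights = [advisory_percent[name] for name in instances]
    if not instances or sum(weights) <= 0:
        raise ValueError("no instance is accepting work")
    # random.choices performs a weighted draw; an instance with weight 0 is never chosen.
    return rng.choices(instances, weights=weights, k=1)[0]


# Hypothetical advice: send 70% of the work to inst1, 30% to inst2, none to inst3.
advice = {"inst1": 70, "inst2": 30, "inst3": 0}
rng = random.Random(42)  # seeded so the run is repeatable
picks = [pick_instance(advice, rng) for _ in range(10000)]
print({name: picks.count(name) for name in advice})
```

When the advice changes (for example after a new FAN event), the next requests simply use
the new percentages.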

How do I enable the load balancing advisory?

The load balancing advisory requires the use of services and Oracle Net connection load
balancing.
To enable it on the server, set a goal (service_time or throughput) and CLB_GOAL on
your service; for ODP.NET, also enable AQ_HA_NOTIFICATIONS=>true.
On the client, you must be using the connection pool.
For JDBC, enable the datasource parameter FastConnectionFailoverEnabled.
For ODP.NET, enable the datasource parameter Load Balancing=true.

What are my options for setting the Load Balancing Advisory GOAL on a
Service?

The load balancing advisory is enabled by setting the GOAL on your service, either
through the PL/SQL DBMS_SERVICE package or the EM DBControl Clustered Database
Services page. There are 3 options for GOAL:
NONE - Default setting; turns off the advisory.
THROUGHPUT - Work requests are directed based on throughput. This should be
used when the work in a service completes at homogeneous rates. An example is a trading
system where work requests are of similar lengths.
SERVICE_TIME - Work requests are directed based on response time. This should be
used when the work in a service completes at various rates. An example is an Internet
shopping system where work requests are of various lengths.

How do I measure the bandwidth utilization of my NIC or my
interconnect?

One simple and quick, but not very recommended, way is to look at the output of
"ifconfig eth0" and compare the values of "RX bytes" and "TX bytes" over time. This shows
the _average_ usage per period of time.

A more reliable, interactive way on Linux is to use the iptraf utility or the prebuilt rpms
from Red Hat or Novell (SuSE); another option on Linux is Netperf. On other Unix
platforms, use "snoop -S -tr -s 64 -d hme0"; AIX's topas can show this as well. Try to look
for the peak (not average) usage and see if that is acceptably fast.

Remember that NIC bandwidth is measured in Mbps or Gbps (which is BITS per second)
and output from the above utilities can sometimes come in BYTES per second, so for
comparison, do the proper conversion (divide the bps value by 8 to get bytes/sec, or multiply
the bytes value by 8 to get the bps value).
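
The byte-counter comparison and the bits/bytes conversion can be combined in a short
Python sketch. The function name and the sample numbers are illustrative only; the counters
are assumed to be two readings of the same "RX bytes" or "TX bytes" value, and counter
wrap-around is not handled.

```python
def utilization_percent(bytes_start, bytes_end, interval_seconds, link_speed_bps):
    """Average link utilization over an interval, as a percentage of the link speed.

    bytes_start / bytes_end: two readings of a byte counter.
    link_speed_bps: nominal NIC speed in BITS per second (1 Gbps = 1_000_000_000).
    """
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8  # bytes -> bits
    return 100.0 * bits_transferred / interval_seconds / link_speed_bps


# 750,000,000 bytes received in 60 seconds on a 1 Gbps NIC.
print(utilization_percent(0, 750_000_000, 60, 1_000_000_000))
```

As noted above, this gives an average over the sampling interval, so short peaks are
hidden; shorter intervals give a better view of peak usage.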

Additionally, you can't expect a network device to run at full capacity with 100%
efficiency, because of concurrency, collisions and retransmits that happen more frequently as
the utilization gets higher. If you are reaching high levels, consider a faster interconnect or
NIC bonding (multiple NICs all servicing the same IP address).

Finally, the above measures bandwidth utilization (how much), not latency (how fast) of
the interconnect. You may still be suffering from a high-latency connection (slow link) even
though there is plenty of bandwidth to spare. Most experts agree that low latency is by far
more important than high bandwidth with respect to specifications of the private
interconnect in RAC. Latency is best measured by the actual user of the network link
(RAC in this case); review Statspack for stats on latency. Also, in 10gR2 Grid Control
you can view Global Cache Block Access Latency, and you can drill down to the Cluster
Cache Coherency page to see the cluster cache coherency metrics for the entire cluster
database.

Keep in mind that RAC uses the private interconnect like it was never used before, to
synchronize memory regions (SGAs) of multiple nodes (remember, since 9i, entire data
blocks are shipped across the interconnect). If the network is utilized at 50% bandwidth,
this means that 50% of the time it is busy and not available to potential users. In this case
delays (due to collisions and concurrency) will increase the latency even though the
bandwidth might look "reasonable"; it's hiding the real issue.

Does Database blocksize or tablespace blocksize affect how the data is
passed across the interconnect?

Oracle ships database block buffers, i.e. blocks in a tablespace configured for 16K will
result in a 16K data buffer being shipped, blocks residing in a tablespace with the base block
size (8K) will be shipped as base blocks, and so on. The data buffers are broken down into
packets of MTU size.

How can I validate the scalability of my shared storage? (Tightly related to
RAC / Application scalability)

Storage vendors tend to focus their sales pitch mainly on the storage unit's capacity in
Terabytes (1000 GB) or Petabytes (1000 TB). However, for RAC scalability it's critical to
also look at the storage unit's ability to process I/Os per second (throughput) in a scalable
fashion, specifically from multiple sources (nodes). If that criterion is not met, RAC /
Application scalability will most probably suffer, as it partially depends on storage
scalability as well as a solid and capable interconnect (for network traffic between nodes).

Storage vendors may sometimes discourage such testing, boasting about their amazing
front or backend battery-backed memory caches that "eliminate" all I/O bottlenecks. This
is all great, and you should take advantage of such caches as much as possible. However,
there is no substitute for a real-world test; you may uncover that the HBA (Host Bus
Adapter) firmware or the driver versions are outdated (before you claim poor RAC /
Application scalability).

It is highly recommended to test this storage scalability early on so that expectations are
set accordingly. On Linux there is a freely available tool released on OTN called ORION
(Oracle I/O test tool) which simulates Oracle I/O.

On other Unix platforms (as well as Linux) you can use IOzone. If a prebuilt binary is not
available you should build it from source. Make sure to use version 3.271 or later, and if
testing raw/block devices add the "-I" flag.

In a basic read test you try to demonstrate that a certain I/O throughput can be
maintained as nodes are added. Try to simulate your database I/O patterns as much as
possible, i.e. blocksize, number of simultaneous readers, rates, etc.

For example, on a 4 node cluster, from node 1 you measure 20MB/sec, then you start a
read stream on node 2 and see another 20MB/sec while the first node shows no decrease.
You then run another stream on node 3 and get another 20MB/sec. In the end you run 4
streams on 4 nodes, and get an aggregate of 80MB/sec or close to that. This will prove that
the shared storage is scalable. Obviously if you see poor scalability in this phase, that will
be carried over and be observed or interpreted as poor RAC / Application scalability.
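
The arithmetic of this test can be captured in a few lines of Python. This is a sketch for
evaluating your own measurements, not part of ORION or IOzone; the function name and the
second set of numbers are made up for illustration.

```python
def scaling_efficiency(single_node_mb_s, aggregate_mb_s, nodes):
    """Ratio of measured aggregate throughput to the ideal (single node x nodes).

    1.0 means the storage scaled linearly; lower values mean it did not.
    """
    if nodes < 1 or single_node_mb_s <= 0:
        raise ValueError("need at least one node and a positive baseline")
    ideal_mb_s = single_node_mb_s * nodes
    return aggregate_mb_s / ideal_mb_s


# The example above: 20MB/sec per node, 80MB/sec aggregate on 4 nodes.
print(scaling_efficiency(20, 80, 4))

# A hypothetical poor result: only 50MB/sec aggregate on the same 4 nodes.
print(scaling_efficiency(20, 50, 4))
```

A value close to 1.0 corresponds to the "80MB/sec or close to that" outcome; a value well
below 1.0 points at the I/O subsystem rather than at RAC or the application.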

In many cases RAC / Application scalability is blamed for no real reason; that is, the
underlying I/O subsystem is what does not scale.

How many nodes are supported in a RAC Database?


With 10g Release 2, we support 100 nodes in a cluster using Oracle Clusterware, and 100
instances in a RAC database. Currently DBCA has a bug where it will not go beyond 63
instances. There is also a documentation bug for the max-instances parameter. With 10g
Release 1 the maximum is 63. In 9i it is platform specific because of the different clusterware
supported by vendors. See the platform-specific FAQ for 9i.

I found in 10.2 that the EM "Convert to Cluster Database" wizard would
always fall over on the last step where it runs emca and needs to log
into the new cluster database as dbsnmp to create the cluster
database targets etc. I changed the password for the dbsnmp
account to be dbsnmp (same as username) and it worked OK. Is this
a known issue?

The conversion to a cluster database completes successfully, but the EM monitoring
credentials for the converted database are not set properly because of this bug. Until you
have the fix, you can set the monitoring password from the "monitoring configuration"
screen for the RAC database in the Grid Control console and proceed.
The issue is fixed in the 10.2.0.3 database patchset; to get the complete functionality you
also need the 10.2.0.2 Grid Control patch, as the fix is spread between the two pieces of
software. For now you can proceed by setting the password for the dbsnmp user to be the
same as that of the sys user.

What storage option should I use for RAC 10g on Linux? ASM / OCFS /
Raw Devices / Block Devices / Ext3 ?

The recommended way to manage large amounts of storage in a RAC environment is
ASM (Automatic Storage Management). If you really need/want a clustered filesystem,
then Oracle offers OCFS (Oracle Clustered File System); for the 2.4 kernel (RHEL3/SLES8)
use OCFS Version 1 and for the 2.6 kernel (RHEL4/SLES9) use OCFS2. All these options
are free to use and completely supported. ASM is bundled in the RDBMS software, and
OCFS as well as ASMLib are freely downloadable from Oracle's OSS (Open Source
Software) website.

EXT3 is out of the question, since its data structures are not cluster aware; that is, if you
mount an ext3 filesystem from multiple nodes, it will quickly get corrupted.

Other options, of course, are NFS and iSCSI. Both are outside the scope of this FAQ but
are mentioned for completeness.

If for any reason the above options (ASM/OCFS) are not good enough and you insist on
using 'raw devices' or 'block devices', here are the details on the two (this information is
still very useful to know in the context of ASM and OCFS).

On Unix/Linux there are two types of devices:

block devices (/dev/sde9) are **BUFFERED** devices!! Unless you explicitly open
them with O_DIRECT you will get buffered (Linux buffer cache) IO.

character devices (/dev/raw/raw9) are **UNBUFFERED** devices!! No matter how
you open them, you always get unbuffered IO, hence there is no need to specify
O_DIRECT on the file open call.

The above is not a typo: block devices on Unix do buffered IO by default (cached in the
Linux buffer cache). This means that RAC cannot operate on them (unless they are opened
with O_DIRECT), since the IOs would not be immediately visible to other nodes.

You may check if a device is block or character device by the first letter printed with the
"ls -l" command:
crw-rw---- 1 root disk 162, 1 Jan 23 19:53 /dev/raw/raw1
brw-rw---- 1 root disk 8, 112 Jan 23 14:51 /dev/sdh

Above, "c" stands for character device, and "b" for block devices.
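
The same check can be done programmatically. The Python sketch below uses the standard
os and stat modules to classify a file mode; the helper names are made up for
illustration, and the device path passed to device_kind_of_path is whatever exists on your
system.

```python
import os
import stat


def device_kind(mode):
    """Classify a file mode (as returned in st_mode) as block, character or other."""
    if stat.S_ISBLK(mode):
        return "block"       # shown as "b" by ls -l, buffered by default
    if stat.S_ISCHR(mode):
        return "character"   # shown as "c" by ls -l, unbuffered
    return "other"


def device_kind_of_path(path):
    """Look up the file at `path` and classify it."""
    return device_kind(os.stat(path).st_mode)


# Demonstrate the classification with synthetic modes (no real device needed).
print(device_kind(stat.S_IFCHR | 0o660))
print(device_kind(stat.S_IFBLK | 0o660))
```

On a cluster node you would call, for example, device_kind_of_path("/dev/sdh") and expect
"block" for the entry listed above.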

Starting with Oracle 10.1, an RDBMS fix added the O_DIRECT flag to the open call
(the O_DIRECT flag tells the Linux kernel to bypass the Linux buffer cache and write
directly to disk). In the case of a block device, that meant that a create datafile on
'/dev/sde9' would succeed (you need to set filesystemio_options=directio in init.ora). This
enhancement was well received, and shortly afterwards bug 4309443 was fixed (by adding the
O_DIRECT flag to the OCR file open call), meaning that starting with 10.2 (there are
several 10.1 backports available) the Oracle OCR file could also access block devices
directly. For the voting disk to be opened with O_DIRECT you need the fix for bug 4466428
(5021707 is a duplicate). This means that both voting disks and OCR files can live on
block devices. However, due to OUI bug 5005148, there is still a need to configure raw
devices for the voting or OCR files during installation of RAC. This is not a big deal, since
it's just 5 files in most cases.

By using block devices you no longer have to live with the limitation of 255 raw devices
per node. You can access as many block devices as the system can support. Also, block
devices carry persistent permissions across reboots, while with raw devices you would
have to customize that after installation; otherwise the Clusterware stack or database
would fail to start up because of permission issues.

ASM or ASMLib can be given the raw devices (/dev/raw/raw2), as was done in the initial
deployment of 10g Release 1, or, the more recommended way, ASM/ASMLib should be
given the block devices directly (e.g. /dev/sde9).

Since RAW devices are being phased out of Linux in the long term, it is recommended that
everyone switch to using the block devices (meaning, pass these block devices to
ASM or OCFS/2 or Oracle Clusterware).

How do I stop the GSD?

If you are on 9.0 on Unix you would issue:

$ ps -ef | grep jre
$ kill -9 <gsd process>

Note: Make sure that this is the process in use by GSD.

On Windows, stop the OracleGSDService.

If you are on 9.2 you would issue:

$ gsdctl stop

What is the purpose of the gsd service in Oracle 9i RAC?

GSD is only needed for configuration/management of the cluster database. Once the
database has been configured and is up, GSD can be safely stopped provided you don't run
any of the 'srvctl', 'dbca' or 'dbua' tools. In 9i RAC, the GSD doesn't write anywhere unless
tracing is turned on, in which case traces go to stdout.
In other words, once the database has been configured and started, and as long as you don't
use 'srvctl or EM' to manage, 'dbca to extend/remove' or 'dbua to upgrade' this database,
GSD can be stopped.

How should I deal with space management? Do I need to set free lists and
free list groups?

Manually setting free list groups is a complexity that is no longer required.

We recommend using Automatic Segment Space Management rather than trying to
manage space manually. Unless you are migrating from an earlier database version with
OPS and have already built and tuned the necessary structures, Automatic Segment Space
Management is the preferred approach.

Automatic Segment Space Management is NOT the default; you need to set it.

For more information see:

Automatic Space Segment Management in RAC Environments

I was installing RAC and my Oracle files did not get copied to the remote
node(s). What went wrong?

First make sure the cluster is running and is available on all nodes. You should be able to
see all nodes when running an 'lsnodes -v' command.

If lsnodes shows that all members of the cluster are available, then you may have an
rcp/rsh problem on Unix or shares have not been configured on Windows.

You can test rcp/rsh on Unix by issuing the following from each node:

[node1]/tmp> touch test.tst
[node1]/tmp> rcp test.tst node2:/tmp

[node2]/tmp> touch test.tst
[node2]/tmp> rcp test.tst node1:/tmp

On Windows, ensure that each node has administrative access to all these directories
within the Windows environment by running the following at the command prompt:

NET USE \\host_name\C$

Clustercheck.exe also checks for this.

More information can be found in the Step-by-Step RAC notes available on Metalink. To
find these, search Metalink for 'Step-by-Step Installation of RAC'.

What are the implications of using srvctl disable for an instance in my
RAC cluster? I want to have it available to start if I need it, but at
this time I do not want to run this extra instance for this database.

During node reboot, any disabled resources will not be started by the Clusterware,
therefore this instance will not be restarted. It is recommended that you leave the vip,
ons and gsd enabled on that node. For example, the VIP address for this node is present in
the address list of database services, so a client connecting to these services will still
reach some other database instance providing that service via listener redirection. Just be
aware that disabling an instance on a node only means that the instance itself is not started.
However, if the database was originally created with 3 instances, that means there are 3
threads of redo. So, while the instance itself is disabled, the redo thread is still enabled,
and will occasionally cause log switches. The archived logs for this 'disabled' instance
would still be needed in any potential database recovery scenario. So, if you are going to
disable the instance through srvctl, you may also want to consider disabling the redo
thread for that instance.

srvctl disable instance -d orcl -i orcl2

SQL> alter database disable public thread 2;

Do the reverse to enable the instance.

SQL> alter database enable public thread 2;

srvctl enable instance -d orcl -i orcl2

What is the Cluster Verification Utility (cluvfy)?

The Cluster Verification Utility (CVU) is a validation tool that you can use to check all
the important components that need to be verified at different stages of deployment in a
RAC environment. The wide domain of deployment of CVU ranges from initial hardware
setup through fully operational cluster for RAC deployment and covers all the
intermediate stages of installation and configuration of various components. Cluvfy does
not take any corrective action following the failure of a verification task, does not enter
into areas of performance tuning or monitoring, does not perform any cluster or RAC
operation, and does not attempt to verify the internals of cluster database or cluster
elements.

What versions of the database can I use the cluster verification utility
(cluvfy) with?

The cluster verification utility is released with Oracle Database 10g Release 2 but can also
be used with Oracle Database 10g Release 1.

Does RAC work with NTP (Network Time Protocol)?

YES! NTP and RAC are compatible. As a matter of fact, it is recommended to set up NTP
in a RAC cluster, for Oracle 8i/9i and 10g.

From the Documentation:

Oracle® Database Oracle Clusterware and Oracle Real Application
Clusters Installation Guide 10g Release 2 (10.2) for Linux B14203-05
page 2-21:
"Node Time Requirements
Before starting the installation, ensure that each member node of the
cluster is set as closely as possible to the same date and time. Oracle
strongly recommends using the Network Time Protocol feature of most
operating systems for this purpose, with all nodes using the same
reference Network Time Protocol server."

Each machine has a different clock frequency and as a result a slightly different time
drift. NTP computes this time drift within about 15 minutes and stores this information
in a "drift" file. It then adjusts the system clock based on this known drift, and also
compares it to a given time server that the system administrator sets up. This is the
recommended approach.

Keep the following points in mind:

• Minor changes in time (in the seconds range) are harmless for RAC and the
Oracle clusterware. If you intend to make large time changes it is best to
shut down the instances on that node to avoid a false eviction, especially if you are
using the 10g low-brownout patches, which allow really low misscount settings.

• Backup/recovery aspects of large time changes are documented in note 77370.1.
Basically, you can't use RECOVER DATABASE UNTIL TIME to reach the
second recovery point. It is possible to overcome this with RECOVER DATABASE
UNTIL CANCEL or UNTIL CHANGE. If you are doing complete recovery
(most of the time) then this is not an issue, since the Oracle recovery code uses
SCNs (System Change Numbers) to advance in the redo/archive logs. SCNs
never go back in time (unless a reset-logs operation is performed), but there
is always an association of an SCN to a human-readable timestamp (which may
change forwards or backwards), hence the issue with recovery until a point in time
vs. until SCN/Cancel.

• If DBMS_SCHEDULER is in use it will be affected by time changes, as it
uses the actual clock rather than the SCN.

Apart from these issues, the Oracle server is immune to time changes, i.e. they will
not affect transaction/read consistency operations.

On Linux the "-x" flag can be added to the ntpd daemon to prevent the clock from
going backwards.

How do I determine whether or not a one-off patch is "rolling
upgradeable"?

After you have downloaded a patch, you can go into the directory where you unpacked
the patch:

> pwd
/ora/install/4933522
Then use the following OPatch command:
> opatch query -is_rolling
...
Query ...
Please enter the patch location:
/ora/install/4933522
---------- Query starts ------------------
Patch ID: 4933522
....
Rolling Patch: True.
---------- Query ends -------------------
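
If the check needs to be scripted, a small sketch along these lines could scan saved
query output for the "Rolling Patch" line shown above. The function name
is_rolling_patch and the log file name are assumptions made for illustration; only the
output format is taken from the example above.

```shell
# Hypothetical helper: read OPatch query output on stdin and succeed
# only if it contains a "Rolling Patch: True" line.
is_rolling_patch() {
    grep -Eq '^[[:space:]]*Rolling Patch:[[:space:]]*True'
}

# Example use (not run here), after saving the query output to a file:
#   is_rolling_patch < opatch_query.log && echo "patch is rolling upgradeable"
```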

If using PL/SQL native code, the plsql_native_library_dir needs to be defined.
In a RAC environment, must the directory be in the shared storage?

In a RAC configuration, this parameter must be set in each instance. The instances are not
required to have a shared file system. On each instance, plsql_native_library_dir can
be set to point to an instance-local directory. Alternatively, if the RAC configuration
supports a shared (cluster) file system, you can use a common directory (on the shared
file system) for all instances. You can also check out the PL/SQL Native Compilation
FAQ on OTN: www.oracle.com/technology/tech/pl_sql/htdocs/ncomp_faq.html

I have 2 clusters named "crs" (the default), how do I get Grid Control to
recognize them as targets?

There are 2 options:

a) If the Grid Control agent install (which is a separate install) has already been done
and has picked up the cluster name as "crs", go to the EM console and, for the second
cluster, manually delete and rediscover the target. When you rediscover the target, give
it whatever display name you like.

b) Before performing the Grid Control agent install, set the CLUSTER_NAME
environment variable and run the install. This variable needs to be set only for that
install session; there is no need to set it every time the agent starts.

If I am using Vendor Clusterware such as Veritas, IBM, Sun or HP, do I
still need Oracle Clusterware to run Oracle RAC 10g?

Yes. When certified, you can use vendor clusterware; however, you must still install and
use Oracle Clusterware for RAC. Best practice is to leave Oracle Clusterware to manage
RAC. For details see Metalink Note 332257.1 and for Veritas SFRAC see 397460.1.

How many nodes can be had in an HP-UX/Solaris/AIX/Windows/Linux cluster?

The number of nodes supported is not limited by Oracle, but more
generally by the clustering software/hardware in question.

When using solely Oracle Clusterware: 63 nodes (9i or 10gR1)
With 10g Release 2, the maximum number of nodes is 100

When using a third party clusterware:

Sun: 8
HP UX: 16
HP Tru64: 8
IBM AIX:
* 8 nodes for Physical Shared (CLVM) SSA disk
* 16 nodes for Physical Shared (CLVM) non-SSA disk
* 128 nodes for Virtual Shared Disk (VSD)
* 128 nodes for GPFS
* Subject to storage subsystem limitations
Veritas: 8-16 nodes (check w/ Veritas)

Where can I find information about how to set up / install RAC on
different platforms?

There is a roadmap for implementing Real Application Clusters available at:

A Smooth Transition to Real Application Clusters

There are also Step-by-Step notes for each platform on the Metalink
'Top Tech Docs' for RAC:

High Availability - Real Application Clusters Library Page Index

Additional information can be found on OTN:

http://technet.oracle.com/products/oracle9i/content.html --> 'Oracle Real Application
Clusters'

Is it possible to run RAC on logical partitions (i.e. LPARs) or virtual
separate servers?

Yes, it is possible. The E10K and other high end servers can be partitioned into domains
of smaller sizes, each domain with its own CPU(s) and operating system. Each domain is
effectively a virtual server. RAC can run on a cluster composed of domains. The
benefits are similar to those of a regular cluster: any domain failure will have little
effect on other domains. Besides, the management of the cluster may be easier since there
is only one physical server. Note, however, that one E10K is still just one server, so
there are single points of failure. Any failure, such as a backplane failure, that brings
down the entire server will shut down the virtual cluster. That is the tradeoff users have
to make in how best to build a cluster database.

How do I check RAC certification?

See the following Metalink note:

Note 184875.1
How To Check The Certification Matrix for Real Application Clusters

Please note that certifications for Real Application Clusters are performed against the
Operating System and Clusterware versions. The corresponding system hardware is
offered by System vendors and specialized Technology vendors. Some system vendors
offer pre-installed, pre-configured RAC clusters. These are included below under the
corresponding OS platform selection within the certification matrix.

Can the Oracle Database Configuration Assistant (DBCA) be used to
create a database with Veritas DBE / AC 3.5?

DBCA can be used to create databases on raw devices in 9i RAC Release 1 and 9i
Release 2. Standard database creation scripts using SQL commands will work with file
system and raw.

DBCA cannot be used to create databases on file systems on Oracle 9i Release 1. The
user can choose to set up a database on raw devices, and have DBCA output a script. The
script can then be modified to use cluster file systems instead.

With Oracle 9i RAC Release 2 (Oracle 9.2), DBCA can be used to create databases on a
cluster filesystem. If the ORACLE_HOME is stored on the cluster filesystem, the tool
will work directly. If ORACLE_HOME is on local drives on each system, and the
customer wishes to place database files onto a cluster file system, they must invoke
DBCA as follows: dbca -datafileDestination /oradata where /oradata is on the CFS
filesystem. See 9iR2 README and bug 2300874 for more info.

What is Oracle's position with respect to supporting RAC on Polyserve
CFS?

Please check the certification matrix available through Metalink for your specific release.

Is crossover cable supported as an interconnect with RAC on any platform?

NO. CROSS OVER CABLES ARE NOT SUPPORTED.

The requirement is to use a switch.

Detailed Reasons:
1) Cross-cabling limits the expansion of RAC to two nodes.
2) Cross-cabling is unstable:
a) Some NICs do not work properly with it. They are not able
to negotiate the DTE/DCE clocking, and will thus not function.
These NICs were made cheaper by assuming that the switch was going to
have the clock. Unfortunately there is no way to know which NICs do
not have that clock.
b) Media sense behaviour on various OSs (most notably Windows) will
bring a NIC down when a cable is disconnected.
Either of these issues can lead to cluster instability and to
ORA-29740 errors (node evictions).

Due to the benefits and stability provided by a switch, their
affordability ($200 for a simple 16 port GigE switch), and the expense
and time involved in dealing with issues when one is not used, this
is the only supported configuration.

Please see certify.us.oracle.com as well.

Is Veritas Storage Foundation supported with RAC?

Veritas Storage Foundation 4.0 is certified on AIX, Solaris and HP-UX for 9i RAC and
Oracle RAC 10g. Veritas is also in production on Linux, but it is not certified by Oracle.
If customers choose Veritas on Linux with Oracle 9i, Oracle will support the Oracle
products in the stack.
Veritas Storage Foundation is currently not certified with 10g Release 2 on any platform.
Check Certify for the latest information.

Is RAC on VMWare supported?

No. We do not support RAC on VMWare. Aside from the support restrictions for the
database on VMWare outlined in Metalink Note 249212.1, there is a technical issue with
VMWare periodically resynchronizing its system clock with the underlying OS. This can
disrupt the underlying clusterware services.

Is 3rd Party Clusterware supported on Linux such as Veritas or Redhat?

No, Oracle RAC 10g does not support 3rd Party clusterware on Linux. This means that if
a cluster file system requires a 3rd party clusterware, the cluster file system is not
supported.

Can you have multiple RAC $ORACLE_HOME's on Linux?

No, there should be only one Oracle Cluster Manager (ORACM) running on each node.
All RAC databases should run out of the $ORACLE_HOME that ORACM is installed in.

After installing patchset 9013 and patch_2313680 on Linux, the startup
was very slow

Please carefully read the following new information about configuring Oracle Cluster
Management on Linux, provided as part of the patch README:

Three parameters affect the startup time:

soft_margin (defined at watchdog module load)
-m (watchdogd startup option)
WatchdogMarginWait (defined in nmcfg.ora)

WatchdogMarginWait is calculated using the formula:

WatchdogMarginWait = soft_margin(msec) + -m + 5000(msec)

[5000(msec) is hardcoded]

Note that soft_margin is measured in seconds, while -m and WatchdogMarginWait are
measured in milliseconds.

Based on benchmarking, it is recommended to set soft_margin between 10 and 20
seconds. Use the same value for -m (converted to milliseconds) as used for soft_margin.
Here is an example:

soft_margin=10 -m=10000 WatchdogMarginWait = 10000+10000+5000=25000
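
The arithmetic above can be sketched as a small shell function. The function name
watchdog_margin_wait is an assumption made for illustration; it is not part of the patch
or of any Oracle tool. It takes soft_margin in seconds and -m in milliseconds, matching
the units described above.

```shell
# Hypothetical helper: compute WatchdogMarginWait in milliseconds.
#   $1 = soft_margin in seconds
#   $2 = watchdogd -m value in milliseconds
watchdog_margin_wait() {
    soft_margin_sec=$1
    m_msec=$2
    # soft_margin converted to msec + -m + the hardcoded 5000 msec
    echo $(( soft_margin_sec * 1000 + m_msec + 5000 ))
}

watchdog_margin_wait 10 10000   # prints 25000, matching the example above
```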

If CPU utilization in your system is high and you experience unexpected node reboots,
check the wdd.log file. If there are any 'ping came too late' messages, increase the value
of the above parameters.

Is CFS Available for Linux?

Yes, OCFS (Oracle Cluster Filesystem) is now available for Linux. The following
Metalink note has information for obtaining the latest version of OCFS:

Note 238278.1 - How to find the current OCFS version for Linux

Where can I find more information about the hangcheck-timer module on
Linux? And how do we configure the hangcheck-timer module?

In releases 9.2.0.2.0 and later, Oracle recommends using a new I/O
fencing model, the hangcheck-timer module. The hangcheck-timer
module monitors the Linux kernel for long operating system hangs that
could affect the reliability of a RAC node. You can configure the
hangcheck-timer module using 3 parameters: hangcheck_tick,
hangcheck_margin and MissCount.
For more details, please review Note 259487.1.

Can RAC 10g and 9i RAC be installed and run on the same physical Linux
cluster?

Yes - However Oracle Clusterware (CRS) will not support a 9i RAC database so you will
have to leave the current configuration in place. You can install Oracle Clusterware and
RAC 10g into the same cluster. On Windows and Linux, you must run the 9i Cluster
Manager for the 9i Database and the Oracle Clusterware for the 10g Database. When you
install Oracle Clusterware, your 9i srvconfig file will be converted to the OCR. Both 9i
RAC and 10g will use the OCR. Do not restart the 9i gsd after you have installed Oracle
Clusterware. Remember to check certify for details of what vendor clusterware can be
run with Oracle Clusterware.

Is the hangcheck timer still needed with Oracle RAC 10g?


YES! The hangcheck-timer module monitors the Linux kernel for extended
operating system hangs that could affect the reliability of the RAC
node (I/O fencing) and cause database corruption. To verify that the
hangcheck-timer module is running on every node, run as the root user:

/sbin/lsmod | grep hangcheck

If the hangcheck-timer module is not listed, enter the following command
as the root user:

/sbin/insmod hangcheck-timer hangcheck_tick=2 hangcheck_margin=1

(Note that in 9i, the recommended values for tick and margin were 30
and 180, respectively).
To ensure the module is loaded every time the system reboots, verify
that the local system startup file (/etc/rc.d/rc.local) contains the
command above.

For additional information please review the Oracle RAC Install and
Configuration Guide (5-41).
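
For checking several nodes from a script, the lsmod test above could be wrapped as in
the following sketch. The function name hangcheck_loaded is an assumption for
illustration; it reads lsmod output on stdin, so it can be fed text collected from any
node.

```shell
# Hypothetical helper: read lsmod output on stdin and succeed if a
# hangcheck module is listed (same match as "grep hangcheck" above).
hangcheck_loaded() {
    grep -q 'hangcheck'
}

# Example use as root on a node (not run here):
#   /sbin/lsmod | hangcheck_loaded || echo "hangcheck-timer is NOT loaded"
```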

How to configure bonding on Suse SLES8.

Please see Note 291958.1

How to configure bonding on Suse SLES9.

Please see Note 291962.1

Oracle Clusterware fails to start after a reboot due to permissions on raw
devices reverting to default values. How to fix?

After a successful installation of Oracle Clusterware, a simple reboot can leave the
Clusterware unable to start. This is because the permissions on the raw devices for the
OCR and voting disks, e.g. /dev/raw/raw{x}, revert to their default values (root:disk) and
are inaccessible to Oracle. This change of behavior started with the 2.6 kernel, in RHEL4,
OEL4, RHEL5, OEL5, SLES9 and SLES10. In RHEL3 the raw devices maintained their
permissions across reboots, so this symptom was not seen.

The way to fix this on RHEL4, OEL4 and SLES9 is to create
/etc/udev/permissions.d/40-udev.permissions (you must choose a number that's lower than
50). You can do this by copying /etc/udev/permissions.d/50-udev.permissions and
removing the lines that are not needed (50-udev.permissions gets replaced by upgrades,
so you do not want to edit it directly; also, a typo in 50-udev.permissions can render
the system unusable). Example permissions file:
# raw devices
raw/raw[1-2]:root:oinstall:0640
raw/raw[3-5]:oracle:oinstall:0660

Note that this applies to all raw device files; here just the voting and OCR devices were
specified.

On RHEL5, OEL5 and SLES10 a different file is used, /etc/udev/rules.d/99-raw.rules;
notice that now the number must be (any number) higher than 50. Also, the syntax of the
rules differs from that of the permissions file. Here's an example:

KERNEL=="raw[1-2]*", GROUP="oinstall", MODE="640"
KERNEL=="raw[3-5]*", OWNER="oracle", GROUP="oinstall", MODE="660"

This is explained in detail in Note 414897.1.
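
After a reboot, the resulting ownership and mode can be checked with a sketch like the
one below. The function name file_perms is an assumption for illustration, and the
example relies on the GNU coreutils stat found on Linux; the device names in the usage
comment simply mirror the examples above.

```shell
# Hypothetical helper: print "owner:group:mode" for a file, using
# GNU stat (Linux). Other platforms use a different stat syntax.
file_perms() {
    stat -c '%U:%G:%a' "$1"
}

# Example use after a reboot (not run here):
#   for dev in /dev/raw/raw1 /dev/raw/raw2; do
#       echo "$dev $(file_perms "$dev")"    # expect root:oinstall:640
#   done
```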


A customer installed 10g R2 on Linux RH4 Update 2, 2.6.9-22.ELsmp #1
SMP x86_64 GNU/Linux, and got the error Error in invoking target
'all_no_orcl'. The customer ignored the error, the install succeeded
without any other errors, and Oracle apparently worked fine. What
should they do?
Because of compatibility with their storage array (EMC DMX with
Powerpath 4.5) they must use update 2. The Oracle install guide states
that RH4 64-bit update 1 "or higher" should be used for 10g R2.

The binutils patch binutils-2.15.92.0.2-13.0.0.0.2.x86_64.rpm is needed to relink without
error. Red Hat is aware of the bug and has a test version; they plan to have it fixed for
update 3, but it is taking them time. Using this binutils rpm should resolve the problem.
If the customer wants, they can update that package again after the install. In the
meantime, this provides a direct fix.

Is OCFS2 certified with RAC 10g?

Yes. See Certify to find out which platforms are currently certified.

Customer did not load the hangcheck-timer before installing RAC, Can
the customer just load the hangcheck-timer ?

YES. Customer can install the hangcheck timer and load it. No need to reboot the nodes.

My customer is about to install 10202 clusterware on new Linux
machines. He is getting a "No ORACM running" error when running
rootpre.sh, and it exited. Should he worry about this message?

It is an informational message. Generally for such scripts, you can issue echo "$?" to
ensure that it returns a zero value. The message is basically saying that it did not find
an oracm. If the customer were installing 10g on an existing 9i cluster (which will have
oracm), then this message would have been serious. But since the customer is installing
this on a fresh new box, they can continue the install.
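
The exit-status check mentioned above can be sketched generically as follows. The
function name report_status is an assumption for illustration, and "true" and "false"
stand in for whatever script is being checked.

```shell
# Hypothetical helper: run a command and report its exit status.
report_status() {
    if "$@"; then
        echo "OK (exit status 0)"
    else
        status=$?
        echo "FAILED (exit status $status)"
        return "$status"
    fi
}

report_status true    # prints: OK (exit status 0)
```

On a real node the command would be the script in question, for example
report_status ./rootpre.sh run from the directory that contains it.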

How do I configure my RAC Cluster to use the RDS Infiniband?

The configuration takes place below Oracle. You need to talk to your Infiniband vendor.
Check certify for what is currently available as this will change as vendors adopt the
technology. The database must be at least 10.2.0.3. If you want to switch a database
running with IP over IB, you will need to relink Oracle.
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk ipc_rds ioracle

You can check your interconnect through the alert log at startup. Check for the string
"cluster interconnect IPC version:Oracle RDS/IP (generic)" in the alert.log file.
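
A minimal sketch of that alert log check follows. The function name uses_rds is an
assumption for illustration, and the path to the alert log is left to the caller because
it depends on the instance's background_dump_dest.

```shell
# Hypothetical helper: succeed if the given alert log reports the
# RDS interconnect string quoted above.
uses_rds() {
    grep -F -q 'cluster interconnect IPC version:Oracle RDS/IP' "$1"
}

# Example use (not run here; the path is a placeholder):
#   uses_rds /path/to/alert_log && echo "instance is using RDS"
```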

Client is running Veritas cluster on SunOS 2.9. When we ran the
10.2.0 installer it did not discover the nodes, but with 9i it was able to
discover both nodes. Is there anything specific to be done for the
10.2.0 db install?

You have no idea what a wild ride you are in for. It is imperative that you follow the
Symantec install guide, all the way through the root.sh patch step of the Oracle
Clusterware install. The guide is on the install media. If you don't have it, you can pull
it from Jack Connelly's staging area.
It is an NFS mount. Just mount jacksun1.us.oracle.com:/stage to a local UNIX box,
cd to Veritas/docs, and copy sfrac_install.pdf to any local directory, and you will have
what you need.
This install isn't too bad unless you are on Solaris 10. Give yourself lots of time to
implement; a month to six weeks is not unrealistic. Your mileage will vary depending
on Veritas / Solaris knowledge. Thanks to Jack Connelly in VOS Support.

Does Sun Solaris have a multipathing solution ?

Sun Solaris includes a built-in multipathing tool, MPXIO, which is part of Solaris. You
need to have the SAN Foundation Kit installed (newest version). Please be aware that the
machines are installed following the EIS standard. This is a quality assurance standard
introduced by Sun that mainly ensures that you always have the newest patches.
MPXIO is free of charge and comes with Solaris 8, 9 and 10. If you have a Sun LVM, it
would use this feature indirectly. Sun has confirmed that MPXIO will work with
raw devices.

Does Oracle Support RAC with Solaris 10 Containers (aka Zones)?

No. RAC is currently not supported with Solaris 10 Local Containers. You can use a
Global container but remember 1 global container per system or per domain. So, in case
your hardware is capable of being split up in domains, you may have more than 1 global
container on the whole system (hardware), that is per domain.

In local containers, you cannot manipulate hardware in any way, shape or form. You can't
plumb and unplumb network interfaces .... nothing ... even as the local container root
user. You can only do this in the global container. We rely on the uadmin command to
quickly bring down a node if an urgent condition is detected. As I recall, you can't do this
from the local container either. CRS has to maintain the ability to manipulate hardware
and this just is not going to happen in a local container.

The answer is the same if you are using Vendor Clusterware such as Veritas SF RAC or
Sun Cluster.

Can I configure IPMP in Active/Active to increase bandwidth of my
interconnect?

For IPMP active/active configurations, please follow the Sun documentation instructions:
http://docs.sun.com/app/docs/doc/816-4554/6maoq027i?a=view
IPMP active/active is known to load balance on transmit but serialize on a single
interface for receive, so you are unlikely to get the throughput you might have expected.
Unless you experience explicit bandwidth limitations that require active/active, the
recommended best practice is to configure for maximum availability, as described in webiv
note 283107.1.
Please note too that debugging active/active interfaces at the network layer is
cumbersome and time consuming. In an active/active configuration, if the switch-side
link fails, you are likely to lose both interconnect connections, whereas with
active/standby you would fail over.

Is HMP supported with 10g on all HP platforms?

- 10g RAC + HMP + PA-RISC = yes
- 10g RAC + HMP + Itanium, "Oracle has no plans and will likely never
support RAC over HMP on IPF."
- 10g RAC + UDP + Itanium = yes (even over Hyperfabric)

"Oracle recommends that HMP not be used. UDP is the recommended
interconnect protocol across all platforms."

Does the Oracle Cluster File System (OCFS) support network access
through NFS or Windows Network Shares?
No, in the current release the Oracle Cluster File System (OCFS) is not
supported for use by network access approaches like NFS or Windows
Network Shares.

My customer wants to understand what type of disk caching they can use
with their Windows RAC Cluster, the install guide tells them to
disable disk caching?

If the write cache identified is local to the node then that is bad for RAC. If the cache is
visible to all nodes as a 'single cache', typically in the storage array, and is also 'battery
backed' then that is OK.

Can I run my 9i RAC and RAC 10g on the same Windows cluster?

Yes, but the 9i RAC database must have the 9i Cluster Manager and you must run Oracle
Clusterware for the Oracle Database 10g. 9i Cluster Manager can coexist with Oracle
Clusterware 10g.
Be sure to use the same 'cluster name' in the appropriate OUI field for both 9i and 10g
when you install both together in the same cluster.
The OracleCMService9i service will remain intact during the Oracle Clusterware 10g
install. Because a 9i RAC database requires OracleCMService9i, it should be
left running. The information for the 9i database will get migrated to the OCR during the
Oracle Clusterware installation. Then, for future database management, you would use
the 9i srvctl to manage the 9i database, and the 10g srvctl to manage any new 10g
databases. Both srvctl commands will use the OCR.

When I try to login to the +ASM2 on node2 with asmcmd (after setting
ORACLE_HOME and ORACLE_SID correctly) I get: ORA-01031:
insufficient privileges (DBD ERROR: OCI SessionBegin). When I
try to login to +ASM2 using sqlplus (connect / as sysdba) I get the
same ORA-01031: insufficient privileges. When I try to login to
+ASM2 using sqlplus (connect sys/passwd as sysdba) I get connected
successfully.

This sounds like the ORA_DBA group on Node2 is empty, or else does not have the
correct username in it. Double-check what user account you are using to log on to Node2
(a 'set' command will show you the USERNAME and USERDOMAIN values) and
then make sure that this account is part of ORA_DBA.
The other issue to check is that SQLNET.AUTHENTICATION_SERVICES=(NTS) is
set in SQLNET.ORA.

My customer has a failsafe cluster installed, what are the benefits of
moving their system to RAC?

Fail Safe development is continuing. Most work on the product will be around
accommodating changes in the supported resources (new releases of RDBMS, AS, etc.)
and the underlying Microsoft Cluster Services and Windows operating system.
A Fail Safe protected instance is an active/passive instance and, as such, does not
benefit much from adding more nodes to a cluster. Microsoft has a limit on the number of
nodes in an MSCS cluster (typically 8 nodes, but it does vary). RAC is active/active, so
you get the dual benefits of increased scalability and availability every time you add a
node to a cluster. We have a limit of 100 nodes in a RAC cluster (we don't use MSCS).
Your customer should really consider more than 2 nodes (because of the aggregate
computing power remaining after a node failure). If the choice is between 2 nodes with
4 CPUs each and 4 nodes with 2 CPUs each, then I would go for the 2-CPU nodes.
Customers are using both Windows Itanium RAC and Windows X64 RAC.
Windows X64 seems more popular.
Keep in mind, though, that for Fail Safe, if the server is 64-bit, regardless of flavor,
Fail Safe Manager must be installed on a 32-bit client, which will complicate things just
a bit. There is no such restriction for RAC, as all management for RAC can be done via
Grid Control or Database Control.
For EE RAC you can implement an 'extended cluster' where there is a distance between
the nodes in the cluster (usually less than 20 KM).
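
The sizing advice above (prefer more, smaller nodes) follows from simple arithmetic on
what is left after one node fails. The sketch below assumes equally sized nodes; the
function name surviving_capacity_pct is hypothetical.

```shell
# Hypothetical helper: percentage of total CPU capacity that survives
# the loss of one node, assuming all nodes are the same size.
#   $1 = number of nodes in the cluster
surviving_capacity_pct() {
    nodes=$1
    echo $(( (nodes - 1) * 100 / nodes ))
}

surviving_capacity_pct 2   # 2 nodes x 4 CPUs: prints 50
surviving_capacity_pct 4   # 4 nodes x 2 CPUs: prints 75
```

Both layouts have 8 CPUs in total, but the four-node layout keeps 6 of them after a
single node failure while the two-node layout keeps only 4.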

Do I need HACMP/GPFS to store my OCR/Voting file on a shared device?

The prerequisites doc for AIX clearly says:

"If you are not using HACMP, you must use a GPFS file system to store the
Oracle CRS files" ==> this is a documentation bug and this will be fixed with
10.1.0.3

-----

On AIX it is important to set reserve_lock=no or reserve_policy=no_reserve
in order to allow AIX to access the devices from more than one node
simultaneously.

Use the /dev/rhdisk devices (character special) for the CRS and voting disk and
change the attribute with the command

chdev -l hdiskn -a reserve_lock=no

(for ESS, EMC, HDS, CLARiiON, and MPIO-capable devices you have to run
chdev -l hdiskn -a reserve_policy=no_reserve)
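
When several shared disks are involved, it can help to generate the chdev commands
first and review them before running anything. The sketch below only prints the
commands; the function name print_reserve_cmds and the disk names hdisk2 and hdisk3
are assumptions for illustration.

```shell
# Hypothetical helper: print (not run) one chdev command per disk.
#   $1      = attribute setting, e.g. reserve_lock=no
#   $2 ...  = disk names
print_reserve_cmds() {
    attr=$1
    shift
    for disk in "$@"; do
        echo "chdev -l $disk -a $attr"
    done
}

print_reserve_cmds reserve_policy=no_reserve hdisk2 hdisk3
```

The printed lines can then be reviewed and executed as root on each node that shares the
devices.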

Is HACMP needed for RAC on AIX 5.2 using GPFS file system?

The newest version of GPFS can be used without HACMP. If it is available for AIX 5.2,
then you do not need HACMP.

Is VIO supported with RAC on IBM AIX?

VIO is not supported for storage. IBM is still working to improve the shared disk
capability for use with RAC. So currently, if your customer wants RAC, all shared disks
that store the database must be attached directly. However, VIO delivers network
features usable with RAC. For example, if the customer is planning to use several LPARs
to host RAC instances, a VLAN could be implemented for the RAC interconnect. A
Virtual Ethernet could also be used for the VIP configuration. In conclusion, you
should discuss the global architecture with your customer and check which parts of
VIO could be used. You should also analyze performance aspects (keep in mind
that using shared resources can impact performance).

Can I run Oracle RAC 10g on my IBM Mainframe Sysplex environment
(z/OS)?

YES! There is no separate documentation for RAC on z/OS. What you would call
"clusterware" is built in to the OS and the native file systems are global. IBM z/OS
documentation explains how to set up a Sysplex Cluster; once the customer has done that
it is trivial to set up a RAC database. The few steps involved are covered in Chapter 14
of the Oracle for z/OS System Admin Guide. There is also an Install Guide for Oracle on
z/OS, but I don't think there are any RAC-specific steps in the installation. By the way,
RAC on z/OS does not use Oracle's clusterware (CSS/CRS/OCR).

Is Oracle Application Server integrated with FAN and FCF?

Yes. For detailed information on the integration with the various releases of Application
Server 10g, see:
http://www.oracle.com/technology/tech/java/newsletter/articles/oc4j_data_sources/oc4j_ds.htm

Are Oracle Applications certified with RAC?

For Siebel, PeopleSoft see http://realworld.us.oracle.com/isv/siebel.htm
Oracle 9i RAC (9.2) and Oracle RAC 10g (10.1) are certified with Oracle Applications
E-Business Suite. See Note 285267.1 for details.

Can I use Oracle Clusterware for failover of the SAP Enqueue and VIP
services when running SAP in a RAC environment?

Oracle has created sapctl to do this and it is available for certain platforms. SAPCTL will
be available for download on SAP Services Marketplace on AIX and Linux. For Solaris,
it will not be available in 2007, use Veritas or Sun Cluster.

How do I gather all relevant Oracle and OS log/trace files in a RAC cluster
to provide to Support?

Use RAC-DDT (RAC Diagnostic Data Tool). The User Guide is in Metalink Note 301138.1.
Quote from the User Guide:

RACDDT is a data collection tool designed and configured specifically
for gathering diagnostic data related to Oracle's Real Application
Cluster (RAC) technology. RACDDT is a set of scripts and configuration
files that is run on one or more nodes of a RAC cluster. The main
script is written in Perl, while a number of proxy scripts are written
using Korn shell. RACDDT will run on all supported Unix and Linux
platforms, but is not supported on any Windows platforms.

Newer versions of RDA (Remote Diagnostic Agent) have the RAC-DDT functionality,
so going forward RDA is the tool of choice. The RDA User Guide is in Metalink
Note 314422.1.

What are the cdmp directories in the background_dump_dest used for?

These directories are produced by the diagnosability daemon process (DIAG). DIAG is a
process related to RAC which, as one of its tasks, performs cache dumping. The DIAG
process dumps trace data to file when it discovers the death of an essential process
(foreground or background) in the local instance. A dump directory named something
like cdmp_ is created in the bdump or background_dump_dest directory, and all the trace
dump files DIAG creates are placed in this directory.

Is the Oracle E-Business Suite (Oracle Applications) certified against
RAC?

Yes. (There is no separate certification required for RAC.)

What is the optimal migration path to be used while migrating the
E-Business Suite to RAC?

The following is the recommended path to migrate your E-Business Suite to a
RAC environment:

1. Migrate the existing application to new hardware. (If applicable).

2. Use Clustered File System for all data base files or migrate all database files to raw
devices. (Use dd for Unix or ocopy for NT)

3. Install/upgrade to the latest available e-Business suite.

4. Upgrade database to Oracle9i (Refer document 216550.1 on Metalink)

5. In step 4, install RAC option while installing Oracle9i and use Installer to perform
install for all the nodes.

6. Clone Oracle Application code tree.

Reference Documents:
Oracle E-Business Suite Release 11i with 9i RAC: Installation and Configuration :
Metalink Note# 279956.1
E-Business Suite 11i on RAC : Configuring Database Load balancing & Failover:
Metalink Note# 294652.1
Oracle E-Business Suite 11i and Database - FAQ : Metalink# 285267.1

How to configure concurrent manager in a RAC environment?

Large clients commonly put the concurrent manager on a separate server now (in the
middle tier) to reduce the load on the database server. The concurrent manager programs
can be tied to a specific middle tier (e.g., you can have CMs running on more than one
middle tier box). It is advisable to use specialized CMs. CM middle tiers are set up to
point to the appropriate database instance based on the product module being used.

Should functional partitioning be used with Oracle Applications?

We do not recommend functional partitioning unless throughput on your server
architecture demands it. Cache Fusion has been optimized to scale well with
non-partitioned workloads.

If your processing requirements are extreme and your testing proves you must partition
your workload in order to reduce internode communications, you can use Profile Options
to designate that sessions for certain applications Responsibilities are created on a
specific middle tier server. That middle tier server would then be configured to connect to
a specific database instance.

To determine the correct partitioning for your installation you would need to consider
several factors like number of concurrent users, batch users, modules used, workload
characteristics etc.

Which e-Business version is preferable?

Versions 11.5.5 onwards are certified with Oracle9i and hence with Oracle9i RAC.
However we recommend the latest available version.

Can I use Automatic Undo Management with Oracle Applications?

Yes. In a RAC environment we highly recommend it.

Can I use TAF with e-Business in a RAC environment?

TAF itself does not work with e-Business suite due to Forms/TAF limitations, but you
can configure the tns failover clause. On instance failure, when the user logs back into the
system, their session will be directed to a surviving instance, and the user will be taken to
the navigator tab. Their committed work will be available; any uncommitted work must
be re-started.

We also recommend you configure the forms error URL to identify a fallback middle tier
server for Forms processes, if no router is available to accomplish switching across
servers.

Can I use OCFS with SE RAC?

It is not supported to use OCFS with Standard Edition RAC. All database files must use
ASM (redo logs, recovery area, datafiles, control files etc). You cannot place binaries on
OCFS under the SE RAC terms. We recommend that the binaries and trace files (files
not supported on ASM) be replicated on all nodes. This is done automatically by the
install.

What is the maximum number of nodes under OCFS on Linux?

Oracle 9iRAC on Linux, using OCFS for datafiles, can scale to a maximum of 32 nodes.

Where can I find documentation on OCFS ?

Main Page: http://oss.oracle.com/projects/ocfs/
User Manual: http://oss.oracle.com/projects/ocfs/documentation/
OCFS Files: http://oss.oracle.com/projects/ocfs/files/supported/

What files can I put on Linux OCFS?

For optimal performance, you should only put the following files on Linux OCFS:

- Datafiles
- Control Files
- Redo Logs
- Archive Logs
- Shared Configuration File (OCR)
- Quorum / Voting File
- SPFILE

Is Sun QFS supported with RAC? What about Sun GFS?

The following is taken from Certify; check there for the latest details.

Sun Cluster - Sun StorEdge QFS (9.2.0.5 and higher, 10g and 10gR2):
- No restrictions on placement of files on QFS.
- Sun StorEdge QFS is supported for Oracle binary executables, database data files,
archive logs, Oracle Cluster Registry (OCR), Oracle Cluster Ready Services voting disk
and recovery area.
- Solaris Volume Manager for Sun Cluster can be used for host-based mirroring.
- Supports up to 8 nodes.

Is Red Hat GFS (Global File System) certified by Oracle for use with
Real Application Clusters?

Sistina Cluster Filesystem is not part of the standard RedHat kernel and is therefore not
certified by Oracle; it falls under a kernel extension. This does not mean, however, that
Oracle RAC is not certified with it. In fact, Oracle RAC is not certified against a
filesystem per se, but against an operating system. If, as is the case with the Sistina
filesystem, the filesystem is certified with the operating system, this only means that
Oracle does not provide direct support for the filesystem or fix it in case of an error.
The customer will have to contact the filesystem provider for support.

Is Linux OCFS2 (OCFS version 2) supported with RAC?

Yes. See Certify for details on which platforms are supported.

What is the maximum number of nodes I can have in my cluster if I am
using OCFS2?

Theoretically you can have up to 255; however, it has been tested with up to 16 nodes.

Can I run a 10.1.0.x database with Oracle Clusterware 10.2 ?

Yes. Oracle Clusterware 10.2 will support both 10.1 and 10.2 databases (and ASM too!).
A detailed matrix is available in Metalink Note 337737.1

In the course of failure testing in an extended RAC environment we find
entries in the cssd logfile which indicate actions like
'diskShortTimeout set to (value)' and 'diskLongTimeout set to
(value)'.
Can anyone please explain the meaning of these two timeouts in
addition to diskTimeout?

Having a short and a long disktimeout, rather than a single disktimeout, is due to the patch
for bug 4748797 (included in 10.2.0.2). The long disktimeout is 200 sec by default unless
set differently via 'crsctl set css disktimeout', and applies outside a
reconfiguration. The short disktimeout is in effect during a reconfiguration and equals
misscount minus 3 seconds. The point is that we can tolerate a long disktimeout when all
nodes are running fine, but have to revert to a short disktimeout if there is a reconfiguration.
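
The rule above can be written down as a small calculation. This is only an illustrative sketch of the arithmetic described in the answer, not Oracle code; the function name is hypothetical and the misscount value used in the example is an arbitrary input, not a documented default.

```python
def effective_disk_timeout(in_reconfiguration, misscount, long_disktimeout=200):
    """Return the CSS disk timeout, in seconds, that applies right now.

    Outside a reconfiguration the long disktimeout applies (200 s by default,
    per the text above). During a reconfiguration the short disktimeout
    applies, which is misscount minus 3 seconds.
    """
    if in_reconfiguration:
        return misscount - 3
    return long_disktimeout


# misscount=60 is only an example input, not a stated default.
print(effective_disk_timeout(in_reconfiguration=False, misscount=60))  # 200
print(effective_disk_timeout(in_reconfiguration=True, misscount=60))   # 57
```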

Customer is hitting bug 4462367 with an error message saying low open
file descriptor. How do I work around this until the fix is released
with the Oracle Clusterware Bundle for 10.2.0.3 or 10.2.0.4?

The fix for the "low open file descriptor" problem is to increase the ulimit for Oracle
Clusterware. Please be careful when you make this type of change and make a
backup copy of init.crsd before you start! While you wait for the patch, you can modify
init.crsd as follows:
1. Stop Oracle Clusterware on the node (crsctl stop crs)
2. Make a backup copy of /etc/init.d/init.crsd
3. Modify the file, changing:
# Allow the daemon to drop a diagnostic core file/
ulimit -c unlimited
ulimit -n unlimited
to
# Allow the daemon to drop a diagnostic core file/
ulimit -c unlimited
ulimit -n 65536
4. Restart Oracle Clusterware on the node (crsctl start crs)

How to move the OCR location?

- Stop the CRS stack on all nodes using "init.crs stop".
- Edit /var/opt/oracle/ocr.loc on all nodes and set ocrconfig_loc=new OCR device.
- Restore from one of the automatic physical backups using ocrconfig -restore.
- Run ocrcheck to verify.
- Reboot to restart the CRS stack.
- Additional information can be found at
http://st-doc.us.oracle.com/10/101/rac.101/b10765/storage.htm#i1016535

Is it supported to rerun root.sh from the Oracle Clusterware installation ?

Rerunning root.sh after the initial successful install of the Oracle Clusterware is expressly
discouraged and unsupported. We strongly recommend not doing it.

If root.sh fails to execute on an initial install (or on a new node
joining an existing cluster), it is OK to re-run root.sh after the cause of the failure is
corrected (permissions, paths, etc.). In this case, please run rootdelete.sh to undo the local
effects of root.sh before re-running root.sh.

Can I change the public hostname in my Oracle Database 10g Cluster
using Oracle Clusterware?

Hostname changes are not supported in Oracle Clusterware (CRS), unless you want to
perform a deletenode followed by a new addnode operation.
The hostname is used to store, among other things, the flag files, and the CRS stack will
not start if the hostname is changed.

Is it supported to allow 3rd Party Clusterware to manage Oracle resources
(instances, listeners, etc) and turn off Oracle Clusterware
management of these?

In 10g we do not support using 3rd Party Clusterware for failover and restart of Oracle
resources. Oracle Clusterware resources should not be disabled.

What is the High Availability API?

An application programming interface that allows processes to be put under the High
Availability infrastructure that is part of the Oracle Clusterware distributed with Oracle
Database 10g. A user-written script defines how Oracle Clusterware should start, stop and
relocate the process when the cluster node status changes. This extends the high
availability services of the cluster to any application running in the cluster. Oracle
Database 10g Real Application Clusters (RAC) databases and associated Oracle
processes (e.g. the listener) are automatically managed by the clusterware.

Is it possible to use ASM for the OCR and voting disk?

No, the OCR and voting disk must be on raw or CFS (cluster filesystem).

Can I set up failover of the VIP to another card in the same machine or
what do I do if I have different network interfaces on different nodes
in my cluster (i.e. eth0 on node1,2 and eth1 on node 3,4)?

With srvctl, you can modify the nodeapps for the VIP to list the NICs it can use. The VIP
will then try to start on the eth0 interface and, if that fails, try the eth1 interface.
./srvctl modify nodeapps -n -A / /eth0\|eth1
Here -n takes the node name and -A takes the VIP address, the netmask and the interface
list, separated by "/". Note how the interfaces are a list separated by the "|" symbol and
how you need to quote this with a "\" character, or the Unix shell will interpret the
character as a pipe. So on a node called ukdh364 with a VIP address of ukdh364vip and a
netmask of (say) 255.255.255.0 we have:
./srvctl modify nodeapps -n ukdh364 -A ukdh364vip/255.255.255.0/eth0\|eth1
To check which interfaces are configured as public or private use oifcfg getif.
Example output:
eth0 138.2.238.0 global public
eth1 138.2.240.0 global public
eth2 138.2.236.0 global cluster_interconnect
Running ifconfig on your machine will show the hardware names of the installed
interface cards.
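
If you want to process the oifcfg getif output in a script, the four-column layout shown in the example output can be grouped by role. The following is a minimal sketch that assumes the output always has exactly the four whitespace-separated fields seen above; the function name is hypothetical and this is not an Oracle-provided tool.

```python
def parse_oifcfg_getif(output):
    """Group interface names by role from `oifcfg getif` style output.

    Each line is assumed to hold four whitespace-separated fields:
    interface name, subnet, scope (e.g. global) and role.
    """
    roles = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        name, _subnet, _scope, role = fields
        roles.setdefault(role, []).append(name)
    return roles


# Sample taken from the example output in the answer above.
sample = """\
eth0 138.2.238.0 global public
eth1 138.2.240.0 global public
eth2 138.2.236.0 global cluster_interconnect
"""
print(parse_oifcfg_getif(sample))
```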

My customer has noticed tons of log files generated under
$CRS_HOME/log/ /client. Is there any automated way we can
set up through Oracle Clusterware to prevent/minimize/remove
those aggressively generated files?

Check Note 5187351.8. You can either apply the patchset if it is available for your
platform or have a cron job that removes these files until the patch is available.
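
A cron job of the kind mentioned above could call a small cleanup routine. The sketch below is an assumption about what such a job might look like: the function name, the age threshold and the choice to delete by modification time are all illustrative, and you should point it only at a directory you have verified is safe to prune.

```python
import os
import time


def remove_old_files(directory, max_age_days):
    """Delete regular files in `directory` last modified more than max_age_days ago.

    Subdirectories and symbolic links are left alone.
    Returns the sorted list of file names that were removed.
    """
    cutoff = time.time() - max_age_days * 86400
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

A cron entry would then run a script that calls remove_old_files() with the client log directory and a retention period of your choosing.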

During Oracle Clusterware installation, I am asked to define a private
node name, and then on the next screen asked to define which
interfaces should be used as private and public interfaces. What
information is required to answer these questions?

The private names on the first screen determine which private interconnect will be used
by CSS.
Provide exactly one name that maps to a private IP address, or just the IP address itself. If
a logical name is used, then the IP address it maps to can be changed subsequently, but
if an IP address is specified CSS will always use that IP address. CSS cannot use
multiple private interconnects for its communication, hence only one name or IP address
can be specified.

The private interconnect enforcement page determines which private interconnect will be
used by the RAC instances.
It is equivalent to setting the CLUSTER_INTERCONNECTS init.ora parameter, but is
more convenient because it is a cluster-wide setting that does not have to be adjusted
every time you add nodes or instances. RAC will use all of the interconnects listed as
private in this screen, and they all have to be up, just as their IP addresses have to be
when specified in the init.ora parameter. RAC does not fail over between cluster
interconnects; if one is down then the instances using it won't start.

Can I change the name of my cluster after I have created it when I am
using Oracle Clusterware?

No, you must properly deinstall Oracle Clusterware and then re-install. To properly
deinstall Oracle Clusterware, you MUST follow the directions in the Installation Guide
Chapter 10. This will ensure the OCR gets cleaned out.

What should the permissions be set to for the voting disk and ocr when
doing a RAC Install?

The Oracle Real Application Clusters install guide is correct. It describes the PRE
INSTALL ownership/permission requirements for the OCR and voting disk. This step is
needed to make sure that the CRS install succeeds. Please don't use those values to
determine what the ownership/permission should be POST INSTALL. The root script
will change the ownership/permission of the OCR and voting disk as part of the install.
The POST INSTALL permissions will end up being:
- OCR - root:oinstall - 640
- Voting Disk - oracle:oinstall - 644

Which processes access the OCR ?

Oracle Cluster Registry (OCR) is used to store the cluster configuration information
among other things. OCR needs to be accessible from all nodes in the cluster. If OCR
became inaccessible the CSS daemon would soon fail, and take down the node. PMON
never needs to write to OCR. To confirm if OCR is accessible, try ocrcheck from your
ORACLE_HOME and ORA_CRS_HOME.

How do I restore OCR from a backup? On Windows, can I use ocopy?

The only recommended way to restore an OCR from a backup is "ocrconfig -restore ".
The ocopy command will not be able to perform the restore action for OCR.

Does the hostname have to match the public name or can it be anything
else?

When there is no vendor clusterware, only CRS, then the public node name must match
the host name. When vendor clusterware is present, it determines the public node names,
and the installer doesn't present an opportunity to change them. So, when you have a
choice, always choose the hostname.

Is it a requirement to have the public interface linked to ETH0 or does it
only need to be on a ETH lower than the private interface?
- public on ETH1
- private on ETH2

There is no requirement for interface name ordering. You could have
- public on ETH2
- private on ETH0
Just make sure you choose the correct public interface in VIPCA, and in
the installer's interconnect classification screen.

How to Restore a Lost Voting Disk used by Oracle Clusterware 10g

Please read Note:279793.1 and for OCR Note:268937.1

As long as you can confirm via the CSS daemon logfile that it thinks the voting disk is
bad, you can restore the voting disk from backup while the cluster is online. This is the
backup that you took with dd (by the manual's request) after the most recent addnode,
deletenode, or install operation. If by accident you restore a voting disk that the CSS
daemon thinks is NOT bad, then the entire cluster will probably go down.
crsctl add css votedisk - adds a new voting disk
crsctl delete css votedisk - removes a voting disk
Note: the cluster has to be down. You can also restore the backup via dd when the cluster
is down.

With Oracle Clusterware 10g, how do you backup the OCR?

There is an automatic backup mechanism for OCR. The default location
is : $ORA_CRS_HOME\cdata\"clustername"\

To display backups : ocrconfig -showbackup

To restore a backup : ocrconfig -restore

The automatic backup mechanism keeps copies up to about a week old. So, if
you want to retain a backup copy for longer than that, you should copy
that "backup" file to some other name.
Unfortunately there are a couple of bugs regarding backup file
manipulation, and changing the default backup dir on Windows. These will be
fixed in 10.1.0.4. OCR backups on Windows are absent. The only file in the
backup directory is temp.ocr, which would be the last backup. You can restore
this most recent backup by using the command ocrconfig -restore temp.ocr

If you want to take a logical copy of OCR at any time use : ocrconfig
-export, and use the -import option to restore the contents back.
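
Copying a backup file "to some other name", as suggested above, can be scripted so that copies are not overwritten when the automatic mechanism rotates. This is a minimal sketch only: the function name, the timestamp naming scheme and the archive directory are assumptions for illustration, not part of any Oracle tool.

```python
import os
import shutil
import time


def preserve_backup(backup_path, archive_dir):
    """Copy a backup file into archive_dir under a timestamped name.

    The original file is left in place. Returns the path of the copy.
    """
    os.makedirs(archive_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(archive_dir, f"{stamp}-{os.path.basename(backup_path)}")
    shutil.copy2(backup_path, target)  # copy2 also keeps the modification time
    return target
```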

How do I protect the OCR and Voting in case of media failure?

In Oracle Database 10g Release 1 the OCR and Voting device are not mirrored within
Oracle, hence both must be mirrored via a storage vendor method, like RAID 1.
Starting with Oracle Database 10g Release 2 Oracle Clusterware will multiplex the OCR
and Voting Disk (two for the OCR and three for the Voting).
Please read Note:279793.1 and Note:268937.1 regarding backup and restore of a lost
Voting/OCR and FAQ 6238 regarding OCR backup.

How do I use multiple network interfaces to provide High Availability
and/or Load Balancing for my interconnect with Oracle
Clusterware?

This needs to be done externally to Oracle Clusterware, usually by some OS-provided NIC
bonding which gives Oracle Clusterware a single IP address for the interconnect but
provides failover (High Availability) and/or load balancing across multiple NICs.
These solutions are provided externally to Oracle at a much lower level than Oracle
Clusterware, hence Oracle supports using them. The solutions are OS dependent, and
therefore the best source of information is your OS vendor. However, there are
several articles in Metalink on how to do this. For example, for Sun Solaris search for
IPMP (IP network MultiPathing).

Note: Customers should pay close attention to the bonding setup/configuration/features
and ensure their objectives are met, since some solutions provide only failover, some
only load balancing, and still others claim to provide both. As always, it is important to
test your setup to ensure it does what it was designed to do.

For Linux, read the doc on rac.us :
Configure Redundant Network Cards / Switches for Oracle Database 10g Release 1 Real
Application Cluster on Linux

From the Linux bonding documentation in
/usr/src/linux/Documentation/networking/bonding.txt:

The Linux bonding driver provides a method for aggregating
multiple network interfaces into a single logical "bonded" interface.
The behavior of the bonded interfaces depends upon the mode; generally
speaking, modes provide either hot standby or load balancing services.

So the current Linux implementation supports either failover (HA) or load balancing, but
not both. Third party vendors may be able to provide custom tailored solutions for Linux
that will provide both failover and load balancing (but would probably fall outside the
scope of Unbreakable support from Oracle).

How do I put my application under the control of Oracle Clusterware to
achieve higher availability?

First write a control agent. It must accept 3 different parameters:
- start - The control agent should start the application.
- check - The control agent should check the application.
- stop - The control agent should stop the application.
Secondly you must create a profile for your application using crs_profile. Thirdly you
must register your application as a resource with Oracle Clusterware (crs_register). See
the RAC Admin and Deployment Guide for details.
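
The start/check/stop contract of a control agent can be sketched as a small dispatcher. This is an illustration only: the function name and the in-memory "application" are hypothetical, and the convention that exit code 0 means success and non-zero means failure is an assumption you should confirm against the RAC Admin and Deployment Guide.

```python
def run_action(action, handlers):
    """Dispatch one control-agent action and return a process exit code.

    `handlers` maps "start", "check" and "stop" to callables that return True
    on success. Returns 0 on success, 1 on failure or unknown action
    (assumed convention).
    """
    handler = handlers.get(action)
    if handler is None:
        return 1
    return 0 if handler() else 1


# Hypothetical in-memory "application" used only to exercise the dispatcher.
state = {"running": False}

def start():
    state["running"] = True
    return True

def stop():
    state["running"] = False
    return True

def check():
    return state["running"]

handlers = {"start": start, "check": check, "stop": stop}
print(run_action("check", handlers))  # 1: not running yet
print(run_action("start", handlers))  # 0
print(run_action("check", handlers))  # 0: running now
```

A real agent would replace the three handlers with code that starts, probes and stops the actual application, and would pass the first command line argument to run_action() and exit with its return value.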

Can I use Oracle Clusterware to provide cold failover of my 9i or 10g
single instance Oracle Databases?

Oracle does not provide the necessary wrappers to fail over single-instance databases
using Oracle Clusterware 10g Release 2. But since it's possible for customers to use
Oracle Clusterware to wrap arbitrary applications, it'd be possible for them to wrap
single-instance databases this way.

Does Oracle Clusterware support application vips?

Yes, with Oracle Database 10g Release 2, Oracle Clusterware now supports an
"application" vip. This is to support putting applications under the control of Oracle
Clusterware using the new high availability API and allow the user to use the same URL
or connection string regardless of which node in the cluster the application is running on.
The application vip is a new resource defined to Oracle Clusterware and is a functional
vip. It is defined as a dependent resource to the application. There can be many vips
defined, typically one per user application under the control of Oracle Clusterware. You
must first create a profile (crs_profile), then register it with Oracle Clusterware
(crs_register). The usrvip script must run as root.

Why is the home for Oracle Clusterware not recommended to be a
subdirectory of the Oracle base directory?

If anyone other than root has write permissions to the parent directories of the CRS home,
then they can escalate their own privileges to root. This is a security issue. The CRS home
itself is a mix of root and non-root permissions, as appropriate to the security
requirements. Please follow the install docs about who is your primary group and what
other groups you need to create and be a member of.

How is the voting disk used by Oracle Clusterware?

The voting disk is accessed exclusively by CSS (one of the Oracle Clusterware daemons).
This is totally different from a database file. The database looks at the database files and
interacts with the CSS daemon (at a significantly higher level conceptually than any
notion of "voting disk").

"Non-synchronized access" (i.e. database corruption) is prevented by ensuring that the
remote node is down before reassigning its locks. The voting disk, network, and the
control file are used to determine when a remote node is down, in different, parallel,
independent ways that allow each to provide additional protection compared to the
others. The algorithms used for each of these three things are quite different.

As far as voting disks are concerned, a node must be able to access strictly more than half
of the voting disks at any time. So if you want to be able to tolerate a failure of n voting
disks, you must have at least 2n+1 configured (n=1 means 3 voting disks). You can
configure up to 32 voting disks, providing protection against 15 simultaneous disk
failures; however, it is unlikely that any customer would have enough disk systems with
statistically independent failure characteristics for such a configuration to be meaningful.
At any rate, configuring multiple voting disks increases the system's tolerance of disk
failures (i.e. increases reliability).

Configuring a smaller number of voting disks on some kind of RAID system can allow a
customer to use some other means of reliability than CSS's multiple voting disk
mechanism. However, there seem to be quite a few RAID systems that decide that 30-60
second (or 45 minutes in the case of Veritas) IO latencies are acceptable, and we
have to wait for at least the longest IO latency before we can declare a node dead and
allow the database to reassign database blocks. So while using an independent RAID
system for the voting disk may appear appealing, there are sometimes failover latency
consequences.
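
The "strictly more than half" rule and the 2n+1 sizing above can be checked with a few lines of arithmetic. The function names below are hypothetical; the sketch only restates the majority rule from the text.

```python
def disks_needed(failures_to_tolerate):
    """Minimum number of voting disks needed to survive n disk failures: 2n + 1."""
    return 2 * failures_to_tolerate + 1


def tolerated_failures(configured_disks):
    """How many voting disks can fail while a strict majority stays accessible."""
    majority = configured_disks // 2 + 1  # strictly more than half
    return configured_disks - majority


for disks in (1, 2, 3, 5, 32):
    print(disks, "voting disk(s) tolerate", tolerated_failures(disks), "failure(s)")
```

Note that 2 disks tolerate no more failures than 1, and 32 disks tolerate 15 failures, matching the figures in the answer.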

What happens if I lose my voting disk(s)?

If you lose half or more of your voting disks, then nodes get evicted from the
cluster, or nodes kick themselves out of the cluster. This does not threaten database
corruption. For this reason we recommend that customers use 3 or more voting disks
in 10g Release 2 (always an odd number). Restoring corrupted voting disks is easy
since there isn't any significant persistent data stored in the voting disk. See the RAC
Admin and Deployment Guide for information on backup and restore of voting disks.

How can I register the listener with Oracle Clusterware in RAC 10g
Release 2?

NetCA is the only tool that configures the listener, and you should always use it. It will
register the listener with Oracle Clusterware. There are no other supported alternatives.

Can the Network Interface Card (NIC) device names be different on the
nodes in a cluster, for both public and private?

The private NIC names can be different across nodes but the public NIC names must be the
same (ER 5439875 filed). If the private NIC names are different, you can either configure
them using oifcfg setif -node (rather than -global) for each node, in which case all RAC
instances on the node will use the specified one, or, if you want to use the
CLUSTER_INTERCONNECTS init.ora parameter, set it for each instance to the IP
address(es) you want that instance to use.

Can I configure HP's Autoport aggregation for NIC Bonding after the
install? (i.e. not present beforehand)

You are able to add NIC bonding after the installation although this is more complicated
than the other way round.
There are several notes on webiv regarding this.
Note.271121.1 Ext/Pub How to change VIP and VIP/Hostname in 10g
Note.276434.1 Ext/Pub Modifying the VIP of a Cluster Node
Regarding the private interconnect, please use oifcfg delif / setif to modify this.
For customers on Linux, there is more information on NIC bonding, please read
Configure Redundant Network Cards / Switches for Oracle Database 10g Release 1 Real
Application Cluster on Linux

What are the IP requirements for the private interconnect?

The install guide states that the private IP address must satisfy the following
requirements:
1. Must be separate from the public network
2. Must be accessible on the same network interface on each node
3. Must have a unique address on each node
4. Must be specified in the /etc/hosts file on each node
The Best Practices recommendation is to use the TCP/IP standard for non-routable
networks. Reserved address ranges for private (non-routed) use (see TCP/IP RFC 1918):
* 10.0.0.0 -> 10.255.255.255
* 172.16.0.0 -> 172.31.255.255
* 192.168.0.0 -> 192.168.255.255
Cluvfy will give you an error if you do not have your private interconnect in the ranges
above.
You should not ignore this error. If you are using an IP address in the range used for the
public network for the private network interfaces, you are pretty much messing up the IP
addressing, and possibly the routing tables, for the rest of the corporation. IP addresses
are a scarce commodity; use them wisely. If you use them on a non-routable network,
there is nothing to prevent someone else from using them in the normal corporate
network, and then when those RAC nodes find out that there is another path to that
address range (through RIP), they just might start sending traffic to those other IP
addresses instead of the interconnect. This is just a bad idea.
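
To check a candidate interconnect address against the three RFC 1918 ranges listed above, Python's standard ipaddress module is sufficient. This sketch is not how cluvfy performs its check; it only tests membership in the ranges, and the function name is hypothetical.

```python
import ipaddress

# The three private ranges from RFC 1918, written as CIDR blocks.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0 -> 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 -> 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 -> 192.168.255.255
]


def is_rfc1918(address):
    """Return True if the IPv4 address lies in one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918_NETWORKS)


for addr in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.0.1"):
    print(addr, is_rfc1918(addr))
```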

When the customer runs the command 'onsctl start', they receive the message
"Unable to open libhasgen10.so". Any idea why?

Most likely you are trying to start ONS from ORACLE_HOME instead of CRS_HOME.
Please try to start it from ORA_CRS_HOME.

Does Oracle Clusterware have to be the same or higher release than all
instances running on the cluster?

Yes - Oracle Clusterware must be the same or a higher release with regards to the
RDBMS or ASM Homes.
Please refer to Note#337737.1

Can I use Oracle Clusterware to monitor my EM Agent?

Check out Chapter 3 of the EM advanced configuration guide, specifically the section on
active passive configuration of agents. You should be able to model those to your
requirements. There is nothing special about the commands, but you do need to follow
the startup/shutdown sequence to avoid any discontinuity of monitoring. The agent does
start a watchdog that monitors the health of the actual monitoring process. This is done
automatically at agent start. Therefore you could use Oracle Clusterware but you should
not need to.

What is the voting disk used for?

A voting disk is a backup communications mechanism that allows CSS daemons to
negotiate which subcluster will survive. The voting disks keep a status of who is
currently alive and count votes in case of a cluster reconfiguration. It works as follows:
a) Ensures that you cannot join the cluster if you cannot access the voting disk(s)
b) Leave the cluster if you cannot communicate with it (to ensure we do not have
aberrant nodes)
c) Should multiple subclusters form, it will only allow one to continue. It prefers a greater
number of nodes, and secondly the node with the lowest incarnation number.
d) Is kept redundant by Oracle in 10gR2 (you need to access a majority of existing voting
disks)
Thus at most one subcluster will continue and a split brain will be avoided.

Why does Oracle Clusterware use an additional 'heartbeat' via the voting
disk, when other cluster software products do not?

Oracle uses this implementation because Oracle clusters always have access to a shared
disk environment. This is different from classical clustering which assumes shared
nothing architectures, and changes the decision of what strategies are optimal when
compared to other environments. Oracle also supports a wide variety of storage types,
instead of limiting it to a specific storage type (like SCSI), allowing the customer quite a
lot of flexibility in configuration.

Why does Oracle still use the voting disks when other cluster software is
present?

Voting disks are still used when 3rd party vendor clusterware is present, because vendor
clusterware is not able to monitor/detect all failures that matter to Oracle Clusterware and
the database. For example one known case is when the vendor clusterware is set to have
its heartbeat go over a different network than RAC traffic. Continuing to use the voting
disks allows CSS to resolve situations which would otherwise end up in cluster hangs.

How much I/O activity should the voting disk have?

Approximately 2 read + 1 write per second per node.
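
As a rough sizing aid, the per-node figure above scales linearly with cluster size. The function name is hypothetical and the result is only the approximation stated in the answer.

```python
def voting_disk_io_per_second(nodes, reads_per_node=2, writes_per_node=1):
    """Approximate (reads, writes) per second against a voting disk."""
    return nodes * reads_per_node, nodes * writes_per_node


print(voting_disk_io_per_second(4))  # (8, 4)
```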

How do I identify the voting file location?

Run the following command from /bin
"crsctl query css votedisk"

Can I use ASM to mirror Oracle data in an extended RAC environment?

This support is for 10gR2 onwards and has the following limitations:
1. As in any extended RAC environment, the additional latency induced by distance will
affect I/O and cache fusion performance. This effect will vary by distance and the
customer is responsible for ensuring that the impact experienced in their environment is
acceptable for their application.
2. OCR must be mirrored across both sites using Oracle provided mechanisms.
3. Voting Disk redundancy must exist across both sites, and at a 3rd site to act as an
arbiter. This third site may be via a WAN.
4. Storage at each site must be set up as separate failure groups and use ASM mirroring,
to ensure at least one copy of the data at each site.
5. Customer must have a separate and dedicated test cluster, also in an extended
configuration, set up using the same software and hardware components (can be fewer or
smaller nodes).
6. Customer must be aware that in 10gR2 ASM does not provide partial resilvering.
Should a loss of connectivity between the sites occur, one of the failure groups will be
marked invalid. When the site rejoins the cluster, the failure groups will need to be
manually dropped and added.

How should voting disks be implemented in an extended cluster
environment? Can I use standard NFS for the third site voting disk?

Standard NFS is only supported for the tie-breaking voting disk in an extended cluster
environment. See platform and mount option restrictions at:
http://www.oracle.com/technology/products/database/clustering/pdf/thirdvoteonnfs.pdf
Otherwise, just as with database files, we only support voting files on certified NAS
devices, with the appropriate mount options. Please refer to Metalink Note 359515.1 for a
full description of the required mount options. For a complete list of supported NAS
vendors refer to OTN at:
http://www.oracle.com/technology/deploy/availability/htdocs/vendors_nfs.html

Can I use ASM as a mechanism to mirror the data in an Extended RAC
cluster?

Yes, but it cannot replicate everything that needs replication.

ASM works well to replicate any object you can put in ASM. But you cannot put the
OCR or Voting Disk in ASM.
In 10gR1 they can either be mirrored using a different mechanism (which could then be
used instead of ASM) or the OCR needs to be restored from backup and the Voting Disk
can be recreated.
In the future we are looking at providing Oracle redundancy for both.

Can a customer use SE RAC to implement an "Extended RAC Cluster" ?

No. When using SE RAC the nodes must be co-located in the same room. This is a
license restriction rather than a technical one.

What are the network requirements for an extended RAC cluster?

Necessary Connections
Interconnect, SAN, and IP Networking need to be kept on separate channels, each with
the required redundancy. Redundant connections must not share the same Dark Fiber (if
used), switch, path, or even building entrances. Keep in mind that cables can be cut.
The SAN and Interconnect connections need to be on dedicated point-to-point
connections. No WAN or shared connection is allowed. Traditional cables are limited to
about 10 km if you are to avoid using repeaters. Dark Fiber networks allow the
communication to occur without repeaters. Since latency is limited, Dark Fiber networks
allow for a greater distance in separation between the nodes. The disadvantage of Dark
Fiber networks is that they can cost hundreds of thousands of dollars, so generally they are
only an option if they already exist between the two sites.
If direct connections are used (for short distances) this is generally done by just stringing
long cables from a switch. If a DWDM or CWDM is used then these are directly
connected via a dedicated switch on either side.
Note of caution: Do not do RAC Interconnect over a WAN. This is the same as doing it
over the public network, which is not supported, and other uses of the network (e.g. large
FTPs) can cause performance degradations or even node evictions.
For SAN networks make sure you are using SAN buffer credits if the distance is over
10km.
At the moment in Oracle 10g, if Oracle Clusterware is being used, we also require that a
single subnet be set up for the public connections so we can fail over VIPs from one side
to another.

What is CVU? What are its objectives and features?


CVU brings ease to RAC users by verifying all the important components that need to be
verified at different stages in a RAC environment. The wide domain of deployment of
CVU ranges from initial hardware setup through fully operational cluster for RAC
deployment and covers all the intermediate stages of installation and configuration of
various components. The command line tool is cluvfy. Cluvfy is a non-intrusive utility
and will not adversely affect the system or operations stack.

What is a stage?

CVU supports the notion of Stage verification. It identifies all the important stages in
RAC deployment and provides each stage with its own entry and exit criteria. The entry
criteria for a stage define a specific set of verification tasks to be performed before
initiating that stage. This pre-check saves the user from entering into a stage unless its
pre-requisite conditions are met. The exit criteria for a stage define another specific set of
verification tasks to be performed after completion of the stage. The post-check ensures
that the activities for that stage have been completed successfully. It identifies any
stage-specific problem before it propagates to subsequent stages, where its root cause
would be harder to find. An example of a stage is "pre-check of database installation",
which checks whether the system meets the criteria for RAC install.

What is a component?

CVU supports the notion of Component verification. The verifications in this category
are not associated with any specific stage. The user can verify the correctness of a
specific cluster component. A component can range from a basic one, like free disk space,
to a complex one, like the CRS stack. The integrity check for the CRS stack will
transparently span the verification of multiple sub-components associated with the CRS
stack. Encapsulating a set of tasks within a single component verification saves the user
from running each check individually.

What is nodelist?

Nodelist is a comma-separated list of hostnames without domain. Cluvfy will ignore any
domain while processing the nodelist. If duplicate names remain after the domain is
removed, cluvfy will eliminate the duplicates while processing. Wherever supported,
you can use '-n all' to check all the cluster nodes. See "Do I have to type the nodelist
every time for the CVU commands?" for more information on nodelist and shortcuts.
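
The nodelist handling described above can be illustrated with a small shell sketch. It
only models the behaviour (strip the domain, then drop duplicates); it is not cluvfy's own
code, and the function name normalize_nodelist and the example hostnames are hypothetical.

```shell
# Hypothetical helper: mimic the nodelist handling described above.
# Strips any domain suffix from each name and drops later duplicates,
# keeping the first occurrence of each short hostname.
normalize_nodelist() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed 's/\..*$//' \
        | awk 'NF && !seen[$0]++' \
        | paste -sd, -
}

# prints node1,node2,node3
normalize_nodelist "node1.example.com,node2,node1,node3.example.com"
```

Running a list through such a helper before passing it to -n is one way to see which
names cluvfy would treat as duplicates.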

Do I have to be root to use CVU?

No. CVU is intended for database and system administrators. CVU assumes that the
current user is the oracle user.

What about discovery? Does CVU discover installed components?

At present, CVU discovery is limited to the following components. CVU discovers the
available network interfaces if you do not specify any interface or IP address on its
command line. For storage related verification, CVU discovers all the supported storage
types if you do not specify a particular storage. CVU discovers the CRS HOME if one is
available.

How do I report a bug (or tons of bugs)?

Please refer to the known issues/README files before filing a bug. If the issue is not
covered in those documents, file a bug against product# 5, component: OPSM and
sub-component: CLUVFY. Please provide the relevant log file when filing a bug.

What are the requirements for CVU?

CVU requires:
1. An area with at least 30MB for containing software bits on the invocation node.
2. Java 1.4.1 location on the invocation node.
3. A work directory with at least 25MB on all the nodes. CVU will attempt to copy the
necessary bits as required to this location. Make sure the location exists on all nodes
and that the CVU user has write permission on it. This directory is set through the
CV_DESTLOC environment variable. If this variable does not exist, CVU will use "/tmp"
as the work directory.
4. On RedHat Linux 3.0, an optional package 'cvuqdisk' is required on all the nodes. This
assists CVU in finding scsi disks and helps CVU to perform storage checks on disks.
Please refer to "What is 'cvuqdisk' rpm?" for details. Note that this package should be
installed only on the RedHat Linux 3.0 distribution.
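
The work directory requirement can be checked by hand before running CVU. The following
is a minimal sketch for the local node only, assuming a POSIX shell and df; the function
name check_cvu_workdir is hypothetical and is not part of CVU.

```shell
# Hypothetical pre-flight check for the CVU work directory on the local node.
# $1: directory to check (defaults to $CV_DESTLOC, then /tmp, as described above)
# $2: minimum free space in KB (defaults to 25600 KB = 25MB)
check_cvu_workdir() {
    dir=${1:-${CV_DESTLOC:-/tmp}}
    min_kb=${2:-25600}

    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }

    # df -P prints one header line plus one data line; column 4 is available KB
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$min_kb" ] || { echo "only ${free_kb} KB free in $dir"; return 1; }

    echo "ok: $dir"
}
```

Calling check_cvu_workdir with no arguments checks the directory CVU would use by
default. Since the requirement applies to all nodes, the same check would have to be
repeated on each of them.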

How do I install 'cvuqdisk' package?

Here are the steps to install the cvuqdisk package.
1. Become root user.
2. Copy the rpm (cvuqdisk-1.0.1-1.i386.rpm, current version is 1.0.1) to a local
directory. You can find the rpm on Oracle's OTN site.
3. Set the environment variable CVUQDISK_GRP to the group that should own this binary.
Typically it is the "dba" group.
    export CVUQDISK_GRP=dba
4. Erase any existing package.
    rpm -e cvuqdisk
5. Install the rpm.
    rpm -iv cvuqdisk-1.0.1-1.i386.rpm

How do I know about cluvfy commands? The usage text of cluvfy does not
show individual commands.

Cluvfy has context-sensitive help built into it. Cluvfy shows the most appropriate usage
text based on the cluvfy command line arguments. If you type 'cluvfy' on the command
prompt, cluvfy displays the high-level generic usage text, which describes valid stage
and component syntax. If you type 'cluvfy comp -list', cluvfy will show the valid
components with a brief description of each. If you type 'cluvfy comp -help', cluvfy will
show the detailed syntax for each of the valid components. Similarly, 'cluvfy stage -list'
and 'cluvfy stage -help' will list the valid stages and their syntax respectively. If you
type an invalid command, cluvfy will show the appropriate usage for that particular
command. For example, if you type 'cluvfy stage -pre dbinst', cluvfy will show the syntax
for the pre-check of the dbinst stage.

What are the default values for the command line arguments?

Here are the default values and behavior for different stage and component commands:

For component nodecon:

If neither the -i nor the -a argument is provided, then cluvfy will enter discovery mode.

For component nodereach:

If no -srcnode argument is provided, then the local node (the node of invocation) will be
used as the source node.

For components cfs, ocr, crs, space, clumgr:

If no -n argument is provided, then the local node will be used.

For components sys and admprv:

If no -n argument is provided, then the local node will be used.
If no -osdba argument is provided, then 'dba' will be used.
If no -orainv argument is provided, then 'oinstall' will be used.

For component peer:

If no -osdba argument is provided, then 'dba' will be used.
If no -orainv argument is provided, then 'oinstall' will be used.

For stage -post hwos:

If no -s argument is provided, then cluvfy will enter discovery mode.

For stage -pre clusvc:

If no -c argument is provided, then cluvfy will skip OCR related checks.
If no -q argument is provided, then cluvfy will skip voting disk related checks.
If no -osdba argument is provided, then 'dba' will be used.
If no -orainv argument is provided, then 'oinstall' will be used.

For stage -pre dbinst:

If the -cfs_oh flag is not specified, then cluvfy will assume that the Oracle home is not
on a shared file system.
If no -osdba argument is provided, then 'dba' will be used.
If no -orainv argument is provided, then 'oinstall' will be used.

Do I have to type the nodelist every time for the CVU commands? Is there
any shortcut?

You do not have to type the nodelist every time for the CVU commands. Typing the
nodelist for a large cluster is painful and error prone. Here are a few shortcuts. To
provide all the nodes of the cluster, type '-n all'. Cluvfy will attempt to get the
nodelist in the following order:
1. If a vendor clusterware is available, it will pick all the configured nodes from the
vendor clusterware using the lsnodes utility.
2. If CRS is installed, it will pick all the configured nodes from Oracle Clusterware
using the olsnodes utility.
3. If neither of the above is available, it will look for the CV_NODE_ALL environment
variable. If this variable is not defined, it will report an error.
To provide a partial list (some of the nodes of the cluster), you can set an environment
variable and use it in the CVU command. For example:
    setenv MYNODES node1,node3,node5
    cluvfy comp nodecon -n $MYNODES
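
The lookup order described above for '-n all' can be sketched in shell as follows. This is
an illustration of the fallback logic only, not cluvfy's own implementation: the function
name resolve_all_nodes is hypothetical, and the sketch assumes that lsnodes and olsnodes,
when present on the PATH, print one node name per line.

```shell
# Hypothetical sketch of the '-n all' lookup order described above.
# Assumes lsnodes/olsnodes, when present on PATH, print one node name per line.
resolve_all_nodes() {
    if command -v lsnodes >/dev/null 2>&1; then
        lsnodes | paste -sd, -          # 1. vendor clusterware
    elif command -v olsnodes >/dev/null 2>&1; then
        olsnodes | paste -sd, -         # 2. Oracle Clusterware
    elif [ -n "${CV_NODE_ALL:-}" ]; then
        printf '%s\n' "$CV_NODE_ALL"    # 3. environment variable fallback
    else
        echo "cannot determine the node list" >&2
        return 1
    fi
}
```

In a Bourne-style shell such as sh, ksh or bash, the equivalent of the setenv example is
'export MYNODES=node1,node3,node5'.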

How do I get detailed output of a check?

Cluvfy supports a verbose feature. By default, cluvfy reports in non-verbose mode and
just reports the summary of a test. To get detailed output of a check, use the flag
'-verbose' on the command line. This will produce detailed output of individual checks
and, where applicable, will show per-node results in a tabular fashion.

How do I check network or node connectivity related issues?

Use component verification commands like 'nodereach' or 'nodecon' for this purpose.
For the detailed syntax of these commands, type cluvfy comp -help on the command prompt.
If the 'cluvfy comp nodecon' command is invoked without -i, cluvfy will attempt to
discover all the available interfaces and the corresponding IP addresses and subnets.
Then cluvfy will try to verify the node connectivity per subnet. You can run this command
in verbose mode to find out the mappings between the interfaces, IP addresses and subnets.
You can check the connectivity among the nodes by specifying the interface name(s)
through the -i argument.

Can I check if the storage is shared among the nodes?

Yes, you can use 'comp ssa' command to check the sharedness of the storage. Please refer
to the known issues section for the type of storage supported by cluvfy.

How do I check whether OCFS is properly configured?

You can use the component command 'cfs' to check this. Provide the OCFS file system
you want to check through the -f argument. Note that the sharedness check for the file
system is supported for OCFS version 1.0.14 or higher.

How do I check the Oracle Clusterware stack and other sub-components
of it?

Cluvfy provides commands to check a particular sub-component of the CRS stack as well
as the whole CRS stack. You can use the 'comp ocr' command to check the integrity of
OCR. Similarly, you can use the 'comp crs' and 'comp clumgr' commands to check the
integrity of the crs and clustermanager sub-components. To check the entire CRS stack,
run the stage command 'cluvfy stage -post crsinst'.

How do I check user accounts and administrative permissions related
issues?

Use the admprv component verification command. Refer to the usage text for detailed
instructions and the types of supported operations. To check whether the privilege is
sufficient for user equivalence, use the '-o user_equiv' argument. Similarly, '-o crs_inst'
will verify whether the user has the correct permissions for installing CRS. '-o db_inst'
will check for permissions required for installing RAC, and '-o db_config' will check for
permissions required for creating a RAC database or modifying a RAC database
configuration.

How do I check minimal system requirements on the nodes?

The component verification command sys is meant for that. To check the system
requirements for RAC, use the '-p database' argument. To check the system requirements
for CRS, use the '-p crs' argument.

Is there a way to compare nodes?

You can use the peer comparison feature of cluvfy for this purpose. The command 'comp
peer' will list the values of different nodes for several pre-selected properties. You can
use the peer command with -refnode argument to compare those properties of other nodes
against the reference node.

Why does the peer comparison with -refnode say passed when the group or
user does not exist?

Peer comparison with the -refnode feature acts like a baseline feature. It compares the
system properties of other nodes against the reference node. If a value does not
match (is not equal to the reference node value), then it flags that as a deviation from
the reference node. If a group or user does not exist on the reference node as well as on
the other node, it will report this as 'matched' since there is no deviation from the
reference node. For the same reason, it will report 'mismatched' for a node with higher
total memory than the reference node.
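
The comparison rule described above amounts to a plain equality test against the
reference value. The sketch below models that rule only; the function name peer_compare
is hypothetical, and the use of an empty string to stand for a missing group or user is
an assumption made for illustration.

```shell
# Hypothetical model of the -refnode comparison rule described above:
# a property matches only when its value equals the reference node's value.
# An empty string stands for "does not exist" (e.g. a missing group).
peer_compare() {
    ref_value=$1
    node_value=$2
    if [ "$node_value" = "$ref_value" ]; then
        echo matched
    else
        echo mismatched
    fi
}

peer_compare "" ""          # group missing on both nodes: prints matched
peer_compare 4096 8192      # more memory than the reference node: prints mismatched
```

Because the test is for equality and not for sufficiency, a node that is better equipped
than the reference node is still flagged as a deviation.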

Is there a way to verify that the Oracle Clusterware is working properly
before proceeding with RAC install?

Yes. You can use the post-check command for cluster services setup (-post clusvc) to
verify CRS status. A more appropriate test would be to use the pre-check command for
database installation (-pre dbinst). This will check whether the current state of the
system is suitable for RAC install.

At what point is cluvfy usable? Can I use cluvfy before installing Oracle
Clusterware?

You can run cluvfy at any time, even before CRS installation. In fact, cluvfy is designed
to assist the user as soon as the hardware and OS are up. If you invoke a command which
requires CRS or RAC on the local node, cluvfy will report an error if those required
products are not yet installed.

How do I turn on tracing?

Set the environmental variable SRVM_TRACE to true. For example, in tcsh "setenv
SRVM_TRACE true" will turn on tracing.

Where can I find the CVU trace files?

CVU log files can be found under the $CV_HOME/cv/log directory. The log files are
automatically rotated and the latest log file has the name cvutrace.log.0. It is a good
idea to clean up unwanted log files or archive them to reclaim disk space. Note that no
trace files will be generated if tracing has not been turned on.
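
For Bourne-style shells (sh, ksh, bash), turning on tracing and locating the newest trace
file might look like the sketch below. The variable name and file location are taken from
the text above; the helper function latest_cvu_trace is hypothetical and is not part of
CVU.

```shell
# Turn on tracing for subsequent cluvfy runs (Bourne-style equivalent of
# the tcsh "setenv SRVM_TRACE true").
export SRVM_TRACE=true

# Hypothetical helper: print the path of the newest trace file, if it exists.
# $1: CVU home (defaults to $CV_HOME)
latest_cvu_trace() {
    trace_file="${1:-${CV_HOME:-}}/cv/log/cvutrace.log.0"
    [ -f "$trace_file" ] && printf '%s\n' "$trace_file"
}
```

The helper returns a non-zero status when no trace file is present, which is also what
one would see if tracing had not been turned on before the cluvfy run.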

Why does cluvfy report "unknown" on a particular node?

Cluvfy reports unknown when it cannot determine whether the check passed or failed.
A common cause of this type of reporting is a non-existent location set for the
CV_DESTLOC variable. Please make sure the directory pointed to by this variable exists
on all nodes and is writable by the user.

What are the known issues with this release?

1. Shared storage accessibility (ssa) check reports: the current release of cluvfy has
the following limitations on Linux regarding the shared storage accessibility check.
a. Currently NAS storage (r/w, no attribute caching), OCFS (version 1.0.14 or higher)
and scsi disks (if the cvuqdisk package is installed) are supported. Note that the
'cvuqdisk' package should be installed only on the RedHat Linux 3.0 distribution.
Discovery of scsi disks for RedHat Linux 2.1 is not supported.
b. For the sharedness check on NAS, cluvfy requires the user to have write permission on
the specified path. If the cluvfy user does not have write permission, cluvfy reports
the path as not-shared.
2. What database version is supported by CVU? The current CVU release supports only 10g
RAC and CRS and is not backward compatible. In other words, CVU cannot check or verify
pre-10g products.
3. What Linux distributions are supported? This release supports only the RedHat 3.0
Update 2 and RedHat 2.1AS distributions. Note that the CVU distributions for RedHat 3.0
Update 2 and RedHat 2.1AS are different; they are not binary compatible. In other words,
the CVU bits for RedHat 3.0 and RedHat 2.1 are not the same.
4. The component check for node application (cluvfy comp nodeapp ...) command reports a
node app creation error if the local CRS stack is down. This is a known issue and will be
addressed shortly.
5. CVU does not recognize the disk bindings (e.g. /dev/raw/raw1) as valid storage paths
or identifiers. Please use the underlying disk (e.g. /dev/sdm etc) for the storage path
or identifiers.
6. The current version of CVU for RedHat 2.1 complains about the missing cvuqdisk
package. This will be corrected in a future release. Users should ignore this error.
Note that the 'cvuqdisk' package should be installed only on the RedHat Linux 3.0
distribution. Discovery of scsi disks for RedHat Linux 2.1 is not supported.

What is 'cvuqdisk' rpm? Why should I install this rpm?

CVU requires root privilege to gather information about the scsi disks during discovery.
A small binary uses the setuid mechanism to query disk information as root. Note that
this process is purely a read-only process with no adverse impact on the system. To keep
this secure, the binary is packaged in the cvuqdisk rpm and needs root privilege to be
installed on a machine. If this package is installed on all the nodes, CVU will be able
to perform discovery and shared storage accessibility checks for scsi disks. Otherwise,
it complains about the missing package 'cvuqdisk'. Note that this package should be
installed only on the RedHat Linux 3.0 distribution. Discovery of scsi disks for RedHat
Linux 2.1 is not supported.

When I run 10.2 CLUVFY on a system where RAC 10g Release 1 is
running I get the following output:

Package existence check failed for "SUNWscucm:3.1".
Package existence check failed for "SUNWudlmr:3.1".
Package existence check failed for "SUNWudlm:3.1".
Package existence check failed for "ORCLudlm:Dev_Release_06/11/04,_64bit_3.3.4.8_reentrant".
Package existence check failed for "SUNWscr:3.1".
Package existence check failed for "SUNWscu:3.1".

Checking this Solaris system I don't see those packages installed.
Can I continue my install?
