Table of Contents
Chapter 1: About this document ............................................................................... 6
Audience and purpose ...................................................................................................................... 7
Scope ................................................................................................................................................ 8
Business challenge ........................................................................................................................... 8
Technology solution .......................................................................................................................... 9
Objectives........................................................................................................................................ 10
Reference Architecture ................................................................................................................... 11
Validated environment profile ......................................................................................................... 13
Hardware and software resources .................................................................................................. 13
Unified storage platform environment ............................................................................................. 15
Prerequisites and supporting documentation.................................................................................. 17
Terminology .................................................................................................................................... 18
Test results ...................................................................................................................................... 78
DNFS configuration test results ...................................................................................................... 79
ASM configuration test results ........................................................................................................ 82
This Proven Solution Guide summarizes a series of best practices that EMC
discovered, validated, or otherwise encountered during the validation of a solution
using an EMC Celerra NS-960 unified storage platform, its built-in EMC CLARiiON
CX4-960 back-end storage array, and Oracle Database 11g on Linux using Oracle
Direct Network File System (DNFS) or Oracle Automatic Storage Management
(ASM).
EMC's commitment to consistently maintain and improve quality is led by the Total
Customer Experience (TCE) program, which is driven by Six Sigma methodologies.
As a result, EMC has built Customer Integration Labs in its Global Solutions Centers
to reflect real-world deployments in which TCE use cases are developed and
executed. These use cases provide EMC with insight into the challenges currently
facing its customers.
Use case definition
A use case reflects a defined set of tests that validates the reference architecture for
a customer environment. This validated architecture can then be used as a reference
point for a Proven Solution.
Contents

Topic | See Page
Scope | 8
Business challenge | 8
Technology solution | 9
Objectives | 10
Reference Architecture | 11
Validated environment profile | 13
Hardware and software resources | 13
Unified storage platform environment | 15
Prerequisites and supporting documentation | 17
Terminology | 18
Audience
The intended audience for this document includes:
- EMC partners
- Customers

Purpose
The purpose of this proven solution is to detail the use of two storage networking
technologies with Oracle: Oracle Automatic Storage Management (ASM) over Fibre
Channel (FC) and Oracle Direct NFS (DNFS) over Internet Protocol (IP). EMC field
personnel and account teams can use this as a guide for designing the storage layer
for Oracle environments. (Oracle DNFS is an implementation of Oracle where the
NFS client is embedded in the Oracle kernel. This makes the NFS implementation
OS agnostic, and it is specifically tuned for Oracle database workloads.)
The purpose of this Proven Solution Guide is to highlight the functionality,
performance, and scalability of DNFS and ASM in the context of an online
transaction processing (OLTP) workload. When testing both storage technologies,
the EMC Celerra NS-960 was used for storage.
In the case of ASM, the database servers were connected directly to the host-side FC ports on the CX4-960, which is the back end to the NS-960.
In the case of DNFS, the database servers were connected to the NS-960's
Data Movers using a 10 GbE storage network. Oracle RAC 11g on Linux for
x86-64 was used for the database environment.
Scope
The scope of this guide is limited to the performance, functionality, and scalability of
these two storage technologies in the context of Oracle RAC 11g for Linux on
x86-64. The cluster used for testing these use cases consisted of four nodes, each
containing 16 cores, with 128 GB of RAM. A 10 GbE storage network was used for
the IP storage use case, and FC was used for ASM.
A reasonable amount of storage and database tuning was performed in order to
achieve the results documented in this guide. However, undocumented parameters
were not used. The goal of this testing was to establish the real-world performance
that could be expected in a production customer environment. For this reason, the
storage was designed to support robustness and reliability, at the cost of
performance. This is consistent with the use of Oracle in a production, fault-tolerant
context.
Not in scope
Disaster recovery
Remote replication
Test/dev cloning
EMC has previously documented these use cases on Celerra and CLARiiON
platforms. Refer to the documents listed in the Prerequisites and supporting
documentation section.
Business challenge
Overview
Technology solution
Overview
Use a Celerra NS-960 and a high-speed 10 GbE network to chart the limits of
performance and user scalability in an Oracle RAC 11g DNFS OLTP
environment
Demonstrate that network-attached storage is competitive on cost and
performance as compared to a traditional storage infrastructure
Objectives
Solution objectives

Objective: Performance
Details:
- Demonstrate the baseline performance of the Celerra NS-960 running over NFS with Oracle RAC 11g R2 DNFS on a 10 GbE network.
- Demonstrate the baseline performance of the Celerra NS-960 running over FC with an Oracle RAC 11g ASM environment.
- Scale the workload and show the database performance achievable on the array over NFS and over ASM.
Reference Architecture

Corresponding Reference Architecture
This solution has a corresponding Reference Architecture document that is
available on Powerlink and EMC.com. Refer to EMC Unified Storage for Oracle
Database 11g - Performance Enabled by EMC Celerra Using DNFS or ASM for
details.

If you do not have access to this content, contact your EMC representative.
Reference architecture diagram for DNFS
Figure 1 depicts the solution's overall physical architecture for the DNFS over IP
implementation.
Reference architecture diagram for ASM
Figure 2 depicts the solution's overall physical architecture for the ASM over FC
implementation.
EMC used the environment profile defined in Table 2 to validate this solution.

Table 2. Profile characteristics
Profile characteristic | Value
Database characteristic | OLTP benchmark profile
Response time | < 2 seconds
Read/write ratio | 70/30
Database scale: size of databases | 1 TB
Database scale: number of databases | 1
Solution hardware

Table 3. Solution hardware
Equipment | Quantity | Configuration
EMC Celerra NS-960 (CLARiiON CX4-960 back end) | 1 | 2 storage processors; 3 Data Movers; 1 Control Station; 4 x 10 GbE network connections per Data Mover; 7 FC shelves; 2 SATA shelves; 105 x 300 GB 15k FC disks; 30 x 1 TB SATA disks
FC switches (Brocade 8000) | 2 | 16 ports (4 Gb/s)
Database servers | 4 | 4 x 3 GHz Intel Nehalem quad-core processors; 128 GB of RAM
Software

Table 4 lists the software that EMC used to validate this solution.

Table 4. Solution software
Software | Version
Red Hat Enterprise Linux Server | 5.5
VMware vSphere | 4.0
Microsoft Windows Server | 2003
Oracle Database 11g R2 | 11.2.0.1
 | 5.8.1
 | 5.6
EMC FLARE | 6.29.5.0.37
EMC DART | 6.29
This solution tested a unified storage platform with two different environments.
DNFS environment
With DNFS over IP, all database objects are accessed through the Celerra Data
Mover via an NFS mount. Datafiles, tempfiles, control files, online redo logfiles,
and archived log files are accessed using DNFS over the IP protocol.
ASM environment
With ASM over FC, all database objects, including datafiles, tempfiles, control files,
online redo logfiles, and archived log files, are stored on ASM disk groups that reside
on SAN storage.
Solution
environment
Storage layout
To test the unified storage platform solution with different protocols and different disk
drives, the database was built in two different configurations. The back-end storage
layout is the same for both except for the file-system type.
Table 5 shows the storage layouts for both environments.

Table 5. Storage layouts
What | Where
Voting disk, OCR files | FC disk
Archived logfiles | SATA II

RAID-protected NFS file systems are designed to satisfy the I/O demands of
particular database objects. For example, RAID 5 is sometimes used for the
datafiles and tempfiles, but RAID 1 is always used for the online redo logfiles.
For more information, refer to: EMC Celerra NS-960 Specification Sheet
Network architecture
Oracle datafiles and online redo logfiles reside on their own NFS file system.
Online redo logfiles are mirrored across two different file systems using
Oracle software multiplexing. Three NFS file systems are used - one file
system for datafiles and tempfiles, and two file systems for online redo
logfiles.
Oracle control files are mirrored across the online redo logfile NFS file
systems.
DNFS provides file system semantics for Oracle RAC 11g on NFS over IP
ASM provides file system semantics for Oracle RAC 11g on SAN over FC
The RAC interconnect and storage networks are 10 GbE. Jumbo frames are enabled
on these networks.
Supporting documents
- EMC CLARiiON CX4 Model 960 (CX4-960) Storage System Setup Guide
- EMC Backup and Recovery for Oracle Database 11g without Hot Backup Mode using DNFS and Automatic Storage Management on Fibre Channel - A Detailed Review

Third-party documents
- Oracle Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux
- Oracle Real Application Clusters Installation Guide 11g Release 2 (11.2) for Linux and UNIX
Terminology

Terms and definitions

Term: Automatic Storage Management (ASM)
Definition: Oracle ASM is a volume manager and a file system for Oracle Database
files. It supports single-instance Oracle Database and Oracle Real Application
Clusters (Oracle RAC) configurations. ASM uses block-level storage.

Term: Kernel NFS (KNFS)
Definition: A standard feature of all Linux and UNIX operating systems in which the
Network File System (NFS) client protocol is embedded in the operating system
kernel.

Term: Online transaction processing (OLTP)

Term: Scale-up OLTP

Term: Serial Advanced Technology Attachment (SATA) drive
The environment consists of a four-node Oracle RAC 11g cluster that accesses a
single production database. The four RAC nodes communicate with each other
through a dedicated private network that includes a Brocade 8000 FCoE switch. This
cluster interconnect synchronizes the caches of the database instances
between user requests. An FC SAN is provided by two QLogic SANbox 5602
switches.
For the SAN configuration, EMC PowerPath is used in this solution and works with
the storage system to manage I/O paths. For each server, PowerPath manages four
active I/O paths to each device and four passive I/O paths to each device.
Contents

Topic | See Page
Concepts | 20
Storage setup | 20
Best practices | 21
Concepts
High availability and failover
EMC Celerra has built-in high-availability (HA) features. These HA features allow the
Celerra to survive various failures without a loss of access to the Oracle database.
These HA features protect against the following:
FC switch failure
Disk failure
Storage setup

Setting up CLARiiON (CX) storage
To set up CLARiiON (CX) storage, the steps in the following table must be carried out:
Step 1: Configure zoning.

Setting up NAS storage
To set up NAS storage, the steps in the following table must be carried out:
Best practices
Disk drives
SATA II drives
SATA II drives are frequently the best option for storing archived redo logs and the
fast recovery area. In the event of high-performance requirements for backup and
recovery, FC drives can also be used for this purpose.
Description | RAID 10/FC | RAID 5/FC | RAID 5/SATA II
Datafiles/tempfiles | Recommended | Recommended | Avoid
Control files | Recommended | Recommended | Avoid
Online redo logfiles | Recommended | Avoid | Avoid
Archived logs | Possible (apply tuning (1)) | Possible (apply tuning (1)) | Recommended
Fast recovery area | OK | OK | Recommended
OCR files/voting disk | OK | OK | Avoid

(1) The use of FC disks for archived logs is relatively rare. However, if many archived
logs are being created, and the I/O requirements for archived logs exceed a
reasonable number of SATA II disks, this may be a more cost-effective solution.
Tempfiles, undo, and sequential table or index scans
In some cases, if an application creates a large amount of temp activity, placing your
tempfiles on RAID 10 devices may be faster due to RAID 10's superior sequential
I/O performance. This is also true for undo. Further, an application that performs
many full table scans or index scans may benefit from these datafiles being placed
on separate RAID 10 devices.
Online redo
logfiles
Online redo log files should be put on RAID 1 or RAID 10 devices. You should not
use RAID 5 because sequential write performance of distributed parity (RAID 5) is
not as high as that of mirroring (RAID 1).
RAID 1 or RAID 10 provides the best data protection; protection of online redo log
files is critical for Oracle recoverability.
You should use FC disks for OCR files and voting disk files; unavailability of these
files for any significant period of time (due to disk I/O performance issues) may
cause one or more of the RAC nodes to reboot and fence itself off from the cluster.
The LUN/RAID group layout images in the RAID group layout section show two
different storage configurations that can be used for Oracle RAC 11g databases on a
Celerra. That section can help you to determine the best configuration to meet your
performance needs.
Shelf
configuration
The most common error when planning storage is designing for storage capacity
rather than for performance. The single most important storage parameter for
performance is disk latency. High disk latency is synonymous with slower
performance; low disk counts lead to increased disk latency.
The default stripe size for all file systems on FC shelves (redo logs and data) should
be 32 KB. Similarly, the recommended stripe size for the file systems on SATA II
shelves (archive and flash) should be 256 KB.
EMC recommends that you turn off file-system read prefetching for an OLTP
workload. Leave it on for a Decision Support System (DSS) workload.
Prefetch will waste I/Os in an OLTP environment, since few, if any, sequential I/Os
are performed. In a DSS environment, the opposite is true.
To turn off the read prefetch mechanism for a file system, type:
$ server_mount <movername> -option <options>,noprefetch <fs_name> <mount_point>
For example:
$ server_mount server_3 -option rw,noprefetch ufs1 /ufs1
NFS thread
count
EMC recommends that you use the default NFS thread count of 256 for optimal
performance.
Do not set this to a value lower than 32 or to a value higher than 512.
For more information about this parameter, see the Celerra Network Server
Parameters Guide on Powerlink. If you do not have access to this content, contact
your EMC representative.
file.asyncthres
hold
EMC recommends that you use the default value of 32 for the parameter
file.asyncthreshold. This provides optimum performance for databases.
For more information about this parameter, see the Celerra Network Server
Parameters Guide on Powerlink. If you do not have access to this content, contact
your EMC representative.
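As an illustration, Data Mover parameters like this can be inspected and changed
from the Control Station with the server_param command, following the syntax this
guide shows later for the transChecksum parameter. The facility and parameter
names below are assumptions to verify against the Celerra Network Server
Parameters Guide:

$ server_param server_2 -facility file -info asyncthreshold
$ server_param server_2 -facility file -modify asyncthreshold -value 32

The NFS thread count described above can be checked in the same way against the
nfs facility.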
The Data Mover failover capability is a key feature unique to the Celerra. This
feature offers redundancy at the file-server level, allowing continuous data access. It
also helps to build a fault-resilient RAC architecture.
Configuring
failover
EMC recommends that you set up an auto-policy for the Data Mover, so that if a
Data Mover fails, either due to hardware or software failure, the Control Station
immediately fails the Data Mover over to its partner. The standby Data Mover
assumes the faulted Data Mover's identities:
Network identity: IP and MAC addresses of all its network interface cards
(NICs)
Service identity: Shares and exports controlled by the faulted Data Mover
This ensures continuous file sharing transparently for the database without requiring
users to unmount and remount the file system. The NFS applications and NFS
clients do not see any significant interruption in I/O.
Pre-conditions
for failover
Power failure within the Data Mover (unlikely, as the Data Mover is typically
wired into the same power supply as the entire array)
Events that do
not cause
failover
Manual failover
Because manual rebooting of a Data Mover does not initiate a failover, EMC
recommends that you initiate a manual failover before taking down a Data Mover for
maintenance.
Two sets of RAID and disk configurations have been tested. These are described in
the following sections.
For this solution, the FC disks are designed to hold all other database files, for
example, datafiles, control files, online redo log files. As per EMC best practices,
online redo log files are put on RAID 10, while datafiles and control files are put on
RAID 5.
The SATA disks are only designed to hold the archive logs and backup files.
Therefore, there is no impact on the performance and scalability testing performed
by EMC.
For a customer's production environment, EMC recommends RAID 6 instead of
RAID 5 for archive logs and backup files when using SATA. RAID 6 provides extra
redundancy; its performance is almost the same as RAID 5 for reads, but is slower for
writes.
For more information about RAID 6, see EMC CLARiiON RAID 6 Technology - A
Detailed Review.
RAID group layout for ASM
The RAID group layout for seven-FC shelf RAID 5/RAID 1 and two-SATA II RAID 5
using user-defined storage pools is shown in Figure 3.
RAID group layout for NFS
The RAID group layout for seven-FC shelf RAID 5/RAID 1 and two-SATA II RAID 5
using user-defined storage pools is shown in Figure 4.
For NFS, in addition to the configuration on the CLARiiON, a few configuration steps
are necessary for the Celerra, which provides the file system path to the hosts.
Seven file systems need to be created to hold database files, online redo logs,
archive logs, backup files, and CRS files. For database files, two file systems are
created to use two Data Movers for better performance. For online redo log files, two
file systems are created to use two Data Movers for multiplexing and better
performance.
For ASM, no additional configuration on Celerra is required.
There are many different ways of organizing software and services in the file system
that all make sense. However, standardization on a workable single layout across all
services has more advantages than picking a layout that is well suited for a particular
application.
Contents

Topic | See Page
Limitations | 29
Limitations

Limitations with Automatic Volume Management
The file systems shown in Table 8 were created on MVM user-defined pools,
exported on the Celerra, and mounted on the database servers.

ASM does not utilize a file system.
Table 8. Volumes
File system | Storage pool | Metavolume
/crsfs (mounted at /crs) | log1pool (user-defined storage pool created using log1stripe volume) | log1stripe (metavolume using one RAID 10 group)
/datafs1 | datapool (user-defined storage pool created using datastripe volume) | data1stripe (metavolume consisting of all available FC 4+1 RAID 5 groups)
/datafs2 | datapool (user-defined storage pool created using datastripe volume) | data2stripe (metavolume consisting of all available FC 4+1 RAID 5 groups)
/log1fs | log1pool (user-defined storage pool created using log1stripe volume) | log1stripe (metavolume using one RAID 10 group)
/log2fs | log2pool (user-defined storage pool created using log2stripe volume) | log2stripe (metavolume using one RAID 10 group)
/archfs | archpool (user-defined storage pool created using archstripe volume) | archstripe (metavolume using the SATA 6+1 RAID 5 group (1))
/frafs | frapool (user-defined storage pool created using frastripe volume) | frastripe (metavolume using the SATA 6+1 RAID 5 group (1))

(1) EMC strongly recommends using RAID 6 with high-capacity SATA drives. High capacity is 1 TB or greater in capacity.
This chapter provides guidelines on the Oracle 11g RAC database design used for
this validated solution. The design and configuration instructions apply to the specific
revision levels of components used during the development of the solution.
Before attempting to implement any real-world solution based on this validated
scenario, you must gather the appropriate configuration documentation for the
revision levels of the hardware and software components. Version-specific release
notes are especially important.
Contents

Topic | See Page
Considerations | 32
Oracle ASM | 34
HugePages | 39
Considerations

Heartbeat mechanisms
Oracle Clusterware uses two heartbeat mechanisms:
- The network heartbeat across the RAC interconnect, which establishes and confirms valid node membership in the cluster
- The disk heartbeat to the voting disk
Both of these heartbeat mechanisms have an associated time-out value. For more
information on Oracle Clusterware misscount and disktimeout parameters, see
Oracle MetaLink Note 294430.1.
EMC recommends setting the disk heartbeat parameter disktimeout to
160 seconds. You should leave the network heartbeat parameter
misscount at the default of 60 seconds.
Rationale
These settings will ensure that the RAC nodes do not evict when the active Data
Mover fails over to its partner.
The command to configure this option is:
$ORA_CRS_HOME/bin/crsctl set css disktimeout 160
Note
In Oracle RAC 11g R2, the default value of disktimeout has been set to
200. Therefore, there is no need to manually change the value to 160. You
must check the current value of the parameter before you make any
changes by executing the following command:
$GRID_HOME/bin/crsctl get css disktimeout
Oracle Cluster
Ready Services
Oracle Cluster Ready Services (CRS) are enabled on each of the Oracle RAC 11g
servers. The servers operate in active/active mode to provide local protection against
a server failure and to provide load balancing.
Provided that the required mount-point parameters are used, CRS-required files
(including the voting disk and the OCR file) can reside on NFS volumes.
For more information on the mount-point parameters required for the Oracle
Clusterware files, see Chapter 6: Installation and Configuration > Task 4: Configure
NFS client options.
NFS client
In the case of the Oracle RAC 11g database, the embedded Oracle DNFS protocol
is used to connect to the Celerra storage array. DNFS runs over TCP/IP.
Oracle binary
files
The Oracle RAC 11g binary files, including the Oracle CRS, are installed on the
database servers' local disks.
Datafiles, online redo log files, archive log files, tempfiles, control files, and CRS files
reside on Celerra NFS file systems. These file systems are designed (in terms of the
RAID level and number of disks used) to be appropriate for each type of file.
Table 9 lists each file or activity type and indicates where it resides.
Table 9.
Content
Location
Database file
layout for ASM
Datafiles, tempfiles
/archfs
/frafs
/crs
Datafiles, online redo log files, archive log files, tempfiles, control files, and CRS files
reside on CLARiiON storage that is managed by Oracle ASM. The database was
built with six distinct ASM disk groups: +DATA, +LOG1, +LOG2, +ARCH, +FRA, and
+CRS.
Table 10 lists each file or activity type and indicates where it resides.
Table 10. Location of files and activities for ASM
Content
Location
Datafiles, tempfiles
+DATA
+ARCH
+FRA
+CRS
Oracle ASM

ASMLib
Oracle has developed a storage management interface called the ASMLib API.
ASMLib is not required to run ASM; it is an add-on module that simplifies the
management and discovery of ASM disks. ASMLib provides an alternative to the
standard operating system interface for ASM to identify and access block devices.
The ASMLib API provides two major feature enhancements over standard interfaces:
Best practices
for ASM and
database
deployment
Implement multiple access paths to the storage array using two or more HBAs
or initiators.
Deploy multipathing software over these multiple HBAs to provide I/O load balancing and failover capabilities.
Use disk groups with similarly sized and performing disks. A disk group
containing a large number of disks provides a wide distribution of data
extents, thus allowing greater concurrency for I/O, and reduces the
occurrence of hotspots. Since a large disk group can easily sustain various
I/O characteristics and workloads, a single (database area) disk group can be
used to house database files, logfiles, and controlfiles.
Use disk groups with four or more disks and ensure these disks span several
back-end disk adapters.
For example, a common deployment can be four or more disks in a database disk
group (for example, DATA disk group) spanning all back-end disk adapters/directors,
and eight to ten disks for the FRA disk group. The size of the FRA area will depend
on what is stored and how much, that is, full database backups, incremental
backups, flashback database logs, and archive logs.
Note
An active copy of the controlfile and one member of each of the redo log
groups are stored in the FRA.
Oracle 11g includes a feature for storing Oracle datafiles on a NAS device, referred
to as Direct NFS or DNFS. DNFS integrates the NFS client directly inside the
database kernel instead of the operating system kernel.
As part of this solution, the storage elements for Oracle RAC 11g were accessed
using the DNFS protocol. It is relatively easy to configure DNFS. It applies only to the
storage of Oracle datafiles. Redo log files, tempfiles, control files, and so on are not
affected. You can attempt to configure the mount points where these files are stored
to support DNFS, but this will have no impact.
DNFS provides performance advantages over conventional Linux kernel NFS (or
KNFS) because fewer context switches are required to perform an I/O. Because
DNFS integrates the NFS client into the Oracle kernel, all I/O calls are made in
user space, rather than requiring a context switch to kernel space. As a result, the
CPU utilization associated with database server I/O is reduced.
Disadvantages of KNFS
I/O caching and performance characteristics vary between operating systems. This
leads to varying NFS performance across different operating systems (for example,
Linux and Solaris), and across different releases of the same operating system (for
example, RHEL 4.7 and RHEL 5.4). This in turn results in varying NFS performance
across implementations.
Using the DNFS configuration, the Oracle RAC 11g and EMC Celerra solution
enables you to deploy an EMC NAS architecture with DNFS connectivity for Oracle
RAC 11g database applications, with lower cost and reduced complexity than
direct-attached storage (DAS) or a storage area network (SAN).
Figure 5 illustrates how DNFS can be used to deploy an Oracle 11g and
EMC Celerra solution.
In addition to its performance advantage, DNFS provides the following benefits:
- Consistent performance
- Overcomes OS write locking
- Included in 11g
- Enhanced data integrity

Enhanced data integrity
To ensure database integrity, immediate writes must be made to the database when
requested. Operating system caching delays writes for efficiency reasons; this
potentially compromises data integrity during failure scenarios.
DNFS uses database caching techniques and asynchronous direct I/O to ensure
almost immediate data writes, thus reducing data integrity risks.
Load balancing
and high
availability
Load balancing and high availability (HA) are managed internally within the DNFS
client itself, rather than at the OS level. This greatly simplifies network setups in HA
environments and reduces dependence on IT network administrators by eliminating
the need to set up network subnets and bond ports, for example, LACP bonding.
DNFS allows multiple parallel network paths/ports to be used for I/O between the
database server and the IP storage array. For each node, two paths were used in
the testing performed for this solution. For efficiency and performance, these paths
are managed and load balanced by the DNFS client, not by the operating system.
The four paths should be configured in separate subnets for effective load balancing
by DNFS.
Less tuning
required
Oracle 11g DNFS requires little additional tuning, other than the tuning
considerations necessary in any IP storage environment with Oracle. In an
unchanging environment, once tuned, DNFS requires no ongoing maintenance.
If the shared pool is sized too small, the database may not open at all; if it does
open, you may experience errors due to lack of shared pool space.
In an OLTP context, the size of the shared pool is frequently the limitation on
the performance of the database.
For more information, see Effects of Automatic Memory Management on
performance.
Automatic Memory Management
Oracle 11g Automatic Memory Management is enabled by setting the following
initialization parameters:
MEMORY_TARGET
MEMORY_MAX_TARGET
Once these parameters are set, Oracle 11g can, in theory, handle all memory
management issues, including both SGA and PGA memory. However, the Automatic
Memory Management model in Oracle 11g 64 bit (Release 1) requires configuration
of shared memory as a file system mounted under /dev/shm. This adds an additional
management burden to the DBA/system administrator.
Effects of Automatic Memory Management on performance

HugePages
The Linux 2.6 kernel includes a feature called HugePages. This feature allows you to
specify the number of physically contiguous large memory pages that will be
allocated and pinned in RAM for shared memory segments like the Oracle System
Global Area (SGA).
The pre-allocated memory pages can only be used for shared memory and must be
large enough to accommodate the entire SGA. HugePages can create a very
significant performance improvement for Oracle RAC 11g database servers. The
performance payoff for enabling HugePages is significant.
Warning
HugePages must be tuned carefully and set correctly. Unused HugePages can only
be used for shared memory allocations - even if the system runs out of memory and
starts swapping. Incorrectly configured HugePages settings may result in poor
performance and may even make the machine unusable.
HugePages parameters
The HugePages parameters are stored in /etc/sysctl.conf. You can change the value
of the HugePages parameters by editing the sysctl.conf file and rebooting the server.
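As a minimal sketch, assuming 2 MB HugePages and a pool sized to cover a
100 GB SGA (the value is illustrative and must be sized to your own SGA), the
sysctl.conf entry would look like this:

vm.nr_hugepages = 51200

After editing the file, reboot the server (or run sysctl -p and restart the instances)
for the new pool size to take effect.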
Table 12 describes the HugePages parameters.
Table 12. HugePages parameters
Parameter | Description
HugePages_Total | The total number of HugePages configured on the system
HugePages_Free | The number of HugePages that are not yet allocated
Hugepagesize | The size of each HugePage (2,048 KB on x86-64 Linux)
Contents
This chapter focuses on the network design and layout for this solution. It includes
the technology details of SAN and IP network configuration as well as the RAC
interconnect network. To maximize the network performance, jumbo frames have
been enabled on different layers.
Contents

Topic | See Page
Concepts | 42
IP network layout | 43
Virtual LANs | 44
Jumbo frames | 45
Concepts
Jumbo frames
Maximum Transfer Unit (MTU) sizes of greater than 1,500 bytes are referred to as
jumbo frames.
Jumbo frames require Gigabit Ethernet across the entire network infrastructure:
server, switches, and database servers.
VLAN
Virtual local area networks (VLANs) logically group devices that are on different
network segments or sub-networks.
EMC recommends that you use Gigabit Ethernet for the RAC interconnects if RAC is
used. If 10 GbE is available, that is better.
Jumbo frames
and the RAC
interconnect
For Oracle RAC 11g installations, jumbo frames are recommended for the private
RAC interconnect. This boosts the throughput as well as possibly lowers the CPU
utilization due to the software overhead of the bonding devices. Jumbo frames
increase the device MTU size to a larger value (typically 9,000 bytes).
VLANs
EMC recommends that you use VLANs to segment different types of traffic to
specific subnets. This provides better throughput, manageability, application
separation, high availability, and security.
Zoning
Two Brocade 8000 switches are used for the test bed.
Two connections from each database server are connected to the Brocade
8000 switches.
One FC port from SPA and SPB is connected to each of the two FC switches
at 4 Gb/s.
Each FC port from the database servers is zoned to both SP ports. According to
EMC's best practices, single initiator zoning was used, meaning one HBA/one SP
port per zone.
IP network layout
Network design
for the
validated
scenario
Client virtual machines run on a VMware ESX server. They are connected
to a client network.
Jumbo frames are enabled on the RAC interconnect and storage networks.
The Oracle RAC 11g servers are connected to the client, RAC interconnect,
WAN, and production storage networks.
Virtual LANs

This solution uses four VLANs to segregate network traffic of different types. This
improves throughput, manageability, application separation, high availability, and
security.

Table 13 describes the database server network port setup.

Table 13. Database server network port setup
Description | CRS setting
Client network | Public
RAC interconnect | Private
Storage | Private
Storage | Private

Client VLAN
The client VLAN supports connectivity between the physically booted Oracle
RAC 11g servers, the virtualized Oracle Database 11g, and the client workstations.
The client VLAN also supports connectivity between the Celerra and the client
workstations to provide network file services to the clients. Control and management
of these devices are also provided through the client network.
RAC
interconnect
VLAN
The RAC interconnect VLAN supports connectivity between the Oracle RAC 11g
servers for network I/O required by Oracle CRS. One NIC is configured on each
Oracle RAC 11g server to the RAC interconnect network.
Storage VLAN
The storage VLAN uses the NFS protocol to provide connectivity between servers
and storage. Each database server connected to the storage VLAN has two NICs
dedicated to the storage VLAN. Link aggregation is configured on the servers to
provide load balancing and port failover between the two ports.
For validating DNFS, link aggregation is removed. DNFS was validated using one-,
two-, three-, and four-port configurations. Link aggregation is not required on DNFS
because Oracle 11g internally manages load balancing and high availability.
Redundant switches

Jumbo frames

Introduction to jumbo frames

Switch
Note: Configuration steps for the switch are not covered here, as they are
vendor-specific. Check your switch documentation for details.

Celerra Data Mover
To configure jumbo frames on the Data Mover, execute the following command on
the Control Station:
server_ifconfig server_2 int1 mtu=9000
Where server_2 is the Data Mover and int1 is the name of the network interface
being configured; mtu=9000 enables jumbo frames on that interface.
Linux servers and RAC interconnect
Jumbo frames should be configured for the storage and RAC interconnect networks
of this solution to boost the throughput, as well as possibly lowering the CPU
utilization due to the software overhead of the bonding devices.
Typical Oracle database environments transfer data in 8 KB and 32 KB block sizes,
which require multiple 1,500 frames per database I/O, while using an MTU size of
1,500. Using jumbo frames, the number of frames needed for every large I/O request
can be reduced, thus the host CPU needed to generate a large number of interrupts
for each application I/O is reduced. The benefit of jumbo frames is primarily a
complex function of the workload I/O sizes, network utilization, and Oracle database
server CPU utilization, and so is not easy to predict.
For information on using jumbo frames with the RAC interconnect, see Oracle
MetaLink Note 300388.1.

To test whether jumbo frames are enabled, use the following command:
ping -M do -s 8192 <target>
Where <target> is the IP address or hostname of the interface being tested; -M do
prohibits fragmentation, and -s 8192 sets a packet size larger than the standard
1,500-byte MTU.

Jumbo frames must be enabled on all layers of the network for this command to
succeed.
The private interconnect should only be used by Oracle to transfer cluster manager
and cache fusion related data.
Although it is possible to use the public network for the RAC interconnect, this is not
recommended as it may cause degraded database performance (reducing the
amount of bandwidth for cache fusion and cluster manager traffic).
Configuring
virtual IP
addresses
The virtual IP addresses must be defined in either the /etc/hosts file or DNS for all
RAC nodes and client nodes. The public virtual IP addresses will be configured
automatically by Oracle when the Oracle Universal Installer is run, which starts
Oracle's Virtual Internet Protocol Configuration Assistant (vipca).
All virtual IP addresses will be activated when the following command is run:
srvctl start nodeapps -n <node_name>
Where <node_name> is the name of the RAC node.
Table 14 lists each interface and describes its use for the Oracle 11g DNFS
configuration.
Table 14. Interfaces for DNFS configuration
Interface port ID | Description
eth0 | Client network
eth1 | RAC interconnect
eth6 | Storage network
eth7 | Storage network
Table 15 lists each interface and describes its use for the Oracle 11g ASM
configuration.
Table 15. Interfaces for ASM configuration
Interface port ID | Description
eth0 | Client network
eth1 | RAC interconnect
eth6 | Unused
eth7 | Unused
This chapter provides procedures and guidelines for installing and configuring the
components that make up the validated solution scenario. The installation and
configuration instructions presented in this chapter apply to the specific revision
levels of components used during the development of this solution.
Before attempting to implement any real-world solution based on this validated
scenario, gather the appropriate installation and configuration documentation for the
revision levels of the hardware and software components planned in the solution.
Version-specific release notes are especially important.
Note: Where tasks are not divided into NAS or ASM, they are the same for both
configurations.
EMC PowerPath provides I/O multipath functionality. With PowerPath, a node can
access the same SAN volume via multiple paths (HBA ports), which enables both
load balancing across the multiple paths and transparent failover between the paths.
Install EMC
PowerPath
Installation
The installation is very straightforward. In the solution environment, EMC runs the
following command on the four nodes:
rpm -i EMCpower.LINUX-5.3.1.00.00-111.rhel5.x86_64.rpm
PowerPath license
To register the PowerPath license, run the following command:
emcpreg -install
Type the 24-character alphanumeric sequence found on the License Key Card
delivered with the PowerPath media kit.
Licensing type
To set the licensing type, choose one of the following:
Load-balancing policies
To set the PowerPath load-balancing policies to CLARiiON Optimize, run the
following command:
powermt set policy=co
New device/path
To reconfigure the new device/path, run the following command:
powermt config
Use this command when scanning for new devices. It adds those new devices to the
PowerPath configuration, configures all detected paths to PowerPath devices, and
then adds paths to the existing devices.
For more information on prerequisites and installing PowerPath, see the EMC
PowerPath for Linux Installation and Administration Guide.
Configure EMC
PowerPath
After installation, you should be able to see pseudo devices using this command:
powermt display dev=all
To start and stop PowerPath, run the command as below:
/etc/init.d/PowerPath start
/etc/init.d/PowerPath stop
All ASM disk groups are then built using PowerPath pseudo names.
Note
Because of the way in which the SAN devices are discovered on each node, there is
a possibility that a pseudo device pointing to a specific LUN on one node might point
to a different LUN on another node. The emcpadm command is used to ensure
consistent naming of PowerPath devices on all nodes as shown in Figure 6.
Figure 6. Using the emcpadm utility
Enhancements to the emcpadm utility allow you to preserve and restore PowerPath
pseudo-device-to-array logical-unit bindings. The new commands simplify the
process of renaming pseudo devices in a cluster environment. For example, you can
rename pseudo devices on one cluster node, export the new device mappings, then
import the mappings on another cluster node.
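For example, a sketch of the export step on the reference node, producing the
NodeA.map file that the check and import steps below consume:

emcpadm export_mappings -f NodeA.map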
The emcpadm import_mappings command imports a saved set of
pseudo-device-to-LU mappings:

emcpadm import_mappings [-v] -f <pseudo device/LU mappings file>

The -f argument specifies the mappings file to import, and -v produces verbose
output.
Note: On NodeB of the cluster, copy the NodeA.map file and compare it with the
current configuration:
emcpadm check_mappings -v -f NodeA.map
This shows a comparison of the two configurations and the changes that will be
made if this mapping file is imported. To proceed, run the following commands on
NodeB:
emcpadm import_mappings -v -f NodeA.map
powermt save
For details on configuring NAS and managing Celerra, see Supporting Information >
Managing and monitoring EMC Celerra.
For details on configuring ASM and managing the CLARiiON, follow the steps in the
table below.
Step 1: Find the operating system (OS) version.
[root@fj903-esx01 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.5 (Tikanga)

Step 2: Partition each PowerPath pseudo device using fdisk, creating a primary
partition (for example, partition number 1 starting at sector 2048).

Step 3: Install the ASMLib packages (oracleasmlib-2.0.4-1.el5).

Step 4: Configure ASM:
[root@fj903-esx01 ~]# /etc/init.d/oracleasm configure

Step 5: Create the ASM disks:
[root@fj903-esx01 ~]# /etc/init.d/oracleasm createdisk DISK_LOG1 /dev/emcpowers1

Step 6: Scan the ASM disks on each node:
[root@fj903-esx01 ~]# /etc/init.d/oracleasm scandisks
Table 18 shows the PowerPath names associated with the LUNs used in the ASM
disk groups.
Table 18. ASM disk names
Disk group | ASM disk | Path | CLARiiON LUN
CRS | DISK_CRS | /dev/emcpowerx |
DATA | DISK_DATA_1 | /dev/emcpowerr | 11
DATA | DISK_DATA_2 | /dev/emcpowero | 14
DATA | DISK_DATA_3 | /dev/emcpowerj | 20
DATA | DISK_DATA_4 | /dev/emcpowerk | 19
DATA | DISK_DATA_5 | /dev/emcpowerh | 22
DATA | DISK_DATA_6 | /dev/emcpowerf | 24
DATA | DISK_DATA_7 | /dev/emcpowerw |
DATA | DISK_DATA_8 | /dev/emcpowerq | 12
DATA | DISK_DATA_9 | /dev/emcpowern | 15
DATA | DISK_DATA_10 | /dev/emcpowera |
DATA | DISK_DATA_11 | /dev/emcpowerl | 18
DATA | DISK_DATA_12 | /dev/emcpowerm | 17
DATA | DISK_DATA_13 | /dev/emcpowerp | 13
DATA | DISK_DATA_14 | /dev/emcpowerv | 10
DATA | DISK_DATA_15 | /dev/emcpowerg | 23
DATA | DISK_DATA_16 | /dev/emcpoweri | 21
LOG1 | DISK_LOG1 | /dev/emcpowers |
LOG2 | DISK_LOG2 | /dev/emcpoweru | 27
FRA | DISK_FRA_1 | /dev/emcpowerb | 29
FRA | DISK_FRA_2 | /dev/emcpowerc | 28
ARCH | DISK_ARCH_1 | /dev/emcpowerd | 26
ARCH | DISK_ARCH_2 | /dev/emcpowere | 25
Fujitsu PRIMERGY RX600 S4 servers were used in our testing. These servers were
preconfigured with the BIOS version 5.00 Rev. 1.19.2244.
Regardless of the server vendor and architecture, you should monitor the BIOS
version shipped with the system and determine if it is the latest production version
supported by the vendor. If it is not the latest production version supported by the
vendor, then flashing the BIOS is recommended.
Disable Hyper-Threading in the kernel
For optimal reliability and performance, EMC recommends the NFS client options listed
in Table 19. The mount options are listed in the /etc/fstab file.

Table 19. NFS client options
Option | Syntax | Recommended
Hard mount | hard | Always
NFS protocol version | vers=3 | Always
TCP | proto=tcp | Always
Background | bg | Always
No interrupt | nointr | Always
Read size | rsize=32768 | Always
Write size | wsize=32768 | Always
No auto | noauto |
Actimeo | actimeo=0 | RAC only
Timeout | timeo=600 | Always
Mount options for Oracle RAC files
Table 20 shows mount options for Oracle RAC files when used with NAS devices.

Table 20. Mount options for Oracle RAC files
Operating system: Linux x86-64
Mount options for binaries:
rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0
Mount options for Oracle datafiles:
rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600
Mount options for OCR and voting disk files:
rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0
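Putting the recommended options together, a sketch of one /etc/fstab entry for a
datafile file system, reusing the server address and export name that appear in the
oranfstab example later in this guide:

192.168.4.160:/datafs1  /u04  nfs  rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0  0 0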
sunrpc.tcp_slot_table_entries

No protocol overhead
For information about installing the Oracle grid infrastructure for Linux, see Oracle
Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux.
Install Oracle
RAC for Linux
For information about installing Oracle RAC for Linux, see Oracle Real Application
Clusters Installation Guide 11g Release 2 (11.2) for Linux and UNIX.
Memory configuration files
Table 21 describes the files that must be configured for memory management.

Table 21. Memory configuration files
File | Created by
/etc/sysctl.conf | Linux installer
/etc/security/limits.conf | Linux installer
spfile or init.ora | Oracle installer, dbca, or the DBA who creates the database
Kernel parameter | Recommended value
kernel.shmmax | Larger than the SGA size
kernel.shmall | kernel.shmmax divided by the page size (4096 bytes)
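As an illustration only, assuming a 100 GB SGA and the 4096-byte page size, the
/etc/sysctl.conf entries might look like the following; size kernel.shmmax above
your own SGA and derive kernel.shmall from it:

kernel.shmmax = 110000000000
kernel.shmall = 26855469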
Configuring /etc/security/limits.conf
Ensure that the machine you are using has adequate memory. For example, the
EMC test system had 128 GB of RAM and a 100 GB SGA. You can verify the
HugePages configuration in /proc/meminfo (for example, Hugepagesize: 2048 KB).
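A sketch of the corresponding memlock entries in /etc/security/limits.conf, reusing
the 100 GB (104857600 KB) value from the ohasd workaround later in this task;
adjust the values to your SGA size:

oracle  soft  memlock  104857600
oracle  hard  memlock  104857600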
Enable
HugePages for
RAC 11.2.0.1 on
Linux
With Grid Infrastructure 11.2.0.1, if the database is restarted by using the srvctl
command, HugePages may not be used on Linux unless SQL*Plus is used to restart
each instance. This issue is fixed in 11.2.0.2; in 11.2.0.1, the following
is the workaround:
Modify /etc/init.d/ohasd (this could be /etc/ohasd or
/sbin/init.d/ohasd, depending on the platform):
Replace the following:
start()
{
$ECHO -n $"Starting $PROG: "
With
start()
{
$ECHO -n $"Starting $PROG: "
ulimit -n 65536
ulimit -l 104857600
Then restart the node.
If you have a bigger SGA, you need to adjust the "ulimit -l" value accordingly. In
the workaround above, it is set to 100 GB.
More information about HugePages
- Tuning and Optimizing Red Hat Enterprise Linux for Oracle 9i and 10g Databases
This task describes the initialization parameters that should be set in order to
configure the Oracle instance for optimal performance on the CLARiiON CX4 series.
These parameters are stored in the spfile or init.ora file for the Oracle instance.
Database block size
Table 23 shows the database block size parameter for this configuration.

Table 23. Database block size parameter
Parameter: Database block size
Syntax: DB_BLOCK_SIZE=n
Direct I/O
Table 24 shows the direct I/O parameter for this configuration.

Table 24. Direct I/O parameter
Parameter: Direct I/O
Syntax: FILESYSTEM_IO_OPTIONS=setall
Description:
This setting enables direct I/O and async I/O. Direct I/O is a
feature available in modern file systems that delivers data
directly to the application without caching in the file system
buffer cache. Direct I/O preserves file system semantics and
reduces the CPU overhead by decreasing the kernel code
path execution. I/O requests are directly passed to the
network stack, bypassing some code layers.
Direct I/O is beneficial to Oracle's log writer, both in
terms of throughput and latency. Async I/O is beneficial for
datafile I/O.
Multiple database writer processes
Table 25 shows the multiple database writer processes parameter for this
configuration.

Table 25. Multiple database writer processes parameter
Parameter: Multiple database writer processes
Syntax: DB_WRITER_PROCESSES=2*n
Multiblock read count
Table 26 shows the multiblock read count parameter for this configuration.

Table 26. Multiblock read count parameter
Parameter: Multiblock read count
Syntax: DB_FILE_MULTIBLOCK_READ_COUNT=n
Table 27 shows the disk async I/O parameter for this configuration.

Table 27. Disk async I/O parameter
Parameter: Disk async I/O
Syntax: DISK_ASYNCH_IO=true
Description: RHEL 4 update 3 and later support async I/O with direct I/O on NFS.
Async I/O is now recommended on all the storage protocols. The default value is
true in Oracle 11.2.0.1.
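Taken together, a sketch of how the parameters discussed in this task might appear
in the init.ora or spfile; the values shown here (for example, the 8 KB block size and
four writer processes) are illustrative, not the validated settings:

DB_BLOCK_SIZE=8192
FILESYSTEM_IO_OPTIONS=setall
DB_WRITER_PROCESSES=4
DB_FILE_MULTIBLOCK_READ_COUNT=16
DISK_ASYNCH_IO=true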
When you use DNFS, you must create a new configuration file, oranfstab, to specify
the options/attributes/parameters that enable Oracle Database to use DNFS. The
oranfstab file must be placed in the $ORACLE_HOME/dbs directory. When
oranfstab is placed in the $ORACLE_HOME/dbs directory, the entries in this file are
specific to a single database. The DNFS client searches for the mount point entries
as they appear in oranfstab. DNFS uses the first matched entry as the mount point.
Use the steps in the following table to configure the oranfstab file:
Step 1: Create a file called oranfstab at the location $ORACLE_HOME/dbs/:
[oracle@fj903-esx01 ~]$ cat /u01/app/oracle/product/11.2.0/dbhome_1/dbs/oranfstab
server: 192.168.4.160
local: 192.168.4.10
path: 192.168.4.160
export: /datafs1 mount: /u04
export: /log1fs mount: /u05
export: /flashfs mount: /u08
export: /data_efd1 mount: /u10
server: 192.168.5.160
local: 192.168.5.10
path: 192.168.5.160
export: /log2fs mount: /u06
export: /datafs2 mount: /u09
export: /data_efd2 mount: /u11
[oracle@fj903-esx01 ~]$
Apply ODM
NFS library
To enable DNFS, Oracle Database uses an ODM library called libnfsodm11.so. You
must replace the standard ODM library, libodm11.so, with the ODM NFS library,
libnfsodm11.so.
Use the steps in the following table to replace the standard ODM library with the
ODM NFS library:
Step 1: Replace the standard ODM library with a symbolic link to the ODM NFS library:
$ ln -s libnfsodm11.so libodm11.so
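A sketch of the full replacement sequence on each node, run from the Oracle library
directory and assuming the conventional stub-backup name (libodm11.so_stub):

$ cd $ORACLE_HOME/lib
$ mv libodm11.so libodm11.so_stub
$ ln -s libnfsodm11.so libodm11.so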
Enable
transChecksum
on the Celerra
Data Mover
EMC recommends that you enable transChecksum on the Data Mover that serves
the Oracle DNFS clients. This avoids the likelihood of TCP port and XID (transaction
identifier) reuse by two or more databases running on the same physical server,
which could possibly cause data corruption.
To enable the transChecksum, type:
#server_param <movername> -facility nfs -modify transChecksum
-value 1
Note: This applies to NFS version 3 only. Refer to the NAS Support Matrix
available on Powerlink to understand the Celerra versions that support this
parameter.

DNFS network setup
Port bonding and load balancing are managed by the Oracle DNFS client in
the database; therefore, there are no additional network setup steps.
The dontroute option specifies that outgoing messages should not be routed using
the operating system, but sent using the IP address to which they are bound.
If dontroute is not specified, it is mandatory that all paths to the Celerra are
configured in separate network subnets.
The network setup can now be managed by an Oracle DBA, through the oranfstab
file. This frees up the database sysdba from specific bonding tasks previously
necessary for OS LACP-type bonding, for example, the creation of separate subnets.
If you use DNFS, you must create a new configuration file, oranfstab, to specify the
options/attributes/parameters that enable Oracle Database to use DNFS.
The steps include:
- For Oracle RAC, replicate the oranfstab file on all nodes and keep them synchronized.
This task contains a number of queries that you can run to verify that DNFS is
enabled for the database.
Check the
available DNFS
storage paths
To check the available DNFS storage paths, run the following query:
SQL> select unique path from v$dnfs_channels;

PATH
----------------
192.168.4.160
192.168.5.160
To check the data files configured under DNFS, run the following query:
SQL> select FILENAME from V_$DNFS_FILES;

FILENAME
----------------
/u04/oradata/mterac28/control01.ctl
/u04/oradata/mterac28/control02.ctl
/u04/oradata/mterac28/control02.ctl
/u09/oradata/mterac28/tpcc002.dbf
/u09/oradata/mterac28/tpcc004.dbf
/u09/oradata/mterac28/tpcc006.dbf
/u09/oradata/mterac28/tpcc008.dbf
/u09/oradata/mterac28/tpcc010.dbf
/u09/oradata/mterac28/tpcc012.dbf
/u09/oradata/mterac28/tpcc014.dbf
...
/u09/oradata/mterac28/undotbs2_8.dbf
/u09/oradata/mterac28/undotbs3_6.dbf
/u09/oradata/mterac28/undotbs3_8.dbf
/u09/oradata/mterac28/undotbs4_6.dbf
Check the
server and the
directories
configured
under DNFS
To check the server and the directories configured under DNFS, run the following
query:
SQL> select inst_id, svrname, dirname, mntport, nfsport from gv$dnfs_servers;

The query output is below.

SVRNAME        DIRNAME   NFSPORT
192.168.4.160  /datafs1  2049
192.168.5.160  /datafs2  2049
192.168.4.160  /log1fs   2049
192.168.5.160  /log2fs   2049
192.168.4.160  /datafs1  2049
192.168.5.160  /datafs2  2049
192.168.5.160  /log2fs   2049
192.168.4.160  /log1fs   2049
192.168.4.160  /datafs1  2049
192.168.5.160  /datafs2  2049
192.168.5.160  /log2fs   2049
192.168.4.160  /log1fs   2049
192.168.4.160  /datafs1  2049
192.168.5.160  /datafs2  2049
192.168.4.160  /log1fs   2049
192.168.5.160  /log2fs   2049

16 rows selected.
EMC recommends that, when you create the control file, you allow for growth by
setting MAXINSTANCES, MAXDATAFILES, MAXLOGFILES, and MAXLOGMEMBERS
to high values.
Your database should have a minimum of two control files located on separate
physical ASM disk groups or NFS file systems. One way to multiplex your control
files is to store a control file copy on every location that stores members of the redo
log groups.
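For illustration, these limits are set with clauses of the CREATE DATABASE or
CREATE CONTROLFILE statement; the values below are arbitrary examples of high
settings, not validated recommendations:

MAXINSTANCES 8
MAXDATAFILES 1024
MAXLOGFILES 64
MAXLOGMEMBERS 5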
Online and
archived redo
log files
To understand how redo log and archive log files can be placed, refer to the
Reference Architecture diagrams (Figure 1 and Figure 2).
ssh files
Passwordless authentication using ssh relies on the files described in Table 28.

Table 28. SSH files
File | Created by | Purpose
~/.ssh/id_dsa.pub | ssh-keygen | The user's DSA public key
~/.ssh/authorized_keys | ssh | Public keys that are permitted to log in as this user
~/.ssh/known_hosts | ssh | Host keys recorded from previous logins
Enabling authentication: single user/single host
Use the steps in the following table to enable passwordless authentication using ssh
for a single user on a single host.

Step 1: Copy the key for the host for which authorization is being given to the
authorized_keys file of the host that allows the login.
Step 2: Complete a login so that ssh knows about the host that is logging in.
That is, record the host's key and hostname in the known_hosts file.
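A minimal sketch of those steps for one user between two hosts (hostA is being
authorized to log in to hostB; the host names are placeholders):

[oracle@hostA]$ ssh-keygen -t dsa
[oracle@hostA]$ cat ~/.ssh/id_dsa.pub | ssh oracle@hostB "cat >> ~/.ssh/authorized_keys"
[oracle@hostA]$ ssh oracle@hostB date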
Enabling authentication: single user/multiple hosts

Prerequisites
To enable authentication for a user on multiple hosts, you must first enable
authentication for the user on a single host (see Chapter 6: Installation and
Configuration > Task 12: Enable passwordless authentication using ssh > Enabling
authentication: single user/single host).

Procedure summary
After you have enabled authentication for a user on a single host, you can then
enable authentication for the user on multiple hosts by copying the authorized_keys
and known_hosts files to the other hosts.
Before Oracle RAC 11g R2, this was a very common task when setting up Oracle
RAC prior to installation of Oracle Clusterware. It is possible to automate this task by
using the ssh_multi_handler.bash script. Below is a sample script; it needs to be
modified to adapt to the real environment.
ssh_multi_handler.bash
#!/bin/bash
#------------------------------------------------------------#
# Script:   ssh_multi_handler.bash
# Purpose:  Enable passwordless ssh for one user across hosts
#------------------------------------------------------------#
ALL_HOSTS="rtpsol347 rtpsol348 rtpsol349 rtpsol350"
THE_USER=root

# Back up any existing key files for the current user
mv -f ~/.ssh/authorized_keys ~/.ssh/authorized_keys.bak
mv -f ~/.ssh/known_hosts ~/.ssh/known_hosts.bak

# Generate a key pair on each host and gather every host's
# public key into the local authorized_keys file
for i in ${ALL_HOSTS}
do
    ssh ${THE_USER}@${i} "ssh-keygen -t dsa"
    ssh ${THE_USER}@${i} "cat ~/.ssh/id_dsa.pub" \
        >> ~/.ssh/authorized_keys
done

# Copy the combined authorized_keys and known_hosts files to
# every host (assumed completion; adapt to your environment)
for i in ${ALL_HOSTS}
do
    scp ~/.ssh/authorized_keys ~/.ssh/known_hosts \
        ${THE_USER}@${i}:~/.ssh/
done
Use the steps in the following table to automate the task of enabling authentication
for the user on multiple hosts.
At the end of the procedure, all of the equivalent users on the set of hosts will be
able to log in to all of the other hosts without issuing a password.
Step 1: Copy and paste the text from ssh_multi_handler.bash into a new file on
the Linux server.
Enabling authentication: single host/different user
Another common task is to set up passwordless authentication across two users
between two hosts. For example, enable the Oracle user on the database server to
run commands as the root or nasadmin user on the Celerra Control Station.
You can set this up by using the ssh_single_handler.bash script. This script creates
passwordless authentication from the presently logged-in user to the root user on the
Celerra Control Station.
ssh_single_handler.bash
#!/bin/bash
#-----------------------------------------------------------#
# Script:
ssh_single_handler.bash
# Purpose:
#-----------------------------------------------------------#
THE_USER=root
THE_HOST=rtpsol33
ssh-keygen -t dsa
KEY=`cat ~/.ssh/id_dsa.pub`
ssh ${THE_USER}@${THE_HOST} "echo ${KEY} >> \
~/.ssh/authorized_keys"
ssh ${THE_USER}@${THE_HOST} date
exit
Contents

Topic | See Page
Testing tools | 76
Test procedure | 77
Test results | 78
DNFS configuration test results | 79
ASM configuration test results | 82
Testing tools
Overview of
testing tools
Data population
tool
Benchmark Factory has its own data population function. The data load function will
be created automatically when creating a scenario.
As shown in Figure 7, the scale factor determines the amount of data initially loaded into the benchmark tables. For the TPC-C benchmark, each unit of scale factor represents one warehouse, as per the TPC-C specification.
The TPC-C benchmark involves a mix of five concurrent transactions of different types and complexity. The database is composed of nine tables with a wide range of record counts. You can alter the scale factor to meet the database size requirement before running the data load scenario.
Figure 7.
There was one load agent on the host, allowing data to be loaded in parallel (the agent creates nine database sessions to fill the nine TPC-C tables) until the 1 TB goal was reached.
With a benchmark scale of 11,500, the data population completed in around 40 hours, including index creation.
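These two figures imply roughly 90 MB of initial data per warehouse and an average load rate of about 26 GB per hour, a useful back-of-envelope check when planning a similar load (a sketch using only the numbers above):

# Approximate data volume per TPC-C warehouse: 1 TB across
# 11,500 warehouses, expressed in MB.
echo "scale=1; 1024 * 1024 / 11500" | bc    # ~91.2 MB/warehouse

# Approximate average load rate: 1 TB over 40 hours, in GB/h.
echo "scale=1; 1024 / 40" | bc              # ~25.6 GB/h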
Benchmark Factory console and agents
All tests were run from a Benchmark Factory console on a Microsoft Windows 2003 server, with 18 agents started on six Microsoft Windows 2003 servers.
Test procedure
EMC used the steps in the following table to validate the performance test.

Step  Action
Test results

Overview of test results
The test results for the four-node Oracle RAC 11g DNFS and ASM configurations are detailed below.
The overall SGA size on each database server was set to around 100 GB. To make full use of the memory resources on each server, SGA memory management was configured manually by sizing the buffer cache, shared pool, and large pool individually.
Table 29 provides a summary of the cache size settings.
Table 29. Cache size settings

I#  Memory  SGA     DB      Shared  Large  Java  Streams  PGA     Log
    target  target  cache   pool    pool   pool  pool     target  buffer
1   -       -       90,112  9,216   2,048  -     -        1,536   55.47
2   -       -       90,112  9,216   2,048  -     -        1,536   55.47
3   -       -       90,112  9,216   2,048  -     -        1,536   55.47
4   -       -       90,112  9,216   2,048  -     -        1,536   55.47
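A hypothetical server parameter fragment illustrating this manual SGA layout (sizes taken from Table 29 and assumed to be in MB; the pga_aggregate_target mapping is an assumption, not recoverable from the table):

# Hypothetical init.ora fragment for manual SGA management.
sga_target=0                  # disable automatic SGA sizing
memory_target=0               # disable automatic memory management
db_cache_size=90112M          # buffer cache, per Table 29
shared_pool_size=9216M        # shared pool, per Table 29
large_pool_size=2048M         # large pool, per Table 29
pga_aggregate_target=1536M    # assumed mapping of the 1,536 value
# log_buffer is specified in bytes; Table 29 shows ~55.47 MB.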
In addition to the manual SGA memory configuration, HugePages were enabled on each server to improve memory performance, and Oracle DNFS was enabled to improve I/O performance. Jumbo frames were enabled on the server, switch, and Celerra for the RAC interconnect and the IP storage network to improve network throughput.
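A minimal sketch of the HugePages and jumbo-frame settings implied here, assuming 2 MB huge pages, a roughly 100 GB SGA, and an illustrative interface name eth2 (the page count and interface name are assumptions):

# Reserve 2 MB huge pages for the SGA: 51,200 pages cover
# exactly 100 GB, so a little headroom is added.
echo "vm.nr_hugepages = 52000" >> /etc/sysctl.conf
sysctl -p
grep Huge /proc/meminfo        # verify HugePages_Total/_Free

# Enable jumbo frames on the storage/interconnect interface.
ifconfig eth2 mtu 9000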
The same test procedure was performed on ASM and DNFS individually. Database statistics, OS statistics, network statistics, and storage-related statistics were captured both for tuning and for performance comparison, as sketched below.
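A hypothetical capture wrapper showing how the OS-level statistics could be collected alongside a test run (the sampling interval, duration, and file names are illustrative):

# Collect CPU statistics every 10 seconds for one hour while
# the benchmark runs on this node.
sar -u 10 360 > cpu_stats_$(hostname).txt &
vmstat 10 360 > vmstat_$(hostname).txt &
wait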
Performance testing

Scalability testing
For scalability testing, EMC ran the performance tests and added user load until performance degraded or connections could no longer be served by the database.
The DNFS configuration used two ports, each connected to a separate Data Mover, for the four-node RAC; the Direct NFS client multipathing that this implies is sketched below.
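A hypothetical /etc/oranfstab fragment illustrating how the Direct NFS client can spread I/O across two Data Mover addresses (the server name, addresses, export, and mount point are illustrative, not the validated configuration):

server: celerra_dm2
path: 192.168.1.10
path: 192.168.2.10
export: /oradata_fs1 mount: /u02/oradata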
Table 30 shows the test results.

Table 30. DNFS configuration test results

Users   TPS       Response  DB CPU busy  Physical   Physical  Redo
                  time      average      reads      writes    size (K)
11,200  2,068.67  0.568     64.68        2,156,916  588,131   2,851,585
Figure 8.
Virtual      TPS      Executions  Rows    Bytes       Avg response time
station ID
1            113.33   20416       27707   8873618     0.575
2            115.24   20751       27893   8904559     0.558
3            115.4    20792       28479   8846424     0.574
4            116.38   20961       28146   8805911     0.573
5            115.92   20884       28320   8848351     0.569
6            115.25   20732       27888   8829376     0.575
7            115.9    20903       28576   8897491     0.575
8            114.85   20698       28003   8860614     0.563
9            113.56   20454       26909   8831482     0.571
10           113.84   20512       27193   8842175     0.581
11           114.94   20694       28216   8861084     0.571
12           114.96   20705       28257   8889954     0.555
13           113.78   20503       28414   8790546     0.578
14           114.21   20558       27321   8876515     0.560
15           115.5    20814       27994   8897519     0.554
16           116.63   21027       29640   8926348     0.551
17           114.01   20548       26845   8858376     0.578
18           114.97   20723       27573   8877471     0.576
Total        2068.7   372675      503374  159517814   0.569 (average)
Name                      Executions  Rows    Bytes      Avg response time
New-Order Transaction     169300      167609  102576708  0.648
Payment Transaction       160062      160062  49759654   0.26
Order-Status Transaction  13900       13900   6534240    1.023
Delivery Transaction      14710       147100  588400     1.205
Stock-Level Transaction   14703       14703   58812      1.936
Table 33 shows the OS statistics for all database instances. For DNFS, the CPU wait for I/O operations (% WIO) is almost zero, far lower than in the ASM configuration.
Table 33. OS statistics

I#  Load   Load   % Busy  % Usr  % Sys  % WIO  % Idle  Busy      Idle
    Begin  End                                         Time (s)  Time (s)
1   2.46   16.75  66.91   54.09  8.61   0.04   33.09   3,513.32  1,737.31
2   2.64   15.93  63.18   50.7   8.31   0.06   36.82   3,318.60  1,934.25
3   2.82   12.1   64.5    51.8   8.57   0.05   35.5    3,408.60  1,875.76
4   2.09   15.12  64.12   51.73  8.48   0.04   35.88   3,373.77  1,887.74
Table 34 shows the I/O statistics for all database files, including data files, temp files, control files, and redo logs.

Table 34. I/O statistics

      Reads MB/s     Writes MB/s          Read requests/s      Write requests/s
I#    Total  Data    Total  Data   Log    Total     Data       Total     Data      Log
             File           File   File             File                 File      File
1     13.45  12.55   6.03   3.73   2.28   1,660.78  1,603.10   550.88    369.02    180.82
2     13.83  12.93   5.47   3.32   2.14   1,712.61  1,655.03   529.38    346.8     181.66
3     14.13  13.22   5.91   3.65   2.24   1,749.66  1,691.62   545.32    357.35    186.96
4     13.55  12.66   5.65   3.35   2.28   1,681.68  1,624.07   538.7     347.44    190.27
Sum   54.97  51.36   23.06  14.06  8.94   6,804.71  6,573.82   2,164.28  1,420.62  739.71
Avg   13.74  12.84   5.76   3.51   2.24   1,701.18  1,643.46   541.07    355.15    184.93
The ASM configuration used one two-port HBA card, with each port connected individually to the CLARiiON CX4-960 back-end storage, for the four-node RAC; presenting the LUNs to ASM is sketched below.
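A hypothetical example of labeling the CLARiiON LUNs as ASM disks with ASMLib (the device names and disk labels are illustrative; the validated environment's actual device layout is not shown):

# Label two PowerPath pseudo-devices as ASM disks (run as root).
/etc/init.d/oracleasm createdisk DATA1 /dev/emcpowera1
/etc/init.d/oracleasm createdisk DATA2 /dev/emcpowerb1

# Confirm that the new disks are visible to ASMLib.
/etc/init.d/oracleasm listdisks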
Table 35 shows the test results.

Table 35. ASM configuration test results

Users   TPS       Response  DB CPU busy  Physical   Physical   Redo
                  time      average      reads      writes     size (K)
11,200  2,106.41  0.488     65.16        2,065,183  1,357,951  3,028,227
Figure 9.
Virtual      TPS      Executions  Rows    Bytes       Avg response time
station ID
1            115.22   20730       28139   8998621     0.497
2            117.16   21098       28471   9059748     0.489
3            118.44   21312       29322   9068017     0.49
4            118.29   21297       28515   8974151     0.504
5            117.65   21160       28648   8966422     0.495
6            116.97   21061       28485   8983977     0.493
7            118.24   21253       28980   9026747     0.498
8            117.5    21157       28718   9040102     0.484
9            116.08   20875       27401   9023315     0.481
10           116.19   20879       27871   8986565     0.494
11           116.61   20942       28581   8957636     0.498
12           116.51   20955       28606   8992325     0.48
13           116.37   20915       28992   8970585     0.492
14           116.77   20987       27877   9059288     0.476
15           117.22   21082       28243   9016271     0.475
16           118.18   21247       30034   9012472     0.478
17           115.52   20811       26730   9004728     0.49
18           117.5    21095       27940   9026151     0.485
Total        2106.42  378856      511553  162167121   0.489 (average)
Name                      Executions  Rows    Bytes      Avg response time
New-Order Transaction     171885      170203  104164236  0.544
Payment Transaction       162896      162896  50699967   0.222
Order-Status Transaction  14138       14138   6645654    0.93
Delivery Transaction      14931       149310  597240     1.064
Stock-Level Transaction   15006       15006   60024      1.751
Table 38 shows the OS statistics for all database instances. For ASM, the CPU wait for I/O operations (% WIO) is around 5 percent, higher than the value achieved with the DNFS configuration.
Table 38. OS statistics

I#  Load   Load   % Busy  % Usr  % Sys  % WIO  % Idle  Busy      Idle
    Begin  End                                         Time (s)  Time (s)
1   3.21   16.44  68.3    54.84  9.08   4.15   31.7    3,714.70  1,724.39
2   3.76   13.85  62.67   49.94  8.66   6.16   37.33   3,420.06  2,037.20
3   3.57   12.76  63.99   51.19  8.75   6.07   36.01   3,504.39  1,972.04
4   4.03   17.01  65.67   52.79  8.87   5.32   34.33   3,584.07  1,873.38
Table 39 shows the I/O statistics for all database files, including data files, temp files, control files, and redo logs.

Table 39. I/O statistics

      Reads MB/s     Writes MB/s          Read requests/s      Write requests/s
I#    Total  Data    Total  Data   Log    Total     Data       Total     Data      Log
             File           File   File             File                 File      File
1     13.15  12.28   10.29  7.88   2.39   1,623.91  1,568.08   1,177.84  858.99    317.79
2     12.62  11.75   9.81   7.58   2.21   1,557.58  1,501.48   1,163.28  845.46    316.85
3     12.32  11.45   10.13  7.81   2.31   1,520.45  1,464.38   1,202.67  868       333.75
4     12.9   12.01   10.27  7.88   2.37   1,590.98  1,534.21   1,188.55  858.18    329.45
Sum   50.99  47.48   40.49  31.15  9.28   6,292.91  6,068.15   4,732.34  3,430.63  1,297.84
Avg   12.75  11.87   10.12  7.79   2.32   1,573.23  1,517.04   1,183.08  857.66    324.46
Conclusion of test results
Both the DNFS and ASM tests demonstrate the expected peak TPS statistics, with only small variations in performance between the two environments, as shown in Table 40.
The workload was an OLTP benchmark with a read/write mix of 3:1 (75 percent random reads and 25 percent random writes) using small I/Os.
TPS is the transactions-per-second rate of the OLTP workload. This is an accurate measurement of the database processing capability of the testbed, including the storage layer. IOPS is not included in the table because NFS IOPS and SAN IOPS are not comparable.
Table 40. Summary of performance results

       Peak TPS  CPU Busy (%)  CPU WIO (%)  DB file sequential read wait (ms)
DNFS   2,069     64.68         0.05         7.97
ASM    2,106     65.16         5.43         7.85
Figure 10 shows the peak TPS rates and DB file sequential read latency for the DNFS and ASM configurations. The DB file sequential read latency for ASM (7.85 ms) is about 1.5 percent lower than for DNFS (7.97 ms), while the peak TPS is very close for the two configurations.
Figure 10.
Figure 11 shows the CPU utilization for both the DNFS and ASM configurations.
Figure 11.
CPU utilization
Figure 12 shows the CPU utilization rate at the peak TPS on the first node. The vmstat utility, available on all UNIX and Linux systems, gives an overall indication of CPU utilization; a sampling sketch follows.
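A minimal sketch of how these samples can be taken, assuming a Linux vmstat whose CPU columns end with us/sy/id/wa (column positions can vary by platform); it prints the run-queue depth and % Utilization (100 minus the id column) once per second:

# Sample run-queue length ($1) and CPU utilization every second
# for 60 samples, skipping the two vmstat header lines.
vmstat 1 60 | awk 'NR > 2 { print $1, 100 - $15 }'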
This chart is generated from the vmstat output and shows that DNFS has more CPU idle time than ASM during the peak TPS phase of TPC-C testing. To some extent, this result demonstrates that offloading file management to the NFS server reduces CPU utilization on the production database server when DNFS is used.
Figure 12.
CPU statistics
Table 41 details the observations for the CPU statistics shown in Figure 12.
Table 41. CPU statistics

CPU time category   Description   Observation
User Time
System Time
Idle Time
I/O Wait Time
Figure 13 is generated from the CPU statistics shown in Figure 12. The id column is converted into % Utilization by subtracting the idle percentage from 100. The middle horizontal dotted line shows that the average utilization for ASM is about 75 percent, while the average utilization for DNFS is about 70 percent. For ASM, the utilization line is nearly flat at 100 percent, without variance, during this sampling period, while DNFS still retains some headroom.
Figure 13.
At the point of CPU saturation, processes begin to wait for CPU in the run queue, as shown in Figure 14. Correspondingly, the number of processes in the run queue for ASM peaks at 170, while for DNFS the maximum is 129.
Figure 14.
To further explore the CPU utilization advantage of DNFS, statistics were captured at a user load of 8,000, where TPS is almost the same for both configurations. At the same workload, the differences in CPU utilization are clearer, as shown in Table 42.
Table 42. Summary of performance results at the same workload

       User Load  Peak TPS  CPU Busy (%)  CPU WIO (%)
DNFS   8,000      1,669.27  49.91         0.06
ASM    8,000      1,669.99  50.62         10.94
Figure 15 shows the CPU utilization rate with the same workload on the first node.
Figure 15.
Figure 16 is generated from the CPU statistics shown in Figure 15. The id column is converted into % Utilization by subtracting the idle percentage from 100. The middle horizontal dotted line shows that the average utilization for ASM is about 66 percent, while the average utilization for DNFS is about 53 percent. For ASM, the utilization line is nearly flat at 100 percent, without variance, during this sampling period, while DNFS still retains some headroom.
Figure 16.
At the point of CPU saturation, processes begin to wait for CPU in the run queue, as shown in Figure 17. Correspondingly, the number of processes in the run queue for ASM peaks at 65, while for DNFS the maximum is 40.
Figure 17.
Conclusion for test results
The test results demonstrate that EMC unified storage, in either a DNFS or an ASM configuration, is competitive in cost and performance with a traditional storage infrastructure under the tested user loads.
Chapter 8: Conclusion

Overview

Introduction
The EMC Celerra unified storage platform's high-availability features, combined with EMC's proven storage technologies, provide a very attractive storage system for Oracle RAC 11g over DNFS or ASM.
Conclusion

Objective

Results
The solution enables the Oracle RAC 11g configuration by providing shared disks.
The overall Celerra architecture and its connectivity to the back-end storage make it highly scalable; capacity can be increased simply by adding components, which are immediately usable.
Running Oracle RAC 11g with Celerra provides the best availability, scalability, manageability, and performance for your database applications.
Oracle 11g environments using DNFS or ASM provide options to customers
depending on their familiarity and expertise with a chosen protocol, existing
architecture, and budgetary constraints. Testing proves that both implementations
have similar performance profiles, so it is the customer's responsibility to choose the
protocol and architecture that best fit their specific needs.
EMC unified storage provides flexibility and manageability for a storage infrastructure that supports either of these architectures. Unified storage can also support hybrid architectures that use both protocols in a single solution; for example, production could run ASM over FC SAN, while test/dev could run over IP with DNFS.
Next steps
Supporting Information
Overview

Introduction
Celerra Manager is a web-based graphical user interface (GUI) for remote administration of a Celerra unified storage platform. Various tools within Celerra Manager provide the ability to monitor the Celerra. These tools highlight potential problems that have occurred or could occur in the future. Some of these tools are delivered with the basic version of Celerra Manager, while more detailed monitoring capabilities are delivered in the advanced version.
Celerra Manager can also be used to create Ethernet channels, link aggregations, and fail-safe networks.
Celerra Data Mover ports
The Celerra Data Mover provides 10 GbE and 1 GbE storage network ports. The number and type of ports vary significantly across Celerra models. Figure 18 shows the back of the Data Mover.
Figure 18.
Enterprise Grid Control storage monitoring plug-in
EMC recommends use of the Oracle Enterprise Manager monitoring plug-in for the EMC Celerra unified storage platform. This system monitoring plug-in enables you to:
Realize lower costs through knowledge: know what you have and what has changed
For more information on the plug-in for the EMC Celerra server, see:
Oracle Enterprise Manager 11g System Monitoring Plug-In for EMC Celerra Server
Figure 19 shows the EMC Celerra OEM plug-in.
Figure 19. EMC Celerra OEM plug-in