
Authors:

Satya THIPPANA
ENGINEERED SYSTEMS OVERVIEW
WHAT ARE ENGINEERED SYSTEMS:
With an inclusive "in-a-box" strategy, Oracle's engineered systems combine best-of-breed hardware and
software components with game-changing technical innovations. Designed, engineered, and tested to
work together, Oracle's engineered systems cut IT complexity, reduce costly maintenance, integration
and testing tasks—and power real-time decision making with extreme performance.

WHY ENGINEERED SYSTEMS:

In the era of big data and cloud services, data increasingly drives business innovation. But for many
companies, IT complexity is a serious roadblock, impeding decision makers' ability to get the information
they need, when they need it. Not only does it slow corporate agility, but IT complexity stifles
innovation, as companies are forced to invest in ‘keeping the lights on’ instead of transformative digital
projects.

The components of Oracle's engineered systems are preassembled for targeted functionality and then
optimized—as a complete system—for speed and simplicity. That means everything you run gets a
performance boost without adding complexity to IT environments.

 Oracle Exadata Database Machine


 Oracle Exalogic Elastic Cloud
 Oracle Exalytics In-Memory Machine
 Oracle Super Cluster
 Oracle Private Cloud Appliance
 Oracle Database Appliance
 Oracle Big Data Appliance
 Zero Data Loss Recovery Appliance
 Oracle FS1 Flash Storage System
 Oracle ZFS Storage Appliance

THE EXA FAMILY

 Oracle Exadata Database Machine


 Oracle Exalogic Elastic Cloud
 Oracle Exalytics In-Memory Machine

EXADATA DATABASE MACHINE INTRODUCTION & OVERVIEW
EXADATA OVERVIEW

 Fully integrated platform for Oracle Database


 Based on Exadata Storage Server storage technology
 High-performance and high-availability for all Oracle Database workloads
 Well suited as a database consolidation platform
 Simple and fast to implement (Pre-configured, Tested and Tuned for Performance)

WHY EXADATA?
Exadata Database Machine is designed to address common issues:

 Data Warehousing issues:
    Supporting large, complex queries
    Managing multi-terabyte databases
 OLTP issues:
    Supporting large user populations and transaction volumes
    Delivering quick and consistent response times
 Consolidation issues:
    Efficiently supporting mixed workloads (OLTP & OLAP)
    Prioritizing workloads (IORM)
 Configuration issues:
    Creating a balanced configuration without bottlenecks
    Building and maintaining a robust system that works
 Redundant and fault tolerant:
    Failure of any component is tolerated.
    Data is mirrored across storage servers.

COMPONENTS OF EXADATA DATABASE MACHINE:

COMPONENT              FULL RACK   1/2 RACK   1/4 RACK   1/8 RACK

Database Servers           8           4          2          2
Storage Servers           14           7          3          3
InfiniBand Switches        2           2          2          2
PDUs                       2           2          2          2
Cisco Admin Switch         1           1          1          1

Database Machine is a fully-integrated Oracle Database platform based on Exadata Storage Server
storage technology. Database Machine provides a high-performance, highly-available solution for all
database workloads, ranging from scan-intensive data warehouse applications to highly concurrent
OLTP applications.

Using the unique clustering and workload management capabilities of Oracle Database, Database
Machine is well-suited for consolidating multiple databases onto a single Database Machine. Delivered
as a complete package of software, servers, and storage, Database Machine is simple and fast to
implement.

Database Machine hardware and software components are a series of separately purchasable items.
Customers can choose from the different hardware configurations that are available. Appropriate
licensing of Oracle Database and Exadata cell software is also required. In addition, Database Machine is
highly complementary with clustering and parallel operations, so Oracle Real Application Clusters and
Oracle Partitioning are highly recommended software options for Database Machine.

X4-2 DATABASE SERVER OVERVIEW:

X4-2 STORAGE SERVER OVERVIEW:

EXADATA EVOLUTION:

EXADATA FEATURES BY VERSIONS:

EXADATA SCALABILITY:

 Scale to eight racks by adding cables
 Scale from 9 to 36 racks by adding 2 InfiniBand switches
 Scale to hundreds of storage servers to support multi petabyte databases
 EXADATA Storage Expansion Racks available in QTR, HALF and FULL Rack configuration.

EXADATA X4-2 CONFIGURATION WORK SHEET

EXADATA IO PERFORMANCE WORK SHEET:

EXADATA DATABASE MACHINE SOFTWARE ARCHITECTURE:

DATABASE SERVER COMPONENTS:

 Operating System (Oracle Linux x86_64 or Solaris 11 Express for x86)


 Runs Oracle Database 11g Release 2.
 Automatic Storage Management (ASM)

 LIBCELL ($ORACLE_HOME/lib/libcell11.so) is a special library used by Oracle Database to communicate
with Exadata cells. In combination with the database kernel and ASM, LIBCELL transparently maps
database I/O operations to Exadata Storage Server enhanced operations. LIBCELL communicates
with Exadata cells using the Intelligent Database protocol (iDB). iDB is a unique Oracle data transfer
protocol, built on Reliable Datagram Sockets (RDS), that runs on industry standard InfiniBand
networking hardware.
LIBCELL and iDB enable ASM and database instances to utilize Exadata Storage Server features, such
as Smart Scan and I/O Resource Management.

 Database Resource Manager (DBRM) is integrated with Exadata Storage Server I/O Resource
Management (IORM). DBRM and IORM work together to ensure that I/O resources are allocated
based on administrator-defined priorities.

STORAGE SERVER COMPONENTS:

 Cell Server (CELLSRV) is the primary Exadata Storage Server software component and provides the
majority of Exadata storage services. CELLSRV is a multithreaded server. CELLSRV serves simple
block requests, such as database buffer cache reads, and Smart Scan requests, such as table scans
with projections and filters. CELLSRV also implements IORM, which works in conjunction with DBRM
to meter out I/O bandwidth to the various databases and consumer groups issuing I/Os. Finally,
CELLSRV collects numerous statistics relating to its operations. Oracle Database and ASM processes
use LIBCELL to communicate with CELLSRV, and LIBCELL converts I/O requests into messages that
are sent to CELLSRV using the iDB protocol.

 Management Server (MS) provides Exadata cell management and configuration functions. It works
in cooperation with the Exadata cell command-line interface (CellCLI). Each cell is individually
managed with CellCLI. CellCLI can only be used from within a cell to manage that cell; however you
can run the same CellCLI command remotely on multiple cells with the dcli utility (see the sketch after
this list). In addition, MS is responsible for sending alerts and for collecting some statistics in addition to
those collected by CELLSRV.

 Restart Server (RS) is used to start up/shut down the CELLSRV and MS services and monitors these
services to automatically restart them if required.
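
For example, a minimal sketch of running the same CellCLI command across every cell from a database
server with dcli, assuming a file named cell_group that lists the cell host names and SSH equivalence
already configured for the celladmin user:

[root@exa01dbadm01 ~]# dcli -g cell_group -l celladmin "cellcli -e list cell detail"
[root@exa01dbadm01 ~]# dcli -g cell_group -l celladmin "cellcli -e list griddisk attributes name, status"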

DISKMON checks the storage network interface state and cell aliveness; it also performs DBRM plan
propagation to Exadata cells.
Diskmon uses a node wide master process (diskmon.bin) and one slave process (DSKM) for each RDBMS
or ASM instance. The master performs monitoring and propagates state information to the slaves.
Slaves use the SGA to communicate with RDBMS or ASM processes. If there is a failure in the cluster,
Diskmon performs I/O fencing to protect data integrity. Cluster Synchronization Services (CSS) still
decides what to fence.
Master Diskmon starts with the Clusterware processes. The slave Diskmon processes are background
processes which are started and stopped in conjunction with the associated RDBMS or ASM instance.
EXADATA KEY FEATURES:

1. SMART SCAN:
A key advantage of Exadata Storage Server is the ability to offload some database processing to the
storage servers. With Exadata Storage Server, the database can offload single table scan predicate filters
and projections, join processing based on bloom filters, along with CPU-intensive decompression and
decryption operations. This ability is known as Smart Scan.

In addition to Smart Scan, Exadata Storage Server has other smart storage capabilities including the
ability to offload incremental backup optimizations, file creation operations, and more. This approach
yields substantial CPU, memory, and I/O bandwidth savings in the database server which can result in
massive performance improvements compared with conventional storage.

2. HYBRID COLUMNAR COMPRESSION:


The Exadata Storage Server provides a very advanced compression capability called Hybrid Columnar
Compression (HCC) that provides dramatic reductions in storage for large databases. Hybrid Columnar
Compression enables the highest levels of data compression and provides tremendous cost-savings and
performance improvements due to reduced I/O, especially for analytic workloads.
Storage savings are data dependent and often range from 5x to 20x. On conventional systems, enabling
high data compression has the drawback of reducing performance. Because the Exadata Database
Machine is able to offload decompression overhead into large numbers of processors in Exadata
storage, most workloads run faster using Hybrid Columnar Compression than they do without it.

3. STORAGE INDEXES:
A storage index is a memory-based structure that holds information about the data inside specified
regions of physical storage. The storage index keeps track of minimum and maximum column values and
this information is used to avoid useless I/O.
Storage Indexes are created automatically and transparently based on the SQL predicate information
executed by Oracle and passed down to the storage servers from the database servers.
Storage Indexes are very lightweight and can be created and maintained with no DBA intervention.

4. IORM (I/O Resource Management):

Exadata Storage Server I/O Resource Management (IORM) allows workloads and databases to share I/O
resources automatically according to user-defined policies. To manage workloads within a database, you
can define intra-database resource plans using the Database Resource Manager (DBRM), which has
been enhanced to work in conjunction with Exadata Storage Server. To manage workloads across
multiple databases, you can define IORM plans.

For example, if a production database and a test database are sharing an Exadata cell, you can configure
resource plans that give priority to the production database. In this case, whenever the test database
load would affect the production database performance, IORM will schedule the I/O requests such that
the production database I/O performance is not impacted. This means that the test database I/O
requests are queued until they can be issued without disturbing the production database I/O
performance.
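
As an illustration only (database names and percentages are hypothetical), such an inter-database plan
could be set on each cell with CellCLI along these lines:

CellCLI> ALTER IORMPLAN dbplan=((name=PROD, level=1, allocation=80), (name=TEST, level=2, allocation=20))
CellCLI> LIST IORMPLAN DETAIL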

5. SMART FLASH CACHE:

Exadata Storage Server makes intelligent use of high-performance flash memory to boost performance.

The Exadata Smart Flash Cache automatically caches frequently accessed data in PCI flash while keeping
infrequently accessed data on disk drives. This provides the performance of flash with the capacity and
low cost of disk. The Exadata Smart Flash Cache understands database workloads and knows when to
avoid caching data that the database will rarely access or is too big to fit in the cache.

Each Storage Server (X4-2) has 4 PCI FLASH CARDS and each FLASH CARD has 4 FLASH DISKS, for a total
of 3 TB of flash memory. Each FLASH DISK is 186 GB.
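
A quick way to inspect the flash hardware and the flash cache on one cell is shown below; the commands
are standard CellCLI, but the sizes reported will vary by model and software version:

CellCLI> LIST FLASHCACHE DETAIL
CellCLI> LIST CELLDISK ATTRIBUTES name, size WHERE diskType = 'FlashDisk'
CellCLI> LIST PHYSICALDISK ATTRIBUTES name, physicalSize WHERE diskType = 'FlashDisk'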

6. INFINIBAND NETWORK:

The InfiniBand network provides a reliable high-speed storage network and cluster interconnect for
Database Machine. It can also be used to provide high performance external connectivity to backup
servers, ETL servers and middleware servers such as Oracle Exalogic Elastic Cloud. Each database server
and Exadata Storage Server is connected to the InfiniBand network using a bonded network interface
(BONDIB0).

iDB is built on Reliable Datagram Sockets (RDS v3) protocol and runs over InfiniBand ZDP (Zero-loss
Zero-copy Datagram Protocol). The objective of ZDP is to eliminate unnecessary copying of blocks. RDS
is based on the socket API and offers low overhead, low latency, and high bandwidth. Exadata cell nodes
can send and receive large transfers using Remote Direct Memory Access (RDMA).

RDMA is a direct memory access from the memory of one server into another server without involving
either’s operating system. The transfer requires no work to be done by CPUs, caches, or context
switches, and transfers continue in parallel with other system operations. It is quite useful in massively
parallel processing environment.
RDS is used extensively on Oracle Exadata. RDS delivers highly available, low-overhead datagram
transfer, similar to UDP but reliable and zero-copy. It accesses InfiniBand via the socket API. RDS v3
supports both RDMA read and write and allows large data transfers of up to 8 MB. It also supports
control messages for asynchronous operation, for submit and completion notifications.

If you want to optimize communications between Oracle Engineered Systems, such as Exadata, Exalogic,
Big Data Appliance, and Exalytics, you can use the Sockets Direct Protocol (SDP). SDP only
deals with stream sockets.

SDP allows high-performance zero-copy data transfers via RDMA network fabrics and uses a standard
wire protocol over an RDMA fabric to support stream sockets (SOCK_STREAM). The goal of SDP is to
provide an RDMA-accelerated alternative to the TCP protocol on IP while remaining transparent to the
application.
It bypasses the OS resident TCP stack for stream connections between any endpoints on the RDMA
fabric. All other socket types (such as datagram, raw, packet, etc.) are supported by the IP stack and
operate over standard IP interfaces (i.e., IPoIB on InfiniBand fabrics). The IP stack has no dependency on
the SDP stack; however, the SDP stack depends on IP drivers for local IP assignments and for IP address
resolution for endpoint identifications.

7. DBFS (Database File System):


Oracle Database File System (DBFS) enables an Oracle database to be used as a POSIX (Portable
Operating System Interface) compatible file system. DBFS is an Oracle Database capability that provides
Database Machine users with a high performance staging area for bulk data loading and ETL processing.
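
A minimal sketch of mounting a DBFS file system from a database server with the dbfs_client utility,
assuming a DBFS repository has already been created in a database, a repository owner named dbfs_user,
a TNS alias dbm and a mount point /mnt/dbfs (all hypothetical; the allow_other option also requires
user_allow_other to be set in /etc/fuse.conf):

$ nohup dbfs_client dbfs_user@dbm -o allow_other,direct_io /mnt/dbfs < passwd.txt &
$ ls /mnt/dbfs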

FEATURES CONSOLIDATION:

EXADATA LAYERS:

REDUNDANCY & FAULT TOLERANCE in EXADATA
FAULT TOLERANCE
POWER SUPPLY FAILURE:
Redundant PDUs (Can survive with 1 out of 2) 50%

NETWORK FAILURE:
Redundant IB Switches (Can survive with 1 out of 2) 50%
& Network BONDING (Can survive with Half the Interfaces DOWN) 50%

STORAGE SERVER FAILURE:


ASM Redundancy (Can survive with 2 out of 3 STORAGE SERVERS) 33%$

DISKS FAILURE:
ASM Redundancy (Can survive with 9 out of 18 DISKS) 50%$

DB INSTANCE FAILURE:
RAC (Can survive with One NODE) 50%#

$ Based on ASM NORMAL REDUNDANCY


# Based on Quarter Rack with 2 DB Nodes.

DATABASE MACHINE IMPLEMENTATION OVERVIEW
Following are the phases in Exadata Database Machine implementation:

1. Pre-installation
 Various planning and scheduling activities including:
 Site planning: space, power, cooling, logistics…
 Configuration planning: host names, IP addresses, databases…
 Network preparation: DNS, NTP, cabling…
 Oracle and customer engineers work together

2. Installation and configuration


 Hardware and software installation and configuration
 Result is a working system based on a recommended configuration
 Recommended to be performed by Oracle engineers

3. Additional configuration
 Reconfigure storage using non-default settings (If needed)
 Create additional databases
 Configure additional Networks (backup / data guard)
 Configure Enterprise Manager
 Configure backup and recovery
 Data Migrations
 Configure additional Listeners (Data Guard / IB listener)
 Configuring Oracle Data Guard
 Configure DBFS
 Connect Oracle Exalogic Elastic Cloud

4. Post-installation
 Ongoing monitoring and maintenance

CONFIGURATION ACTIVITIES NOT SUPPORTED WITH DATABASE MACHINE:

 HARDWARE RE-RACKING: Customers sometimes wish to re-rack a Database Machine to comply


with a data center policy, to achieve earthquake protection or to overcome a physical limitation
of some sort. Apart from being inherently error-prone, re-racking can cause component
damage, thermal management issues, cable management issues and other issues. As a result,
hardware re-racking of Database Machine is not supported.

 ADDING COMPONENTS TO SERVERS: Customers sometimes wish to add components to


Database Machine. A typical example is the desire to add a Host Bus Adapter (HBA) to the
database servers so that they can be attached to existing SAN storage. Adding components to
servers in Database Machine is not supported because of the potential for driver and firmware
incompatibilities that could undermine the system.

 ADDING SERVERS TO QUARTER RACK OR HALF RACK CONFIGURATIONS: Outside of a
supported upgrade, physically adding servers to a Quarter Rack or Half Rack is not supported, because it
changes the environmental and power characteristics of the system. It also impacts the future ability to
conduct a supported upgrade. Note that customers can add Exadata cells
to any Database Machine configuration by placing the additional cells in a separate rack and by
using the spare InfiniBand ports to connect to them.

 SWAPPING LINUX DISTRIBUTIONS: Oracle Linux is provided as the operating system


underpinning the database servers and Exadata servers in Database Machine. Swapping Linux
distributions is not supported.

 CONFIGURING ACFS: ASM Cluster File System (ACFS) is currently unavailable on Database
Machine.

DISK GROUP SIZING:


The backup method information is used to size the ASM disk groups created during installation.
Specifically it is used to determine the default division of disk space between the DATA disk group and
the RECO disk group. The backup methods are as follows:

Backups internal to Oracle Exadata Database Machine (40-60):


This setting indicates that database backups will be created only in the Fast Recovery Area (FRA) located
in the RECO disk group. This setting allocates 40% of available space to the DATA disk group and 60% of
available space to the RECO disk group.

Backups external to Oracle Exadata Database Machine (80-20):


If you are performing backups to disk storage external to Oracle Exadata Database Machine, such as to
additional dedicated Exadata Storage Servers, ZFS, an NFS server, virtual tape library or tape library,
then select this option. This setting allocates 80% of available space to the DATA disk group and 20% of
available space to the RECO disk group.

ASM PROTECTION LEVELS:


The protection level specifies the ASM redundancy settings applied to different disk groups. The setting
will typically vary depending on numerous factors such as the nature of the databases being hosted on
the database machine, the size of the databases, the choice of backup method and the availability
targets. Oracle recommends the use of high redundancy disk groups for mission critical applications.

High for ALL:


Both the DATA disk group and RECO disk group are configured with Oracle ASM high redundancy (triple
mirroring). This option is only available if the external backup method is selected.

High for DATA:


The DATA disk group is configured with Oracle ASM high redundancy, and the RECO disk group is
configured with Oracle ASM normal redundancy (double mirroring).

High for RECO:
The DATA disk group is configured with Oracle ASM normal redundancy, and the RECO disk group is
configured with Oracle ASM high redundancy. This option is only available if the external backup
method is selected.

Normal for ALL:


The DATA Disk Group and RECO disk group are configured with Oracle ASM normal redundancy.

SELECTING OPERATING SYSTEM:


Customers now have a choice for the database server operating system:

 Oracle Linux X86_64 (default)


 Oracle Solaris 11 for x86

Oracle Exadata Database Machine is shipped with the Linux operating system and Solaris operating
system for the Oracle Database servers. Linux is the default operating system; however, customers can
choose Solaris instead. Servers are shipped from the factory with both operating systems preloaded.
After the choice of operating system is made, the disks containing the unwanted operating system must
be reclaimed so that they can be used by the chosen operating system.
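
On the database servers this reclaim is typically performed with the reclaimdisks.sh script shipped in
/opt/oracle.SupportTools; the options below are a sketch and can differ between image versions:

# /opt/oracle.SupportTools/reclaimdisks.sh -check
# /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim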

ONE COMMAND:
The OneCommand utility is used to configure the Oracle Exadata Database Machine software stack
based on the information in the configuration files. The steps performed by OneCommand (current as of
August 2011) are subject to change as the installation and configuration process becomes more refined
and more automated.

The steps are run sequentially and each step must complete successfully before the next step
commences. All the steps, or a specified range of steps, can be run using a single command. Steps can
also be run individually.

If a step fails, then in most cases the cause of the failure can be remedied and the process restarted by
re-running the failed step. Depending on the exact nature of the problem, some failures may require
additional manual effort to return the Database Machine to a prefailure state. The README file
that accompanies OneCommand provides further guidance on these activities; however, careful planning
and execution of all the installation and configuration steps by experienced personnel is key to avoiding
issues with OneCommand.

Depending on the Database Machine model and capacity, the entire OneCommand process (all steps)
can take up to about 5 hours to run.

To run OneCommand, change to the /opt/oracle.SupportTools/onecommand directory and run
deploy112.sh using the -i (install) option.
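
The exact command-line options differ between OneCommand releases, so treat the following as a
sketch of the usual pattern (the step numbers are illustrative):

# cd /opt/oracle.SupportTools/onecommand
# ./deploy112.sh -i -l          # list all configured steps
# ./deploy112.sh -i -r 1-27     # run a range of steps
# ./deploy112.sh -i -s 12       # re-run a single failed step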

DATABASE MACHINE ARCHITECTURE OVERVIEW
Database Machine provides a resilient, high-performance platform for clustered and non-clustered
implementations of Oracle Database. The fundamental architecture underpinning Database Machine is
the same core architecture as Oracle Real Application Clusters (RAC) software. Key elements of the
Database Machine architecture are as below:

SHARED STORAGE:

Database Machine provides intelligent, high-performance shared storage to both single-instance and
RAC implementations of Oracle Database using Exadata Storage Server technology. Storage supplied by
Exadata Storage Servers is made available to Oracle databases using the Automatic Storage
Management (ASM) feature of Oracle Database. ASM adds resilience to Exadata Database Machine
storage by providing a mirroring scheme which can be used to maintain redundant copies of data on
separate Exadata Storage Servers. This protects against data loss if a storage server is lost.

STORAGE NETWORK:

Database Machine contains a storage network based on InfiniBand technology. This provides high
bandwidth and low latency access to the Exadata Storage Servers. Fault tolerance is built into the
network architecture through the use of multiple redundant network switches and network interface
bonding.

SERVER CLUSTER:

The database servers in Database Machine are designed to be powerful and well balanced so that there
are no bottlenecks within the server architecture. They are equipped with all of the components
required for Oracle RAC, enabling customers to easily deploy Oracle RAC across a single Database
Machine. Where processing requirements exceed the capacity of a single Database Machine, customers
can join multiple Database Machines together to create a single unified server cluster.

CLUSTER INTERCONNECTS:

The high bandwidth and low latency characteristics of InfiniBand are ideally suited to the requirements
of the cluster interconnect. Because of this, Database Machine is configured by default to also use the
InfiniBand storage networks as the cluster interconnect.

SHARED CACHE:

In a RAC environment, the instance buffer caches are shared. If one instance has an item of data in its
cache that is required by another instance, that data will be shipped to the required node using the
cluster interconnect. This key attribute of the RAC architecture significantly aids performance because
the memory-to-memory transfer of information via the cluster interconnect is significantly quicker than
writing and reading the information using disk. With Database Machine, the shared cache facility uses
the InfiniBand-based high-performance cluster interconnect.

ORACLE EXADATA X4-2 NETWORK ARCHITECTURE:

Database Machine contains three network types:

MANAGEMENT (ADMIN) NETWORK (NET0):

Management network is a standard Ethernet/IP network which is used to manage Database Machine.
The NET0/Management network allows for SSH connectivity to the server. It uses the eth0 interface,
which is connected to the embedded Cisco switch. The database servers and Exadata Storage Servers
also provide an Ethernet interface for Integrated Lights-Out Management (ILOM). Using ILOM,
administrators can remotely monitor and control the state of the server hardware. The InfiniBand
switches and PDUs also provide Ethernet ports for remote monitoring and management purposes.

CLIENT NETWORK (BONDETH0):

This is also a standard Ethernet network which is primarily used to provide database connectivity via
Oracle Net software. When Database Machine is initially configured, customers can choose to configure
the database servers with a single client network interface (NET1) or they can choose to configure a
bonded network interface (using NET1 and NET2).

The NET1, NET2, NET1-2/Client Access network provides access to the Oracle RAC VIP address and SCAN
addresses. It uses interfaces eth1 and eth2, which are typically bonded as bondeth0. These interfaces
are connected to the data center network. When channel bonding is configured for the client access
network during initial configuration, the Linux bonding module is configured for active-backup mode
(Mode 1). A bonded client access network interface can provide protection if a network interface fails;
however, using bonded interfaces may require additional configuration in the customer's network.
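
The state of the bonded client interface can be verified from the operating system with a standard Linux
bonding check, for example:

# cat /proc/net/bonding/bondeth0 | grep -E "Bonding Mode|Currently Active Slave|MII Status"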

INFINIBAND NETWORK (BONDIB0):

The InfiniBand network provides a reliable high-speed storage network and cluster interconnect for
Database Machine. It can also be used to provide high performance external connectivity to backup
servers, ETL servers and middleware servers such as Oracle Exalogic Elastic Cloud. Each database server
and Exadata Storage Server is connected to the InfiniBand network using a bonded network interface
(BONDIB0).
The IB network connects two ports on each database server to both of the InfiniBand leaf switches in the
rack. All storage server communication and Oracle RAC interconnect traffic uses this network. The IB
interfaces (ib0 and ib1) are typically bonded together as bondib0.

OPTIONAL NETWORK:

Each X4-2 database server also contains a spare Ethernet port (NET3) which can be used to configure an
additional client access network. Each X4-2 database server is also equipped with two 10 gigabit
Ethernet (10 GbE) interfaces which can be used for client connectivity. These interfaces can be bonded
together or connected to separate networks. Customers must have the required network infrastructure
for 10 GbE to make use of these interfaces.

IP SHEET

X4-2 DATABASE SERVER OVERVIEW
 2 Twelve-Core Intel Xeon E5-2697 v2 processors (2.7 GHz)
 256 GB (16 x 16 GB) RAM expandable to 512 GB with memory expansion kit
 4 x 600 GB 10K RPM SAS disks
 Disk controller HBA with 512 MB battery-backed write cache, and swappable Battery Backup Unit
 2 InfiniBand 4X QDR (40 Gb/s) ports (1 dual-port PCIe 3.0 Host Channel Adapter (HCA))
 4 x 1 GbE/10GbE Base-T Ethernet ports
 2 x 10 GbE Ethernet SFP+ ports (1 dual-port 10GbE PCIe 2.0 network card based on the Intel 82599
10 GbE controller technology)
 1 Ethernet port for Integrated Lights Out Manager (ILOM) for remote management
 Oracle Linux 5 Update 10 with Unbreakable Enterprise Kernel 2 or Oracle Solaris 11 Update 1

DATABASE SERVER ARCHITECTURE:

The following components reside on the database server:


 Customers can choose between Oracle Linux x86_64 and Solaris 11 Express for x86 as the operating
system for Exadata Database Machine database servers.

 Exadata Database Machine database servers run Oracle Database 11g Release 2. The precise patch
release of Oracle Database software must be compatible with the Exadata Storage Server software
and other Database Machine software components. My Oracle Support bulletin 888828.1 contains
an up-to-date list of the supported versions for the Database Machine software components.

 Automatic Storage Management (ASM) is required and provides a file system and volume manager
optimized for Oracle Database. You can connect multiple separate ASM environments with separate
disk groups to a pool of Exadata Storage Servers.

 Oracle Database communicates with Exadata cells using a special library called LIBCELL
($ORACLE_HOME/lib/libcell11.so). In combination with the database kernel and ASM, LIBCELL
transparently maps database I/O operations to Exadata Storage Server enhanced operations.
LIBCELL communicates with Exadata cells using the Intelligent Database protocol (iDB). iDB is a
unique Oracle data transfer protocol, built on Reliable Datagram Sockets (RDS), that runs on
industry standard InfiniBand networking hardware. LIBCELL and iDB enable ASM and database
instances to utilize Exadata Storage Server features, such as Smart Scan and I/O Resource
Management.

 Database Resource Manager (DBRM) is integrated with Exadata Storage Server I/O Resource
Management (IORM). DBRM and IORM work together to ensure that I/O resources are allocated
based on administrator-defined priorities.

The above diagram shows some additional processes that run on the database servers which relate to
the use of Exadata cells for storage inside Database Machine.

Diskmon checks the storage network interface state and cell aliveness; it also performs DBRM plan
propagation to Exadata cells.

Diskmon uses a node wide master process (diskmon.bin) and one slave process (DSKM) for each RDBMS
and ASM instance. The master performs monitoring and propagates state information to the slaves.
Slaves use the SGA to communicate with RDBMS and ASM processes.

If there is a failure in the cluster, Diskmon performs I/O fencing to protect data integrity. Cluster
Synchronization Services (CSS) still decides what to fence. Master Diskmon starts with the Clusterware
processes. The slave Diskmon processes are background processes which are started and stopped in
conjunction with the associated RDBMS and ASM instance.

KEY CONFIGURATION FILES:

cellinit.ora:
 CELL local IP addresses (ib0 interface)
 Location: /etc/oracle/cell/network-config/cellinit.ora

[root@exa01dbadm01 network-config]# cat cellinit.ora


ipaddress1=192.xx.xx.1/22
ipaddress2=192.xx.xx.2/22

cellip.ora:
 Contains the accessible storage cell IP addresses.
 Location: /etc/oracle/cell/network-config/cellip.ora

[root@exa01dbadm01 network-config]# cat cellip.ora


cell="192.xx.xx.5;192.xx.xx.6"
cell="192.xx.xx.7;192.xx.xx.8"
cell="192.xx.xx.9;192.xx.xx.10"

cellroute.ora:
 Contains the accessible routes to the storage cells.
 Location: /etc/oracle/cell/network-config/cellroute.ora

[root@exa01dbadm01 network-config]# cat cellroute.ora


# Routes for 192.xx.xx.5;192.xx.xx.6
route="192.xx.xx.5;192.xx.xx.1"
route="192.xx.xx.6;192.xx.xx.2"

# Routes for 192.xx.xx.7;192.xx.xx.8


route="192.xx.xx.7;192.xx.xx.1"
route="192.xx.xx.8;192.xx.xx.2"

# Routes for 192.xx.xx.9;192.xx.xx.10


route="192.xx.xx.9;192.xx.xx.1"
route="192.xx.xx.10;192.xx.xx.2"

INFINIBAND NETWORK
 Is the Database Machine interconnect fabric:
– Provides highest performance available – 40 Gb/sec each direction
 Is used for storage networking, RAC interconnect and high-performance external connectivity:
– Less configuration, lower cost, higher performance
 Looks like normal Ethernet to host software:
– All IP-based tools work transparently – TCP/IP, UDP, SSH, and so on
 Has the efficiency of a SAN:
– Zero copy and buffer reservation capabilities
 Uses high-performance ZDP InfiniBand protocol (RDS V3):
– Zero-copy, zero-loss Datagram protocol
– Open Source software developed by Oracle
– Very low CPU overhead

InfiniBand is the only storage network supported inside Database Machine because of its performance
and proven track record in high-performance computing. InfiniBand works like normal Ethernet but is
much faster. It has the efficiency of a SAN, using zero copy and buffer reservation. Zero copy means that
data is transferred across the network without intermediate buffer copies in the various network layers.
Buffer reservation is used so that the hardware knows exactly where to place buffers ahead of time.

Oracle Exadata uses the Intelligent Database protocol (iDB) to transfer data between Database Node
and Storage Cell Node. It is implemented in the database kernel and transparently maps database
operations to Exadata operations. iDB can be used to transfer SQL operations from the database node to
the cell node, and to return either query results or full data blocks from the cell node.

iDB is built on Reliable Datagram Sockets (RDS v3) protocol and runs over InfiniBand ZDP (Zero-loss
Zero-copy Datagram Protocol). The objective of ZDP is to eliminate unnecessary copying of blocks. RDS is
based on the socket API and offers low overhead, low latency, and high bandwidth. Exadata cell nodes
can send and receive large transfers using Remote Direct Memory Access (RDMA).

RDMA is a direct memory access from the memory of one server into another server without involving
either’s operating system. The transfer requires no work to be done by CPUs, caches, or context
switches, and transfers continue in parallel with other system operations. It is quite useful in massively
parallel processing environment.

RDS is used extensively on Oracle Exadata. RDS delivers highly available, low-overhead datagram
transfer, similar to UDP but reliable and zero-copy. It accesses InfiniBand via the socket API. RDS v3
supports both RDMA read and write and allows large data transfers of up to 8 MB. It also supports
control messages for asynchronous operation, for submit and completion notifications.

If you want to optimize communications between Oracle Engineered Systems, such as Exadata, Big Data
Appliance, and Exalytics, you can use the Sockets Direct Protocol (SDP). SDP only deals
with stream sockets.
SDP allows high-performance zero-copy data transfers via RDMA network fabrics and uses a standard
wire protocol over an RDMA fabric to support stream sockets (SOCK_STREAM). The goal of SDP is to
provide an RDMA-accelerated alternative to the TCP protocol on IP while remaining transparent to the
application.
It bypasses the OS resident TCP stack for stream connections between any endpoints on the RDMA
fabric. All other socket types (such as datagram, raw, packet, etc.) are supported by the IP stack and
operate over standard IP interfaces (i.e., IPoIB on InfiniBand fabrics). The IP stack has no dependency on
the SDP stack; however, the SDP stack depends on IP drivers for local IP assignments and for IP address
resolution for endpoint identifications.

INFINIBAND NETWORK OVERVIEW:

Each Database Machine contains at least two InfiniBand Switches which connect the database servers
and storage servers. These are called leaf switches. A third switch, called the spine switch, connects to
both leaf switches. The spine switch facilitates connection of multiple racks to form a single larger
Database Machine environment.

Each server contains at least one pair of InfiniBand ports which are bonded together using active-active
bonding. The two active connections are spread across both leaf switches, which doubles the
throughput. In addition, the leaf switches within a rack are connected to each other. The result is a Fat-
Tree switched fabric network topology.

EXADATA AND EXALOGIC INTEGRATION:

POWERING OFF ORACLE EXADATA RACK


The power off sequence for Oracle Exadata Rack is as follows:
1. Database servers (Oracle Exadata Database Machine only).
2. Exadata Storage Servers.
3. Rack, including switches.

POWERING OFF DATABASE SERVERS:


The following procedure is the recommended shutdown procedure for database servers:
a. Stop Oracle Clusterware using the following command:
# $GRID_HOME/bin/crsctl stop cluster
If any resources managed by Oracle Clusterware are still running after issuing the crsctl stop cluster
command, then the command fails. Use the -f option to unconditionally stop all resources, and stop
Oracle Clusterware.
b. Shut down the operating system using the following command:
# shutdown -h -y now

POWERING OFF EXADATA STORAGE SERVERS:


Exadata Storage Servers are powered off and restarted using the Linux shutdown command. The
following command shuts down Exadata Storage Server immediately:
# shutdown -h -y now

When powering off Exadata Storage Servers, all storage services are automatically stopped.
(shutdown -r -y now command restarts Exadata Storage Server immediately.)
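
For example, once all databases and Oracle Clusterware have been stopped, all cells can be shut down in
one pass from a database server using dcli (assuming a cell_group file listing the cell host names and root
SSH equivalence):

# dcli -g cell_group -l root 'shutdown -h -y now'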

Note the following when powering off Exadata Storage Servers:


 All database and Oracle Clusterware processes should be shut down prior to shutting down
more than one Exadata Storage Server.

 Powering off one Exadata Storage Server does not affect running database processes or Oracle
ASM.
 Powering off or restarting Exadata Storage Servers can impact database availability.
 The shutdown commands can be used to power off or restart database servers.

POWERING ON AND OFF NETWORK SWITCHES:

The network switches do not have power switches. They are powered off when power is removed, by way
of the power distribution unit (PDU) or at the breaker in the data center.

POWERING ON ORACLE EXADATA RACK


The power on sequence for Oracle Exadata Rack is as follows (the reverse of the power off sequence):

1. Rack, including switches.
2. Exadata Storage Servers.
3. Database servers (Oracle Exadata Database Machine only).
4. Start the cluster.
5. Start the databases.

PDU (POWER DISTRIBUTION UNIT):


The Exadata Database Machine has two PDUs for redundancy. Each PDU has a 3-phase power
supply.

INTEGRATED LIGHTS OUT MANAGER (ILOM)

 What is it?
– Integrated service processor hardware and software
 What does it do?
– Provides out-of-band server monitoring and management to:
– Remotely control the power state of a server
– View the status of sensors and indicators on the system
– Provide a remote server console
– Generates alerts for hardware errors and faults as they occur
 Where is it found?
– Exadata Database Machine database servers and Exadata Storage Servers

Oracle Integrated Lights Out Manager (ILOM) provides advanced service processor (SP) hardware and
software that you can use to manage and monitor your Exadata machine components, such as compute
nodes, storage nodes, and the InfiniBand switch. ILOM's dedicated hardware and software is
preinstalled on these components.

ILOM enables you to actively manage and monitor compute nodes in the Exadata machine
independently of the operating system state, providing you with a reliable Lights Out Management
system.

With ILOM, you can proactively:


 Learn about hardware errors and faults as they occur
 Remotely control the power state of your compute node & cell node.
 View the graphical and non-graphical consoles for the host
 View the current status of sensors and indicators on the system
 Determine the hardware configuration of your system
 Receive generated alerts about system events in advance via IPMI PETs, SNMP Traps, or E-mail
Alerts.

The ILOM service processor (SP) runs its own embedded operating system and has a dedicated Ethernet
port, which together provides out-of-band management capability. In addition, you can access ILOM
from the compute node's operating system. Using ILOM, you can remotely manage your compute node
as if you were using a locally attached keyboard, monitor, and mouse.

ILOM automatically initializes as soon as power is applied to your compute node. It provides a full-
featured, browser-based web interface and has an equivalent command-line interface (CLI).
Exadata compute nodes are configured at the time of manufacturing to use Sideband Management. This
configuration eliminates separate cables for the Service Processor (SP) NET MGT port and the NET0 Port.

ILOM Interfaces
ILOM supports multiple interfaces for accessing its features and functions. You can choose to use a
browser-based web interface or a command-line interface.
Web Interface
The web interface provides an easy-to-use browser interface that enables you to log in to the SP, then to
perform system management and monitoring.
Command-Line Interface
The command-line interface enables you to operate ILOM using keyboard commands and adheres to
industry-standard DMTF-style CLI and scripting protocols. ILOM supports SSH v2.0 and v3.0 for secure
access to the CLI.
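
A brief sketch of an ILOM CLI session over SSH is shown below; the ILOM host name is illustrative, and
the available targets can vary slightly by ILOM version:

$ ssh root@exa01dbadm01-ilom
-> show /SYS power_state
-> start /SP/console
-> stop /SYS
-> start /SYS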

AUTO SERVICE REQUEST (ASR) OVERVIEW:
Auto Service Request (ASR) is designed to automatically open service requests when specific Oracle
Exadata Rack hardware faults occur. To enable this feature, the Oracle Exadata Rack components must
be configured to send hardware fault telemetry to the ASR Manager software. This service covers
components in Exadata Storage Servers and Oracle Database servers, such as disks and flash cards.

ASR Manager must be installed on a server that has connectivity to Oracle Exadata Rack, and an
outbound Internet connection using HTTPS or an HTTPS proxy. ASR Manager can be deployed on a
standalone server running Oracle Solaris or Linux, or a database server on Oracle Exadata Database
Machine. Oracle recommends that ASR Manager be installed on a server outside of Oracle Exadata Rack.
Once installation is complete, configure fault telemetry destinations for the servers on Oracle Exadata
Database Machine.

The following are some of the reasons for the recommendation:


 If the server that has ASR Manager installed goes down, then ASR Manager is unavailable for the
other components of Oracle Exadata Database Machine. This is very important when there are
several Oracle Exadata Database Machines using ASR at a site.
 In order to submit a service request (SR), the server must be able to access the Internet.

Note: ASR can only use the management network. Ensure the management network is set up to allow
ASR to run.

When a hardware problem is detected, ASR Manager submits a service request to Oracle Support
Services. In many cases, Oracle Support Services can begin work on resolving the issue before the
database administrator is even aware the problem exists.

Prior to using ASR, the following must be set up:
 Oracle Premier Support for Systems or Oracle/Sun Limited Warranty
 Technical contact responsible for Oracle Exadata Rack
 Valid shipping address for Oracle Exadata Rack parts

An e-mail message is sent to the technical contact for the activated asset to notify them that the
service request has been created. Disk failure Simple Network Management Protocol (SNMP) traps are
typical examples of the telemetry sent to ASR Manager.

NOTES:
 ASR is applicable only for component faults. Not all component failures are covered, though the
most common components such as disk, fan, and power supplies are covered.

 ASR is not a replacement for other monitoring mechanisms, such as SMTP, and SNMP alerts, within
the customer data center. ASR is a complementary mechanism that expedites and simplifies the
delivery of replacement hardware. ASR should not be used for downtime events in high-priority
systems. For high-priority events, contact Oracle Support Services directly.

 There are occasions when a service request may not be automatically filed. This can happen because
of the unreliable nature of the SNMP protocol or loss of connectivity to the ASR Manager. Oracle
recommends that customers continue to monitor their systems for faults, and call Oracle Support
Services if they do not receive notice that a service request has been automatically filed.

STORAGE SERVER OVERVIEW

Exadata Storage Server is highly optimized storage for use with Oracle Database. It delivers outstanding
I/O and SQL processing performance for data warehousing and OLTP applications.

Each Exadata Storage Server is based on a 64 bit Intel-based Sun Fire server. Oracle provides the storage
server software to impart database intelligence to the storage, and tight integration with Oracle
database and its features. Each Exadata Storage Server is shipped with all the hardware and software
components preinstalled including the Exadata Storage Server software, Oracle Linux x86_64 operating
system and InfiniBand protocol drivers.

Exadata Storage Server is only available for use in conjunction with Database Machine. Individual
Exadata Storage Servers can be purchased; however, they must be connected to a Database Machine.
Custom configurations using Exadata Storage Servers are not supported for new installations.

EXADATA STORAGE SERVER ARCHITECTURE OVERVIEW:

 Exadata Storage Server is a self-contained storage platform that houses disk storage and runs the
Exadata Storage Server software provided by Oracle.
 Exadata Storage Server is also called a cell. A cell is the building block for a storage grid.
 More cells provide greater capacity and I/O bandwidth.
 Databases are typically deployed across multiple cells, and multiple databases can share the storage
provided by a single cell.
 The databases and cells communicate with each other via a high-performance InfiniBand network.
 Each cell is a purely dedicated storage platform for Oracle Database files although you can use
Database File System (DBFS), a feature of Oracle Database, to store your business files inside the
database.
 Each cell is a computer with CPUs, memory, a bus, disks, network adapters, and the other
components normally found in a server.
 It also runs an operating system (OS), which in the case of Exadata Storage Server is Linux x86_64.
 The Oracle provided software resident in the Exadata cell runs under this operating system.
 The OS is accessible in a restricted mode to administer and manage Exadata Storage Server.
 You cannot install any additional software on the cell or make unsupported changes to it.

PROCESSORS:

MEMORY:

STORAGE:

CellCLI> list celldisk attributes name, disktype, size


CD_00_exa01celadm01 HardDisk 3691.484375G
CD_01_exa01celadm01 HardDisk 3691.484375G
CD_02_exa01celadm01 HardDisk 3725.28125G
CD_03_exa01celadm01 HardDisk 3725.28125G
CD_04_exa01celadm01 HardDisk 3725.28125G
CD_05_exa01celadm01 HardDisk 3725.28125G

FD_00_exa01celadm01 FlashDisk 186.25G


FD_01_exa01celadm01 FlashDisk 186.25G
FD_02_exa01celadm01 FlashDisk 186.25G
FD_03_exa01celadm01 FlashDisk 186.25G
FD_04_exa01celadm01 FlashDisk 186.25G
FD_05_exa01celadm01 FlashDisk 186.25G
FD_06_exa01celadm01 FlashDisk 186.25G
FD_07_exa01celadm01 FlashDisk 186.25G

DATABASE MACHINE SOFTWARE ARCHITECTURE DETAILS:

THE CELL PROCESSES:

1. Cell Server (CELLSRV):


 CELLSRV communicates with LIBCELL. LIBCELL converts I/O requests into messages containing data
requests and metadata, which are sent to CELLSRV using the iDB protocol.
 CELLSRV is a multithreaded server.
 CELLSRV is able to use the metadata to process the data before sending results back to the
database layer.
 CELLSRV serves simple block requests, such as database buffer cache reads, and Smart Scan
requests, such as table scans with projections and filters.
 CELLSRV serves oracle blocks when SQL Offload is not possible.
 CELLSRV also implements IORM, which works in conjunction with DBRM.
 CELLSRV collects numerous statistics relating to its operations.

2. Management Server (MS):


 MS provides Exadata cell management, configuration and administration functions.
 MS works in cooperation with the Exadata cell command-line interface (CellCLI).
 MS is responsible for sending alerts and collects some statistics in addition to those collected by
CELLSRV.

3. Restart Server (RS):


 RS is used to start up/shut down the CELLSRV and MS services and monitors these services to
automatically restart them if required.

Cellrssrm: The cellrssrm process is the main Restart Server process. It launches 3 helper processes:
cellrsomt, cellrsbmt and cellrsmmt

Cellrsomt: ultimately responsible for launching cellsrv.


Cellrsbmt and cellrsmmt: Responsible for launching cellrsbkm and the main MS Java process.
Cellrssmt is called by cellrsbkm, and its ultimate goal is to ensure cell configuration files are valid,
consistent, and backed up.

[root@exa1celadm02 bin]# ps -ef | grep -i cellrs


root 13252 1 0 Mar17 ? 00:11:25 /opt/oracle/cell/cellsrv/bin/cellrssrm -ms 1 -cellsrv 1
root 13259 13252 0 Mar17 ? 00:06:37 /opt/oracle/cell/cellsrv/bin/cellrsbmt -rs_conf
root 13260 13252 0 Mar17 ? 00:03:58 /opt/oracle/cell/cellsrv/bin/cellrsmmt -rs_conf
root 27059 13252 0 Mar26 ? 00:32:26 /opt/oracle/cell/cellsrv/bin/cellrsomt -rs_conf
root 13262 13259 0 Mar17 ? 00:01:47 /opt/oracle/cell/cellsrv/bin/cellrsbkm -rs_conf
root 13269 13262 0 Mar17 ? 00:06:27 /opt/oracle/cell/cellsrv/bin/cellrssmt -rs_conf

[root@exa01celadm02 ~]# ps -ef | grep -i ms | egrep java


root 1588 13260 0 Aug04 ? 10:32:55 /usr/java/default/bin/java -Xms256m -Xmx512m -XX:-

[root@exa01celadm02 ~]# ps -ef | grep -i 13260


root 1588 13260 0 Aug04 ? 10:32:55 /usr/java/default/bin/java -Xms256m -Xmx512m -XX:-
root 13260 13252 0 Mar17 ? 00:04:44 /opt/oracle/cell/cellsrv/bin/cellrsmmt -rs_conf

EXAMPLE:

[root@exa01celadm01 trace]# tail -50f /var/log/oracle/diag/asm/cell/localhost/trace/alert.log


Thu Oct 15 11:05:49 2015
Errors in file /opt/oracle/cell/log/diag/asm/cell/exa01celadm01/trace/svtrc_9016_22.trc (incident=25):
ORA-00600: internal error code, arguments: [ossdebugdisk:cellsrvstatIoctl_missingstat], [228],
[Database group composite metric], [], [], [], [], [], [], [], [], []
Incident details in:
/opt/oracle/cell/log/diag/asm/cell/dm01celadm01/incident/incdir_25/svtrc_9016_22_i25.trc
Sweep [inc][25]: completed
Thu Oct 15 11:05:52 2015 876 msec State dump completed for CELLSRV<9016> after ORA-600 occurred
CELLSRV error - ORA-600 internal error
Thu Oct 15 11:05:53 2015
[RS] monitoring process /opt/oracle/cell/cellsrv/bin/cellrsomt (pid: 9014) returned with error: 128
Thu Oct 15 11:05:53 2015
[RS] Started monitoring process /opt/oracle/cell/cellsrv/bin/cellrsomt with pid 23594
Thu Oct 15 11:05:53 2015
CELLSRV process id=23596

CELLSRVSTAT:
Cellsrvstat is a very useful utility for getting cell-level statistics for all the logical components of a cell,
such as memory, I/O, Smart I/O and Flash Cache.

Cellsrvstat is used to get quick cell-level statistics from cell storage. It also helps you get information
about offloading and storage indexes.

[root@dm01celadm01 ~]# cellsrvstat -list


Statistic Groups:
io Input/Output related stats
mem Memory related stats
exec Execution related stats
net Network related stats
smartio SmartIO related stats
flashcache FlashCache related stats
health Cellsrv health/events related stats
offload Offload server related stats
database Database related stats
ffi FFI related stats
lio LinuxBlockIO related stats

Simply running the utility from the command prompt, without any additional parameters or qualifiers,
produces the output. You can also restrict the output of cellsrvstat by using the -stat_group parameter
to specify which group, or groups, you want to monitor.

In non-tabular mode, the output has three columns. The first column is the name of the metric, the
second one is the difference between the last and the current value (delta), and the third column is the
absolute value.
In tabular mode, absolute values are printed as is, without the delta. The cellsrvstat -list command points
out which statistics are absolute values.

You can get the list of all statistics by executing the command below on any of the cells.
You can also use the dcli utility, executed from one of your database servers, to get the statistics output
from every cell at once. Make sure SSH connectivity is configured between the database node and the cells.

#dcli -g cellgroup -l root 'cellsrvstat -stat_group=io -interval=30 -count=2' > /tmp/cellstats.txt

Here:
- cellgroup is the file that contains the list of IP addresses of all the storage cells.
- The command uses a 30-second interval and collects 2 samples; change these values as per your
requirement.
- You can specify -stat_group if you want statistics for a specific group, for example io, smartio, mem or net.
- The statistics output is saved into the /tmp/cellstats.txt file.

KEY CONFIGURATION FILES:

cell_disk_config.xml:
 MS internal dictionary
 Contains information about the DISKS.
 Location: /opt/oracle/cell12.1.1.1.1_LINUX.X64_140712/cellsrv/deploy/config/cell_disk_config.xml

cellinit.ora:
 CELL Initialization Parameters and IP addresses
 Location: /opt/oracle/cell12.1.1.1.1_LINUX.X64_140712/cellsrv/deploy/config/cellinit.ora

Cell.conf:
 CELL configuration file.
 Location: /opt/oracle.cellos/cell.conf

LOG FILES:
$CELLTRACE is the location of the cell alert log file, the MS log file and the trace files.

[root@exa01celadm01 ~]# cd $CELLTRACE


[root@exa01celadm01 trace]# pwd
/opt/oracle/cell12.1.1.1.1_LINUX.X64_140712/log/diag/asm/cell/exa01celadm01/trace

[root@exa01celadm01 trace]# ls -ltr *.log


-rw-rw---- 1 root celladmin 459834 Oct 15 11:24 alert.log
-rw-r--r-- 1 root celladmin 2356652 Oct 15 13:37 ms-odl.log

BACKGROUND PROCESSES:
The background processes for the database and Oracle ASM instance for an Oracle Exadata Storage
Server environment are the same as other environments, except for the following background processes:
 DISKMON Process
 XDMG Process
 XDWK Process

DISKMON Process:
The DISKMON process is a fundamental component of Oracle Exadata Storage Server Software, and is
responsible for implementing I/O fencing. The process is located on the database server host computer,
and is part of Oracle Clusterware Cluster Ready Services (CRS). This process is important for Oracle
Exadata Storage Server Software and should not be modified. The log files for Diskmon are located in
the $CRS_HOME/log/hostname/Diskmon directory
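
A quick way to see these processes on a database server is sketched below; the master typically appears
as diskmon.bin and the slaves as ora_dskm_<SID> or asm_dskm_<ASM SID> background processes:

# ps -ef | grep -v grep | grep diskmon.bin
# ps -ef | grep -v grep | grep dskm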

XDMG Process:
The XDMG (Exadata Automation Manager) process initiates automation tasks used for monitoring
storage. This background process monitors all configured Oracle Exadata Storage Servers for state
changes, such as replaced disks, and performs the required tasks for such changes. Its primary task is to
watch for inaccessible disks and cells, and to detect when the disks and cells become accessible. When
the disks and cells are accessible, the XDMG process initiates the ASM ONLINE process, which is handled
by the XDWK background process. The XDMG process runs in the Oracle ASM instances.

XDWK Process:
The XDWK (Exadata Automation Worker) process performs automation tasks requested by the XDMG
background process. The XDWK process begins when asynchronous actions, such as ONLINE, DROP or
ADD for an Oracle ASM disk are requested by the XDMG process. The XDWK process stops after 5
minutes of inactivity. The XDWK process runs in the Oracle ASM instances.

THE EXADATA CELL:


 The Exadata Storage Server itself is called a cell.
 An Exadata Storage Server contains two types of physical disks: HARD DISKS and FLASH DISKS.
 Each EXADATA cell contains 12 HARD DISKS (600 GB High Performance or 4 TB High Capacity).

LAYERS OF THE DISK:


There are four layers of disk abstraction in an Exadata Storage Server. CellCLI is the command-line utility
used to maintain the disks.

1. Physical Disk:

Physical disks can be hard disks or flash disks. You cannot create or drop a physical disk. The only
administrative tasks at this layer are to control the service LED at the front of the cell and to list the
physical disks.

Examples:
CELLCLI> ALTER PHYSICALDISK 20:0 SERVICELED ON
CELLCLI> ALTER PHYSICALDISK 20:0 DROP FOR REPLACEMENT
CELLCLI> ALTER PHYSICALDISK 20:0 REENABLE
CELLCLI> ALTER PHYSICALDISK 20:0 SERVICELED OFF
CELLCLI> LIST PHYSICALDISK

2. LUN:

LUNs are the second layer of abstraction. The first two LUNs in every cell contain the operating system
(Oracle Enterprise Linux). About 29 GB is reserved on the first two hard disks for this purpose, and these
two system areas are mirrored to each other, so the cell can still operate if one of the first two hard disks
fails. The LUNs are equally sized on each hard disk, but the usable space (for cell disks and, in turn, grid
disks) is about 30 GB less on the first two. As an administrator, you cannot do anything at the LUN layer
except look at it.

Example:
CELLCLI > LIST LUN

3. Cell Disk:

Cell disks are the third layer of abstraction. As an administrator, you can create and drop cell disks,
although you will rarely, if ever, need to do that.

Examples:
CELLCLI> LIST CELLDISK
CELLCLI> CREATE CELLDISK ALL HARDDISK
CELLCLI> DROP CELLDISK ALL
CELLCLI> ALTER CELLDISK 123 name='abc', comment='name was changed to abc'

4. Grid Disk:

Grid disks are the fourth layer of abstraction, and they will be the Candidate Disks to build your ASM
diskgroups. By default (interleaving=none on the Cell disk layer), the first Grid disk that is created upon a
Cell disk is placed on the outer sectors of the underlying Hard disk. It will have the best performance. If
we follow the recommendations, we will create 3 Diskgroups upon our Grid disks: DATA, RECO and
DBFS_DG.

DATA is supposed to be used as the database area (DB_CREATE_FILE_DEST='+DATA' on the database
layer), RECO will be the recovery area (DB_RECOVERY_FILE_DEST='+RECO'), and DBFS_DG will be used
to hold the voting files, OCR files, and DBFS file systems if needed. It makes sense that DATA has better
performance than RECO, and DBFS_DG can be placed on the slowest (inner) part of the hard disks.
So as an administrator, you can (and most likely will) create and drop grid disks; typically 3 grid disks
are carved out of each cell disk.

Examples:
CELLCLI> LIST GRIDDISK
CELLCLI> ALTER GRIDDISK 123 name='abc', comment='name was changed to abc'
CELLCLI> DROP GRIDDISK GD123_0
CELLCLI> DROP GRIDDISK ALL PREFIX=DATAC1
CELLCLI> CREATE GRIDDISK GD123_0 celldisk = CD123, size =100M
CELLCLI> CREATE GRIDDISK ALL PREFIX=data1, size=50M

 Exadata cell software automatically senses the physical disks in each storage server.
 As a cell administrator you can only view physical disk attributes. Each physical disk is mapped to a
Logical Unit (LUN).
 A LUN exposes additional predefined metadata attributes to a cell administrator. You cannot create
or remove a LUN; they are automatically created.
 Each of the first two LUNs contains a system area that spans multiple disk partitions. The two system
areas are mirror copies of each other which are maintained using software mirroring.
 The system areas consume approximately 29 GB on each disk. The system areas contain the OS
image, swap space, Exadata cell software binaries, metric and alert repository, and various other
configuration and metadata files.
 A cell disk is a higher level abstraction that represents the data storage area on each LUN. For the
two LUNs that contain the system areas, Exadata cell software recognizes the way that the LUN is
partitioned and maps the cell disk to the disk partition reserved for data storage. For the other 10
disks, Exadata cell software maps the cell disk directly to the LUN.

 After a cell disk is created, it can be subdivided into one or more grid disks, which are directly
exposed to ASM.

Placing multiple grid disks on a cell disk allows the administrator to segregate the storage into pools with
different performance characteristics. For example, a cell disk could be partitioned so that one grid disk
resides on the highest performing portion of the disk (the outermost tracks on the physical disk),
whereas a second grid disk could be configured on the lower performing portion of the disk (the inner
tracks). The first grid disk might then be used in an ASM disk group that houses highly active (hot) data,
while the second grid disk might be used to store less active (cold) data files.
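
As a hedged illustration of such a layout, a "hot" pool and a "cold" pool could be carved from the hard-disk-based cell disks as follows; the HOT/COLD prefixes and sizes are purely illustrative, and the grid disks created first land on the faster outer tracks:

CELLCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=HOT, size=300G
CELLCLI> CREATE GRIDDISK ALL HARDDISK PREFIX=COLD, size=500G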

FLASH DISK:

 Each Exadata cell contains 3 TB of high-performance flash memory distributed across 4 PCI flash
memory cards. Each card has 4 flash devices, for a total of 16 flash devices on each cell. Each flash
device has a capacity of 186 GB.
 Essentially, each flash device is like a physical disk in the storage hierarchy. Each flash device is
visible to the Exadata cell software as a LUN. You can create a cell disk using the space on a flash-
based LUN. You can then create numerous grid disks on each flash-based cell disk. Unlike physical
disk devices, the allocation order of flash space is not important from a performance perspective.
 While it is possible to create flash-based grid disks, the primary use for flash storage is to support
Exadata Smart Flash Cache, a high-performance caching mechanism for frequently accessed data on
each Exadata cell.
 By default, the initial cell configuration process creates flash-based cell disks on all the flash devices,
and then allocates most of the available flash space to Exadata Smart Flash Cache.

 To create space for flash-based grid disks, the default Exadata Smart Flash Cache must first be
dropped. Then a new Exadata Smart Flash Cache and flash-based grid disks can be created using
sizes chosen by the cell administrator.
 It is possible to allocate up to one Exadata Smart Flash Cache area and zero or more flash-based grid
disks from a flash-based cell disk.

CellCLI> list physicaldisk attributes name,physicalSerial,diskType,physicalSize


20:0 E0024X HardDisk 3726.023277282715G
20:1 EZSENX HardDisk 3726.023277282715G
20:2 E07EMX HardDisk 3726.023277282715G
20:3 EM297X HardDisk 3726.023277282715G
20:4 E03EKX HardDisk 3726.023277282715G
20:5 E00B7X HardDisk 3726.023277282715G
20:6 ETYK8X HardDisk 3726.023277282715G
20:7 ERJWCX HardDisk 3726.023277282715G
20:8 ES7EHX HardDisk 3726.023277282715G
20:9 ETTAKX HardDisk 3726.023277282715G
20:10 EV1JYX HardDisk 3726.023277282715G
20:11 ETYBXX HardDisk 3726.023277282715G
FLASH_1_0 11000257739 FlashDisk 186.26451539993286G
FLASH_1_1 11000257712 FlashDisk 186.26451539993286G
FLASH_1_2 11000293811 FlashDisk 186.26451539993286G
FLASH_1_3 11000293764 FlashDisk 186.26451539993286G

FLASH_2_0 11000299734 FlashDisk 186.26451539993286G
FLASH_2_1 11000299720 FlashDisk 186.26451539993286G
FLASH_2_2 11000299809 FlashDisk 186.26451539993286G
FLASH_2_3 11000299796 FlashDisk 186.26451539993286G
FLASH_4_0 11000299803 FlashDisk 186.26451539993286G
FLASH_4_1 11000299790 FlashDisk 186.26451539993286G
FLASH_4_2 11000300700 FlashDisk 186.26451539993286G
FLASH_4_3 11000299794 FlashDisk 186.26451539993286G
FLASH_5_0 11000299714 FlashDisk 186.26451539993286G
FLASH_5_1 11000299709 FlashDisk 186.26451539993286G
FLASH_5_2 11000299708 FlashDisk 186.26451539993286G
FLASH_5_3 11000296798 FlashDisk 186.26451539993286G

CellCLI> list lun attributes name,diskType,id,lunSize,isSystemLun

0_0 HardDisk 0_0 3725.2900390625G TRUE


0_1 HardDisk 0_1 3725.2900390625G TRUE
0_2 HardDisk 0_2 3725.2900390625G FALSE
0_3 HardDisk 0_3 3725.2900390625G FALSE
0_4 HardDisk 0_4 3725.2900390625G FALSE
0_5 HardDisk 0_5 3725.2900390625G FALSE
0_6 HardDisk 0_6 3725.2900390625G FALSE
0_7 HardDisk 0_7 3725.2900390625G FALSE
0_8 HardDisk 0_8 3725.2900390625G FALSE
0_9 HardDisk 0_9 3725.2900390625G FALSE
0_10 HardDisk 0_10 3725.2900390625G FALSE
0_11 HardDisk 0_11 3725.2900390625G FALSE

CellCLI> list celldisk attributes name,diskType,physicalDisk,size

CD_00_exa01celadm01 HardDisk E0024X 3691.484375G


CD_01_exa01celadm01 HardDisk EZSENX 3691.484375G
CD_02_exa01celadm01 HardDisk E07EMX 3725.28125G
CD_03_exa01celadm01 HardDisk EM297X 3725.28125G
CD_04_exa01celadm01 HardDisk E03EKX 3725.28125G
CD_05_exa01celadm01 HardDisk E00B7X 3725.28125G
FD_00_exa01celadm01 FlashDisk 11000257739 186.25G
FD_01_exa01celadm01 FlashDisk 11000257712 186.25G
FD_02_exa01celadm01 FlashDisk 11000293811 186.25G
FD_03_exa01celadm01 FlashDisk 11000293764 186.25G
FD_04_exa01celadm01 FlashDisk 11000299734 186.25G
FD_05_exa01celadm01 FlashDisk 11000299720 186.25G
FD_06_exa01celadm01 FlashDisk 11000299809 186.25G
FD_07_exa01celadm01 FlashDisk 11000299796 186.25G

CellCLI> list griddisk attributes name,asmDiskGroupName,cellDisk,size

DATAC1_CD_00_exa01celadm01 DATAC1 CD_00_exa01celadm01 2953G


DATAC1_CD_01_exa01celadm01 DATAC1 CD_01_exa01celadm01 2953G
DATAC1_CD_02_exa01celadm01 DATAC1 CD_02_exa01celadm01 2953G
DATAC1_CD_03_exa01celadm01 DATAC1 CD_03_exa01celadm01 2953G
DATAC1_CD_04_exa01celadm01 DATAC1 CD_04_exa01celadm01 2953G
DATAC1_CD_05_exa01celadm01 DATAC1 CD_05_exa01celadm01 2953G
DBFS_DG_CD_02_exa01celadm01 DBFS_DG CD_02_exa01celadm01 33.796875G
DBFS_DG_CD_03_exa01celadm01 DBFS_DG CD_03_exa01celadm01 33.796875G
DBFS_DG_CD_04_exa01celadm01 DBFS_DG CD_04_exa01celadm01 33.796875G
DBFS_DG_CD_05_exa01celadm01 DBFS_DG CD_05_exa01celadm01 33.796875G
RECOC1_CD_00_exa01celadm01 RECOC1 CD_00_exa01celadm01 738.4375G
RECOC1_CD_01_exa01celadm01 RECOC1 CD_01_exa01celadm01 738.4375G
RECOC1_CD_02_exa01celadm01 RECOC1 CD_02_exa01celadm01 738.4375G
RECOC1_CD_03_exa01celadm01 RECOC1 CD_03_exa01celadm01 738.4375G
RECOC1_CD_04_exa01celadm01 RECOC1 CD_04_exa01celadm01 738.4375G
RECOC1_CD_05_exa01celadm01 RECOC1 CD_05_exa01celadm01 738.4375G

INTERLEAVED GRIDDISKS:
By default, space for grid disks is allocated from the outer tracks to the inner tracks of a physical disk. So
the first grid disk created on each cell disk uses the outermost portion of the disk, where each track
contains more data, resulting in higher transfer rates and better performance.

However, space for grid disks can be allocated in an interleaved manner. Grid disks that use this type of
space allocation are referred to as interleaved grid disks. This method effectively equalizes the
performance of multiple grid disks residing on the same physical disk.

Interleaved grid disks work in conjunction with ASM intelligent data placement (IDP) to ensure that
primary ASM extents are placed in the faster upper portion of each grid disk, while secondary (mirror)
extents are placed on the slower lower portion of each grid disk. IDP is automatically enabled when the
disk group REDUNDANCY setting is compatible with the INTERLEAVING setting for the underlying cell
disk.

To automatically leverage IDP on a NORMAL redundancy disk group, the underlying cell disks must have
the attribute setting INTERLEAVING='normal_redundancy'. In this case, all the primary extents are
placed in the outer half (upper portion) of the disk, and all the mirror extents are placed in the inner half
(lower portion).

To automatically leverage IDP on a HIGH redundancy disk group, the underlying cell disks must have the
attribute setting INTERLEAVING='high_redundancy'. In this case, all the primary extents are placed in the
outer third (upper portion) of the disk, and all the mirror extents are placed in the inner two-thirds (lower
portion).

ASM will not allow incompatibility between the disk group REDUNDANCY setting and the INTERLEAVING
setting for the underlying cell disks. For example, a NORMAL redundancy disk group cannot be created
over cell disks with INTERLEAVING='high_redundancy'. ASM will not permit the creation of such a disk
group, nor will it allow disks to be added to an already existing disk group if that would result in an
incompatibility.

CREATING INTERLEAVING GRIDDISKS:


CellCLI> CREATE CELLDISK ALL HARDDISK INTERLEAVING='normal_redundancy'
CellCLI> CREATE GRIDDISK ALL PREFIX=DATAC1, SIZE=2953G
CellCLI> CREATE GRIDDISK ALL PREFIX=RECOC1, SIZE=738G
CellCLI> CREATE GRIDDISK ALL PREFIX=DBFS

CellCLI> list griddisk attributes asmDiskName,diskType,offset,size

DATAC1_CD_00_EXA01CELADM01 HardDisk 32M 2953G


DATAC1_CD_01_EXA01CELADM01 HardDisk 32M 2953G
DATAC1_CD_02_EXA01CELADM01 HardDisk 32M 2953G
DATAC1_CD_03_EXA01CELADM01 HardDisk 32M 2953G
DATAC1_CD_04_EXA01CELADM01 HardDisk 32M 2953G
DATAC1_CD_05_EXA01CELADM01 HardDisk 32M 2953G
RECOC1_CD_00_EXA01CELADM01 HardDisk 2953.046875G 738.4375G
RECOC1_CD_01_EXA01CELADM01 HardDisk 2953.046875G 738.4375G
RECOC1_CD_02_EXA01CELADM01 HardDisk 2953.046875G 738.4375G
RECOC1_CD_03_EXA01CELADM01 HardDisk 2953.046875G 738.4375G
RECOC1_CD_04_EXA01CELADM01 HardDisk 2953.046875G 738.4375G
RECOC1_CD_05_EXA01CELADM01 HardDisk 2953.046875G 738.4375G
DBFS_DG_CD_02_EXA01CELADM01 HardDisk 3691.484375G 33.796875G
DBFS_DG_CD_03_EXA01CELADM01 HardDisk 3691.484375G 33.796875G
DBFS_DG_CD_04_EXA01CELADM01 HardDisk 3691.484375G 33.796875G
DBFS_DG_CD_05_EXA01CELADM01 HardDisk 3691.484375G 33.796875G

DISK_REPAIR_TIME:
If a grid disk remains offline longer than the time specified by the disk_repair_time attribute, then
Oracle ASM force drops that grid disk and starts a rebalance to restore data redundancy.

The Oracle ASM disk repair timer represents the amount of time a disk can remain offline before it is
dropped by Oracle ASM. While the disk is offline, Oracle ASM tracks the changed extents so the disk can
be resynchronized when it comes back online. The default disk repair time is 3.6 hours. If the default is
inadequate, then the attribute value can be changed to the maximum amount of time it might take to
detect and repair a temporary disk failure. The following command is an example of changing the disk
repair timer value to 8.5 hours for the DATA disk group:

ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8.5h'

The disk_repair_time attribute does not change the repair timer for disks currently offline. The repair
timer for those offline disks is either the default repair timer or the repair timer specified on the
command line when the disks were manually set to offline. To change the repair timer for currently
offline disks, use the OFFLINE command and specify a repair timer value. The following command is an
example of changing the disk repair timer value for disks that are offline:

ALTER DISKGROUP data OFFLINE DISK data_CD_06_cell11 DROP AFTER 20h;

Note: When the disk repair time value is increased, the vulnerability of a double failure is increased.

CELLCLI:

The storage cells in Exadata Database Machine are managed via two tools: the Cell Command Line
Interface (CellCLI) and the Distributed Command Line Interface (DCLI). The cellcli command is invoked from
the Linux command line on the storage cells, not on the compute nodes. CellCLI displays the CellCLI> prompt,
where you enter the commands.

The CellCLI commands have the following general structure:


<Verb> <Object> <Modifier> <Filter>

A verb is the action you want to perform, e.g. display something.

An object is what you want the action performed on, e.g. a disk.
A modifier (optional) specifies how you want the operation to be modified, e.g. all disks or a specific disk.
A filter (optional) is similar to the WHERE predicate of a SQL statement and is specified with a WHERE clause.

There are only a few primary verbs that you will mostly use and need to remember; they are listed below, with a few short examples after the list.

LIST – to show something, e.g. disks, statistics, Resource Manager Plans, etc.
CREATE – to create something, e.g. a cell disk, a threshold
ALTER – to change something that has been established, e.g. change the size of a disk
DROP – to delete something, e.g. dropping a disk
DESCRIBE – to display the various attributes of an object
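
The following is one example of each form (a sketch; the TEST prefix, size, grid disk name, and comment are illustrative only):

CellCLI> DESCRIBE GRIDDISK
CellCLI> LIST GRIDDISK ATTRIBUTES name, size, status WHERE status != 'active'
CellCLI> ALTER GRIDDISK DATAC1_CD_05_exa01celadm01 comment='hot data pool'
CellCLI> CREATE GRIDDISK ALL PREFIX=TEST, size=50M
CellCLI> DROP GRIDDISK ALL PREFIX=TEST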

HOW TO INVOKE CELLCLI:

[root@exa01celadm01 ~]# cellcli -e <command>

OR

[root@exa01celadm01 ~]# cellcli


CellCLI: Release 12.1.1.1.1 - Production on Mon Oct 05 09:43:54 EDT 2015

Copyright (c) 2007, 2013, Oracle. All rights reserved.


Cell Efficiency Ratio: 5,134

CellCLI> help list

Enter HELP LIST <object_type> for specific help syntax.


<object_type>: {ACTIVEREQUEST | ALERTHISTORY | ALERTDEFINITION | CELL
| CELLDISK | DATABASE | FLASHCACHE | FLASHLOG | FLASHCACHECONTENT
| GRIDDISK | IBPORT | IORMPLAN | KEY | LUN
| METRICCURRENT | METRICDEFINITION | METRICHISTORY
| PHYSICALDISK | QUARANTINE | THRESHOLD}

CellCLI> help list CELLDISK

Usage: LIST CELLDISK [<name> | <filters>] [<attribute_list>] [DETAIL]


Purpose: Displays specified attributes for cell disks.
Arguments:
<name>: The name of the cell disk to be displayed.
<filters>: an expression which determines which cell disks should
be displayed.
<attribute_list>: The attributes that are to be displayed.
ATTRIBUTES {ALL | attr1 [, attr2]... }
Options:
[DETAIL]: Formats the display as an attribute on each line, with
an attribute descriptor preceding each value.
Examples:
LIST CELLDISK cd1 DETAIL
LIST CELLDISK where freespace > 100M

CellCLI> list CELLDISK


CD_00_exa01celadm01 normal
CD_01_exa01celadm01 normal
CD_02_exa01celadm01 normal
CD_03_exa01celadm01 normal
CD_04_exa01celadm01 normal
CD_05_exa01celadm01 normal
FD_00_exa01celadm01 normal
FD_01_exa01celadm01 normal
FD_02_exa01celadm01 normal
FD_03_exa01celadm01 normal
FD_04_exa01celadm01 normal
FD_05_exa01celadm01 normal
FD_06_exa01celadm01 normal
FD_07_exa01celadm01 normal

CellCLI> list CELLDISK CD_00_exa01celadm01 detail


name: CD_00_exa01celadm01
comment:
creationTime: 2014-07-08T11:13:35-04:00
deviceName: /dev/sda
devicePartition: /dev/sda3
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: xxxxxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxxx
interleaving: none
lun: 0_0
physicalDisk: E0073X

raidLevel: 0
size: 3691.484375G
status: normal

CellCLI> list CELLDISK attributes name,size,status


CD_00_exa01celadm01 3691.484375G normal
CD_01_exa01celadm01 3691.484375G normal
CD_02_exa01celadm01 3725.28125G normal
CD_03_exa01celadm01 3725.28125G normal
CD_04_exa01celadm01 3725.28125G normal
CD_05_exa01celadm01 3725.28125G normal
FD_00_exa01celadm01 186.25G normal
FD_01_exa01celadm01 186.25G normal
FD_02_exa01celadm01 186.25G normal
FD_03_exa01celadm01 186.25G normal
FD_04_exa01celadm01 186.25G normal
FD_05_exa01celadm01 186.25G normal
FD_06_exa01celadm01 186.25G normal
FD_07_exa01celadm01 186.25G normal

DISKS ADMINISTRATION:

CREATE DROP ALTER LIST


CELL (EXADATA CELL)    
PHYSICAL DISK    
LUN    
CELLDISK    
GRIDDISK    
FLASH CACHE    
FLASH LOG    

DCLI:
The DCLI utility facilitates centralized management across Database Machine by automating the
execution of a command on a set of servers and returning the output to the centralized management
location where dcli was run.

By default, the dcli utility is located at /opt/oracle/cell/cellsrv/bin/dcli on each Exadata server and at
/usr/local/bin/dcli on each database server. You can also copy the dcli utility to a server outside of
Database Machine and centrally execute commands from that server.

The dcli utility allows you to simultaneously execute a command on multiple Database Machine servers.
 Command types:
– Operating system commands
– CellCLI commands
– Operating system scripts
– CellCLI scripts
 Commands are executed in separate parallel threads.
 Interactive sessions are not supported.
 Python 2.3 and SSH user-equivalence are required.
 Command output is collected and displayed in the terminal executing the dcli utility.

DCLI requires prior setup of SSH user-equivalence between all the servers. You can use the dcli utility
initially with the -k option to set up SSH user-equivalence between a group of servers.

The -k option should be used initially to perform key exchange with the cells. The user may be prompted to
acknowledge cell authenticity, and may be prompted for the remote user password. This -k step is
serialized to prevent overlaid prompts. After the -k option has been used once, subsequent commands
to the same cells do not require -k and will not require passwords for that user from the host.
EXAMPLES:

dcli -g all_group -k
dcli -c dm01celadm01,dm01celadm02 -l root date
dcli -g cell_group -l root cellcli -e list cell
dcli -g dbs_group -l root -x create_user.sh

MOST USED OPTIONS:


-k push RSA keys to set up SSH user-equivalence
-l user to run as on the remote servers
-g file containing the list of target servers
-c comma-separated list of target cells
-x script or command file to execute

In general, the group files are created as text files in the following location:


/opt/oracle.SupportTools/onecommand/
To run dcli with the -g option, you must either run it from the above location or specify the full path of the
group file.
[root@exa01dbadm01 ~]# cd /opt/oracle.SupportTools/onecommand/
[root@exa01dbadm01 onecommand]# ls
all_group cell_group dbs_group

[root@exa01dbadm01 onecommand]# cat all_group


exa01dbadm01
exa01dbadm02
exa01celadm01
exa01celadm02
exa01celadm03

NOTE: By default, dcli runs as the celladmin user. If you run dcli on the database nodes without specifying the -l option,
dcli will fail, as shown below.

[root@exa01dbadm01 onecommand]# dcli -g dbs_group date


celladmin@exa01dbadm02's password: celladmin@exa01dbadm01's password:

Keyboard interrupt
killing child pid 28872...
[root@exa01dbadm01 onecommand]#

[root@exa01dbadm01 onecommand]# dcli -g all_group -l root date


exa01dbadm01: Mon Oct 5 12:00:53 EDT 2015
exa01dbadm02: Mon Oct 5 12:00:53 EDT 2015
exa01celadm01: Mon Oct 5 12:00:53 EDT 2015
exa01celadm02: Mon Oct 5 12:00:53 EDT 2015
exa01celadm03: Mon Oct 5 12:00:53 EDT 2015

The example above shows dcli being used to execute the operating system date command. The -g option
specifies a file (all_group) that contains the list of target servers to which the command (date) is sent.
The servers can be identified by host names or IP addresses, and they can be database servers or
Exadata Storage Servers.

The example below uses the -c option to specify the target servers (exa01celadm01, exa01celadm02) on the
command line. It invokes CellCLI to report the cell status.

[root@exa01dbadm01 onecommand]# dcli -c exa01celadm01,exa01celadm02 -l root cellcli -e list cell


exa01celadm01: exa01celadm01 online
exa01celadm02: exa01celadm02 online

The next example uses the -x option to specify a command file. The command file must exist on the server
executing the dcli utility. The command file is copied to the target servers and executed. A file with the
.scl extension is run by the CellCLI utility on the target server; a file with a different extension is run by
the operating system shell on the target server. The file is copied to the default home directory of the
user on the target server. Files specified using the -x option must have execute privileges, or dcli will
report an error.

[root@exa01dbadm01 onecommand]# dcli -g dbs_group -l root -x create_user.sh

MAINTAINING DISKS & STATUS:

The first two disks of an Exadata Storage Server are system disks. The Oracle Exadata Storage Server
system software resides on a portion of each of the system disks. These portions on both system disks
are referred to as the system area. The non-system area of the system disks, referred to as data
partitions, is used for normal data storage. All other disks in the cell are called data disks.

You can monitor a physical disk by checking its attributes with the CellCLI LIST PHYSICALDISK command.
For example, a physical disk with a status of failed (in earlier releases the status was critical) or
warning - predictive failure is probably having problems and needs to be replaced.
The disk firmware maintains the error counters and marks a drive with predictive failure when internal
thresholds are exceeded. The drive, not the cell software, determines whether it needs replacement.
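
A quick health check across all disks of a cell might therefore look like this (a sketch; the filter simply excludes disks that are in the normal state):

CellCLI> LIST PHYSICALDISK ATTRIBUTES name, diskType, status WHERE status != 'normal'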

When disk I/O errors occur, Oracle ASM performs bad extent repair for read errors due to media errors.
The disks will stay online, and no alerts are sent. When Oracle ASM gets a read error on a physically-
addressed metadata block, it does not have mirroring for the blocks, and takes the disk offline. Oracle
ASM force drops the disk.

HANDLING DISK REPLACEMENTS

HOW ORACLE ASM HANDLES GRID DISKS WHEN PHYSICAL DISK HAS A PROBLEM.

HOW ORACLE EXADATA STORAGE SERVER SOFTWARE HANDLES PHYSICAL DISKS BASED ON
THE STATUS OF THE PHYSICAL DISK.

Once a physical disk is replaced, Oracle Exadata Storage Server Software automatically creates the grid
disks on the replacement disk, and adds them to the respective Oracle ASM disk groups. An Oracle ASM
rebalance operation relocates data to the newly-added grid disks. If an error occurs during the rebalance
operation, then an alert is sent.

PHYSICAL DISK STATUS FOR RELEASE 11.2.3.3 AND LATER:

NORMAL
NORMAL - DROPPED FOR REPLACEMENT
NORMAL - CONFINED ONLINE
NORMAL - CONFINED ONLINE - DROPPED FOR REPLACEMENT
NOT PRESENT
FAILED
FAILED - DROPPED FOR REPLACEMENT
FAILED - REJECTED DUE TO INCORRECT DISK MODEL

FAILED - REJECTED DUE TO INCORRECT DISK MODEL - DROPPED FOR REPLACEMENT
FAILED - REJECTED DUE TO WRONG SLOT
FAILED - REJECTED DUE TO WRONG SLOT - DROPPED FOR REPLACEMENT
WARNING - CONFINED ONLINE
WARNING - CONFINED ONLINE - DROPPED FOR REPLACEMENT
WARNING - PEER FAILURE
WARNING - POOR PERFORMANCE
WARNING - POOR PERFORMANCE - DROPPED FOR REPLACEMENT
WARNING - POOR PERFORMANCE, WRITE-THROUGH CACHING
WARNING - PREDICTIVE FAILURE, POOR PERFORMANCE
WARNING - PREDICTIVE FAILURE, POOR PERFORMANCE - DROPPED FOR REPLACEMENT
WARNING - PREDICTIVE FAILURE, WRITE-THROUGH CACHING
WARNING - PREDICTIVE FAILURE
WARNING - PREDICTIVE FAILURE - DROPPED FOR REPLACEMENT
WARNING - PREDICTIVE FAILURE, POOR PERFORMANCE, WRITE-THROUGH CACHING
WARNING - WRITE-THROUGH CACHING

HOW TO REBOOT STORAGE SERVER

1 By default, ASM drops a disk shortly after it is taken offline; however, you can set the
DISK_REPAIR_TIME attribute to prevent this operation by specifying a time interval to repair the disk
and bring it back online. The default DISK_REPAIR_TIME attribute value of 3.6h should be adequate
for most environments.

a. To check repair times for all mounted disk groups - log into the ASM instance and perform the
following query:
SQL> select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where
dg.group_number = a.group_number and a.name='disk_repair_time';

b. If you need to offline the ASM disks for more than the default time of 3.6 hours then adjust the
parameter by issuing the command below as an example:
SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'DISK_REPAIR_TIME'='8.5H';

2 Check if ASM will be OK if the grid disks go OFFLINE. The following command should return 'Yes' for
the grid disks being listed:
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

3 If one or more disks do not return asmdeactivationoutcome='Yes', you should check the
respective disk group and restore the data redundancy for that disk group.

NOTE: Shutting down the cell services when one or more grid disks do not return
asmdeactivationoutcome='Yes' will cause Oracle ASM to dismount the affected disk group, causing
the databases to shut down abruptly.
4 Run cellcli command to Inactivate all grid disks on the cell you wish to power down/reboot:

cellcli -e alter griddisk all inactive

Please note - This action could take 10 minutes or longer depending on activity. It is very important
to make sure you were able to offline all the disks successfully before shutting down the cell
services. Inactivating the grid disks will automatically OFFLINE the disks in the ASM instance.

5 Confirm that the griddisks are now offline by performing the following actions:

Execute the command below and the output should show either asmmodestatus=OFFLINE or
asmmodestatus=UNUSED and asmdeactivationoutcome=Yes for all griddisks once the disks are
offline in ASM. Only then is it safe to proceed with shutting down or restarting the cell:
cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

List the griddisks to confirm all now show inactive:


cellcli -e list griddisk

6 You can now reboot the cell. Oracle Exadata Storage Servers are powered off and rebooted using
the Linux shutdown command.
a. The following command will shut down Oracle Exadata Storage Server immediately: (as root):
#shutdown -h now
(When powering off Exadata Storage Servers, all storage services are automatically stopped.)
b. The following command will reboot Oracle Exadata Storage Server immediately and force fsck
on reboot:
#shutdown -F -r now

7 Once the cell comes back online, you will need to reactivate the grid disks:
cellcli -e alter griddisk all active

8 Issue the command below and all disks should show 'active':
cellcli -e list griddisk

9 Verify grid disk status:


a. Verify all grid disks have been successfully put online using the following command:
cellcli -e list griddisk attributes name, asmmodestatus

b. Wait until asmmodestatus is ONLINE for all grid disks. Each disk will go to a 'SYNCING' state first
then 'ONLINE'. The following is an example of the output:

DATA_CD_00_exa01cel01 ONLINE
DATA_CD_01_exa01cel01 SYNCING
DATA_CD_02_exa01cel01 OFFLINE
DATA_CD_03_exa01cel01 OFFLINE
DATA_CD_04_exa01cel01 OFFLINE
DATA_CD_05_exa01cel01 OFFLINE
DATA_CD_06_exa01cel01 OFFLINE
DATA_CD_07_exa01cel01 OFFLINE

DATA_CD_08_exa01cel01 OFFLINE
DATA_CD_09_exa01cel01 OFFLINE
DATA_CD_10_exa01cel01 OFFLINE
DATA_CD_11_exa01cel01 OFFLINE

c. Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.

NOTE: This operation uses the Fast Mirror Resync operation, which does not trigger an ASM rebalance.
The resync operation restores only the extents that would have been written while the disk was offline.

10 Before taking another storage server offline, Oracle ASM synchronization must complete on the
restarted Oracle Exadata Storage Server. If synchronization is not complete, then the check
performed on another storage server will fail. The following is an example of the output:

CellCLI> list griddisk attributes name where asmdeactivationoutcome != 'Yes'


DATA_CD_00_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_01_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_02_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_03_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_04_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_05_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_06_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_07_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_08_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_09_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_10_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"
DATA_CD_11_exa01cel02 "Cannot de-activate due to other offline disks in the diskgroup"

HOW TO REPLACE A PHYSICAL DISK:

Determining when Disks should be replaced on Oracle Exadata Database Machine (Doc ID 1452325.1)
How to Replace a Hard Drive in an Exadata Storage Server (Hard Failure) (Doc ID 1386147.1)
How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure) (Doc ID 1390836.1)
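
After the drive has been physically replaced, a short verification such as the following (a hedged sketch; slot 20:5 and the cell disk name are placeholders) confirms that the cell recognized the new disk and that the recreated grid disks are coming back online in ASM:

CellCLI> LIST PHYSICALDISK 20:5 DETAIL
CellCLI> LIST GRIDDISK ATTRIBUTES name, status, asmmodestatus WHERE cellDisk = 'CD_05_exa01celadm01'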

(Storage Server to be continued…….)

FLASH DISKS, FLASH CACHE & FLASH LOG

Each storage server (X4-2) has 4 PCI flash cards, and each flash card has 4 flash disks, for a total of about
3 TB of flash memory per cell. Each flash disk is 186 GB.

CellCLI> list physicaldisk attributes name, diskType, physicalSize where diskType=FlashDisk

FLASH_1_0 FlashDisk 186.26451539993286G


FLASH_1_1 FlashDisk 186.26451539993286G
FLASH_1_2 FlashDisk 186.26451539993286G
FLASH_1_3 FlashDisk 186.26451539993286G
FLASH_2_0 FlashDisk 186.26451539993286G
FLASH_2_1 FlashDisk 186.26451539993286G
FLASH_2_2 FlashDisk 186.26451539993286G
FLASH_2_3 FlashDisk 186.26451539993286G
FLASH_4_0 FlashDisk 186.26451539993286G
FLASH_4_1 FlashDisk 186.26451539993286G
FLASH_4_2 FlashDisk 186.26451539993286G
FLASH_4_3 FlashDisk 186.26451539993286G
FLASH_5_0 FlashDisk 186.26451539993286G
FLASH_5_1 FlashDisk 186.26451539993286G
FLASH_5_2 FlashDisk 186.26451539993286G
FLASH_5_3 FlashDisk 186.26451539993286G

The Exadata Smart Flash Cache automatically caches frequently accessed data in PCI flash while keeping
infrequently accessed data on disk drives. This provides the performance of flash with the capacity and
low cost of disk. The Exadata Smart Flash Cache understands database workloads and knows when to
avoid caching data that the database will rarely access or is too big to fit in the cache.

Exadata Storage Server provides Exadata Smart Flash Cache, a caching mechanism for frequently
accessed data which is useful for absorbing repeated random reads, and very beneficial to OLTP. Using
Exadata Smart Flash Cache, a single Exadata cell can support up to 100,000 IOPS.

CellCLI> list flashcache detail

name: exa01celadm01_FLASHCACHE
cellDisk: FD_01_exa01celadm01, FD_06_exa01celadm01, FD_02_exa01celadm01,
FD_00_exa01celadm01, FD_04_exa01celadm01, FD_07_exa01celadm01,
FD_05_exa01celadm01, FD_03_exa01celadm01
creationTime: 2015-03-26T21:49:42-04:00
degradedCelldisks:
effectiveCacheSize: 1489.125G
id: xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxx
size: 1489.125G
status: normal

EXADATA SMART FLASH CACHE INTELLIGENT CACHING:
Exadata Smart Flash Cache understands different types of database I/O and is smart enough to decide what to
cache and what not to cache.

WHAT IS CACHED?
 Frequently accessed data and index blocks are cached.
 Control file reads and writes are cached.
 File header reads and writes are cached.
 DBA can influence caching priorities.

WHAT IS NOT CACHED?
 I/Os to mirror copies are not cached.
 Backup-related I/O is not cached.
 Data Pump I/O is not cached.
 Data file formatting is not cached.
 Full Table scans are not cached.

EXADATA SMART FLASH CACHE MECHANISM:


Exadata Smart Flash Cache provides a caching mechanism for frequently accessed data on each Exadata
cell. Exadata Smart Flash Cache automatically works in conjunction with Oracle Database to intelligently
optimize the efficiency of the cache.

FIRST READ OPERATION:
1. CELLSRV gets Read request from Db Server.
2. CELLSRV checks the in-memory hash table to determine whether the data blocks reside in
Exadata Smart Flash Cache. Because this is the first request for these specific blocks, CELLSRV
reads them from disk.
3. CELLSRV sends the blocks to the Db Server.
4. Data blocks will be placed in Exadata Smart Flash Cache, based on Caching eligibility.

SUBSEQUENT READ OPERATION:


1. CELLSRV gets Read request from Db Server.
2. CELLSRV checks the in-memory hash table to determine whether the data blocks reside in
Exadata Smart Flash Cache. Because the blocks were cached by an earlier request, they are
retrieved from Exadata Smart Flash Cache.
3. CELLSRV sends the blocks to the Db Server.

WRITE OPERATION (WRITE THROUGH FLASH CACHE):


1. CELLSRV gets the Write request from Db server.
2. CELLSRV writes data to disk.
3. CELLSRV sends acknowledgement to the database so it can continue without any interruption.
4. If the data is eligible for caching, it is written to Exadata Smart Flash Cache.

WRITE BACK FLASH CACHE:


Write-Back Flash Cache provides the ability to cache write I/Os directly to PCI flash in addition to read
I/Os. If an application is write intensive, and AWR reports show significant "free buffer waits" or high
I/O times pointing to a write bottleneck, then Write-Back Flash Cache is the most suitable option.

NOTE: Exadata storage software version 11.2.3.2.1 is the minimum version required to allow writes to
go into Exadata Smart Flash Cache.

WRITE-BACK FLASH CACHE BENEFITS:

 It improves write-intensive operations, because writing to flash cache is much faster than
writing to hard disks.
 Write IOPS can be up to 20x higher than with hard disks.
 Write-Back Flash Cache transparently accelerates reads and writes for all workloads, for OLTP
(faster random reads and writes) and DW (faster sequential smart scans).
 Write-Back Flash Cache reduces the latency of redo log writes when redo shares disks with data.
 Data is recoverable from the flash cache on cellsrv restart.
 The cell attribute "flashCacheMode" determines this mode; the possible values are "WriteThrough"
and "WriteBack".

CELLCLI> list cell attributes flashCacheMode

WRITE OPERATION (WRITE BACK FLASH CACHE):


1. CELLSRV gets the Write request from Db server.
2. CELLSRV writes data to Exadata Smart Flash Cache.
3. CELLSRV sends acknowledgement to the database so it can continue without any interruption.
4. The blocks are written to disk later, when they are aged out of the cache.

HOW TO ENABLE WRITE-BACK FLASH CACHE:


1. Rolling Method
2. Non-Rolling Method

Note: Before enabling Write back flash cache, perform the following check as root from one of the
compute nodes:

Check all griddisk “asmdeactivationoutcome” and “asmmodestatus” to ensure that all griddisks on all
cells are “Yes” and “ONLINE” respectively.

# dcli -g cell_group -l root cellcli -e list griddisk attributes asmdeactivationoutcome, asmmodestatus

Check that all of the flashcache are in the “normal” state and that no flash disks are in a degraded or
critical state:

# dcli -g cell_group -l root cellcli -e list flashcache detail

ROLLING METHOD:
(This assumes that the RDBMS and ASM instances are up, and that Write-Back Flash Cache is enabled on one
cell server at a time.)

Login to Cell Server:

Step 1. Drop the flash cache on that cell

# cellcli -e drop flashcache

Step 2. Check the status of ASM if the grid disks go OFFLINE. The following command should return 'Yes'
for the grid disks being listed:

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

Step 3. Inactivate the griddisk on the cell

# cellcli -e alter griddisk all inactive

Step 4. Shut down cellsrv service

# cellcli -e alter cell shutdown services cellsrv

Step 5. Set the cell flashcache mode to writeback

# cellcli -e "alter cell flashCacheMode=writeback"

Step 6. Restart the cellsrv service

# cellcli -e alter cell startup services cellsrv

Step 7. Reactivate the griddisks on the cell

# cellcli -e alter griddisk all active

Step 8. Verify all grid disks have been successfully put online using the following command:

# cellcli -e list griddisk attributes name, asmmodestatus

Step 9. Recreate the flash cache

# cellcli -e create flashcache all

Step 10. Check the status of the cell to confirm that it's now in WriteBack mode:

# cellcli -e list cell detail | grep flashCacheMode

Step 11. Repeat the same steps on each remaining cell through the final cell. However, before taking
another storage server offline, execute the following and make sure 'asmdeactivationoutcome' displays
YES:

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

NON-ROLLING METHOD:
(This assumes that the RDBMS and ASM instances are down while Write-Back Flash Cache is enabled.)

Step 1. Drop the flash cache on that cell

# dcli -g cell_group -l root cellcli -e drop flashcache

Step 2. Shut down cellsrv service

# dcli -g cell_group -l root cellcli -e alter cell shutdown services cellsrv

Step 3. Set the cell flashCacheMode to WriteBack

# dcli -g cell_group -l root cellcli -e "alter cell flashCacheMode=writeback"

Step 4. Restart the cellsrv service

# dcli -g cell_group -l root cellcli -e alter cell startup services cellsrv

Step 5. Recreate the flash cache

# dcli -g cell_group -l root cellcli -e create flashcache all
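
Whichever method is used, the new mode can be confirmed on all cells at once, reusing the same cell_group file as in the examples above:

# dcli -g cell_group -l root cellcli -e "list cell attributes flashCacheMode"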

CACHING PRIORITIZATION:
DEFAULT: Specifies that Exadata Smart Flash Cache is used normally
KEEP: Specifies that Exadata Smart Flash Cache is used more aggressively.
NONE: Specifies that Exadata Smart Flash Cache is not used.

DEFAULT: When suitable data is inserted into a Flash Cache, a prioritized least recently used (LRU)
algorithm determines which data to replace.

KEEP: KEEP objects have priority over DEFAULT objects so that new data from a DEFAULT object will
not push out cached data from any KEEP objects. To prevent KEEP objects from monopolizing the cache,
they are allowed to occupy no more than 80% of the total cache size.
To prevent unused KEEP objects from indefinitely occupying the cache, they are subject to an additional
aging policy, which periodically purges unused KEEP object data.

KEEP blocks are automatically 'un-pinned' if


 Object is dropped, shrunk, or truncated
 Object is not accessed on the cell within 48 hours
 Block is not accessed on the cell within 24 hours
 Blocks are then downgraded to 'DEFAULT'
 Changing priority from KEEP to NONE marks blocks already in the cache as DEFAULT

The CELL_FLASH_CACHE storage attribute is used to determine the caching priority, that is, what needs to be
cached and what does not. CELL_FLASH_CACHE can be set at:
 TABLE level
 INDEX level
 PARTITION level
 LOB column

CREATE TABLE EMP (empno number) STORAGE (CELL_FLASH_CACHE KEEP);

ALTER TABLE DEPT STORAGE (CELL_FLASH_CACHE NONE);

ALTER INDEX EMP_IDX STORAGE (CELL_FLASH_CACHE NONE);

CREATE TABLE EMP (empno NUMBER PRIMARY KEY, ename VARCHAR2(30), deptno NUMBER)
PARTITION BY LIST (deptno) (PARTITION p10 VALUES (10) STORAGE (CELL_FLASH_CACHE DEFAULT),
PARTITION p20 VALUES (20) STORAGE (CELL_FLASH_CACHE KEEP),
PARTITION p30 VALUES (30,40) STORAGE (CELL_FLASH_CACHE NONE));
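
Whether a KEEP object is actually being cached can be checked on the cells through the FLASHCACHECONTENT object (a sketch; the objectNumber value is the DATA_OBJECT_ID of the segment and is illustrative here):

CellCLI> LIST FLASHCACHECONTENT ATTRIBUTES objectNumber, cachedKeepSize, cachedSize, hitCount, missCount WHERE objectNumber = 57435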

HOW TO RECREATE FLASH CACHE:


# cellcli -e drop flashcache
# cellcli -e drop flashlog
# cellcli -e drop celldisk all flashdisk
# cellcli -e create celldisk all flashdisk
# cellcli -e create flashlog all
# cellcli -e create flashcache all

MONITORING FLASH CACHE USAGE:

METRICS:

FC_IO_ERRS "Number of IO errors on FlashCache"


FC_BY_DIRTY "Number of unflushed megabytes in FlashCache"
FC_BY_USED "Number of megabytes used on FlashCache"
FC_IO_BY_R "Number of megabytes read from FlashCache"
FC_IO_BY_W "Number of megabytes written to FlashCache"
FC_IO_BY_R_MISS "Number of megabytes read from disks because not all requested data was
in FlashCache"

[root@exa01dbadm01 onecommand]# dcli -g cell_group -l root cellcli -e "list metriccurrent fc_by_used"


exa01celadm01: FC_BY_USED FLASHCACHE 4,041 MB
exa01celadm02: FC_BY_USED FLASHCACHE 3,994 MB
exa01celadm03: FC_BY_USED FLASHCACHE 4,030 MB

[root@exa01dbadm01 onecommand]# dcli -g cell_group -l root cellcli -e "list flashcache detail" | grep
size
exa01celadm01: size: 1489.125G
exa01celadm02: size: 1489.125G
exa01celadm03: size: 1489.125G

EXADATA SMART FLASH CACHE COMPRESSION:


Hardware Requirements: Exadata Storage Server X3-2, X4-2
Software Requirements: Oracle Exadata Storage Server Software release 11.2.3.3 or higher

Exadata automatically compresses all data in Smart Flash Cache:

 Compression engine built into flash card


 Zero performance overhead on reads and writes
 Logical size of flash cache increases up to 2x
 User gets large amount of data in flash for same media size
 For X4-2, enabled via "cellcli -e ALTER CELL flashCacheCompress=TRUE"
 For X3-2, additionally requires "cellcli -e ALTER CELL flashCacheCompX3Support=true"

With flash cache compression, each flash device exposes a logical address space that is two times the
size of the flash memory. User data is mapped to the logical address space and a compressed
representation is stored in flash memory.

As is the case with all forms of compression, different data compresses at different rates with flash cache
compression. Flash cache compression can be used effectively in conjunction with data that is
already compressed using OLTP compression inside the database. However, note that flash cache
compression achieves little incremental compression for tables using Exadata Hybrid Columnar
Compression (HCC).

Because data compresses at different rates, the flash memory could fill up before the logical address
space is exhausted. For example, this would likely occur if most of the data in the cache was already
compressed using HCC. To mitigate this, flash cache compression periodically monitors the flash
memory free space (every 3 seconds) and automatically trims the cache as free space becomes scarce.
Trimming removes the least recently used data from the cache in order to free up space.

Flash cell disks must be dropped to change the flashCacheCompress cell attribute

CellCLI> alter flashcache all flush


CellCLI> drop flashcache all
CellCLI> drop flashlog all
CellCLI> drop celldisk all flashdisk
CellCLI> alter cell flashCacheCompress = TRUE
CellCLI> create celldisk all flashdisk
CellCLI> create flashlog all
CellCLI> create flashcache all

Monitoring Flash Cache Compression:
Verify that flash cache and flash disks are approximately double in size after flash cache compression is
enabled:
CellCLI> LIST CELL ATTRIBUTES flashCacheCompress
TRUE
CellCLI> LIST FLASHCACHE ATTRIBUTES name, size
qr01cel01_FLASHCACHE 5959G
CellCLI> LIST PHYSICALDISK ATTRIBUTES name, physicalSize WHERE diskType=flashdisk
FLASH_1_0 372.52903032302856G
FLASH_1_1 372.52903032302856G
FLASH_1_2 372.52903032302856G

Monitored through Metric: FC_BY_DIRTY.


CellCLI> LIST METRICCURRENT FC_BY_DIRTY

SMART FLASH LOG:


Each Exadata Storage Server is created with 512 MB of Smart Flash Log carved from the flash disks. This is
used for redo logging. Exadata Smart Flash Log provides a high-performance, low-latency, reliable
temporary store for redo log writes:

 Redo writes are directed to disk and Exadata Smart Flash Log.
 Complete the write operation when the first of the two completes.
 Conceptually similar to multiplexed redo logs.
 Most beneficial during busy periods when the disk controller cache occasionally becomes filled
with blocks that have not been written to disk.

 Smart Flash Logging improves latency of log write operations, but it does not improve total disk
throughput.
 Exadata Storage Server automatically manages Smart Flash Log and ensures all log entries are
persisted to disk.
 Improves user transaction response time, and increases overall database throughput for I/O
intensive workloads.

CellCLI> list flashlog detail

name: exa01celadm01_FLASHLOG
cellDisk: FD_02_exa01celadm01, FD_00_exa01celadm01, FD_07_exa01celadm01,
FD_06_exa01celadm01, FD_05_exa01celadm01, FD_04_exa01celadm01,
FD_03_exa01celadm01, FD_01_exa01celadm01
creationTime: 2014-07-08T11:15:46-04:00
degradedCelldisks:
effectiveSize: 512M
efficiency: 100.0
id: xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx
size: 512M
status: normal

FLASH LOG EFFICIENCY:


SQL> select * from v$event_histogram where event='log file parallel write';
EVENT# EVENT WAIT_TIME_MILLI WAIT_COUNT LAST_UPDATE_TIME
---------- -------------------------------------------------- --------------- ---------- ---------------------------------------------------
135 log file parallel write 1 2088849 06-AUG-15 04.10.48.774393 PM -04:00
135 log file parallel write 2 33351 06-AUG-15 04.00.29.442670 PM -04:00
135 log file parallel write 4 4636 06-AUG-15 01.00.03.712550 PM -04:00
135 log file parallel write 8 1761 06-AUG-15 03.56.09.513719 PM -04:00
135 log file parallel write 16 894 03-AUG-15 11.44.13.171241 PM -04:00
135 log file parallel write 32 31 05-AUG-15 09.54.31.656448 PM -04:00
135 log file parallel write 64 2 30-JUL-15 12.24.48.991839 PM -04:00
135 log file parallel write 128 2 29-JUN-15 01.00.25.084465 PM -04:00

CellCLI> list flashlog detail


name: exa01celadm01_FLASHLOG
cellDisk: FD_02_exa01celadm01, FD_00_exa01celadm01, FD_07_exa01celadm01,
FD_06_exa01celadm01, FD_05_exa01celadm01, FD_04_exa01celadm01,
FD_03_exa01celadm01, FD_01_exa01celadm01
creationTime: 2014-07-08T11:15:46-04:00
degradedCelldisks:
effectiveSize: 512M
efficiency: 100.0
size: 512M
status: normal

CellCLI> list metriccurrent where name like 'FL_.*'
FL_ACTUAL_OUTLIERS FLASHLOG 0 IO requests
FL_BY_KEEP FLASHLOG 0
FL_DISK_FIRST FLASHLOG 49,684,757 IO requests
FL_DISK_IO_ERRS FLASHLOG 0 IO requests
FL_EFFICIENCY_PERCENTAGE FLASHLOG 100 %
FL_EFFICIENCY_PERCENTAGE_HOUR FLASHLOG 100 %
FL_FLASH_FIRST FLASHLOG 39,689,458 IO requests
FL_FLASH_IO_ERRS FLASHLOG 0 IO requests
FL_FLASH_ONLY_OUTLIERS FLASHLOG 0 IO requests
FL_IO_DB_BY_W FLASHLOG 875,222 MB
FL_IO_DB_BY_W_SEC FLASHLOG 0.025 MB/sec
FL_IO_FL_BY_W FLASHLOG 1,123,418 MB
FL_IO_FL_BY_W_SEC FLASHLOG 0.058 MB/sec
FL_IO_W FLASHLOG 89,374,215 IO requests
FL_IO_W_SKIP_BUSY FLASHLOG 0 IO requests
FL_IO_W_SKIP_BUSY_MIN FLASHLOG 0.0 IO/sec
FL_IO_W_SKIP_LARGE FLASHLOG 0 IO requests
FL_IO_W_SKIP_NO_BUFFER FLASHLOG 0 IO requests
FL_PREVENTED_OUTLIERS FLASHLOG 1,736 IO requests

An outlier is an online redo log write that takes longer than expected; FL_PREVENTED_OUTLIERS shows how many such writes Smart Flash Log was able to prevent.

FLASH BASED GRID DISKS:


Oracle Exadata Storage Servers are equipped with flash disks. These flash disks can be used to create
flash grid disks to store frequently accessed data. Alternatively, all or part of the flash disk space can be
dedicated to Exadata Smart Flash Cache. In this case, the most frequently-accessed data is cached in
Exadata Smart Flash Cache.

By default, the CREATE CELL command creates flash cell disks on all flash disks, and then creates Exadata
Smart Flash Cache on the flash cell disks. To change the size of the Exadata Smart Flash Cache or create
flash grid disks it is necessary to remove the flash cache, and then create the flash cache with a different
size, or create the flash grid disks.

The amount of flash cache allocated can be set using the flash cache attribute with the CREATE CELL
command. If the flash cache attribute is not specified, then all available flash space is allocated for flash
cache.

Flash cache can also be created explicitly using the CREATE FLASHCACHE command. This command uses
the cell disk attribute to specify which flash cell disks contain cache. Alternatively, ALL can be specified
to indicate that all flash cell disks are used. The size attribute is used to specify the total size of the flash
cache to be allocated, and the allocation is evenly distributed across all flash cell disks.
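
For example, a smaller cache could be created explicitly, either across all flash cell disks or only on specific ones, leaving the remaining flash space free for flash grid disks (a sketch; the sizes and cell disk names are illustrative):

CellCLI> CREATE FLASHCACHE ALL SIZE=300G
CellCLI> CREATE FLASHCACHE CELLDISK='FD_00_exa01celadm01,FD_01_exa01celadm01', SIZE=100G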

HOW TO CREATE FLASH BASED GRID DISKS:
Step 1: Login to Exadata compute node 1 as root user and execute the dcli command to get the current
flash cache details.

dcli -g cell_group -l root cellcli -e list flashcache attributes name,size,celldisk

Step 2: Flush the dirty blocks from Flash Cache to Grid disks.

[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e alter flashcache all flush

Step 3: Drop the Flash cache and Flash Cache log

[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e drop flashcache all


[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e drop flashlog all
Step 4: Create Flash Cache log, Flash Cache.

[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e create flashlog all


[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e create flashcache all size=1024G

Step 5: Create flash-based grid disks on the remaining free space available.

[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e create griddisk all flashdisk prefix='FLASHC'

Step 6: Verify the Flash based grid disks status

[root@exa01dbadm01 ~]# dcli -g cell_group -l root cellcli -e list griddisk where disktype=flashdisk

Each grid disk will have a size of around 186GB. Login to the storage cell as root user and determine the
size of a grid disk using the following command.

[root@exa01celadm01 ~]# cellcli -e list griddisk where name=FLASH_FD_00_exa01celadm01 detail

Step 7: Now let's create an ASM disk group using the flash-based grid disks just created above.
Connect to the ASM instance with the SYSASM privilege and execute the following command.

SQL> create diskgroup FLASHC1 normal redundancy disk 'o/*/FLASH*'
attribute 'compatible.rdbms' = '11.2.0.4.0', 'compatible.asm' = '11.2.0.4.0',
'cell.smart_scan_capable' = 'TRUE', 'au_size' = '4M';

Diskgroup created

Step 8: Validate the ASM disk group creation Using SQL*PLUS by connecting to ASM instance

SQL> select name,type,total_mb,free_mb from v$asm_diskgroup where name ='FLASHC1';

NAME TYPE TOTAL_MB FREE_MB


------------------------------ ------ ---------- ----------
FLASHC1 NORMAL 9153024 9152424

ASMCMD [+] > lsdg

State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name

MOUNTED NORMAL N 512 4096 4194304 31924224 29869768 10641408 9614180 0 N DATAC1/
MOUNTED NORMAL N 512 4096 4194304 1038240 1036888 346080 345404 0 Y DBFS_DG/
MOUNTED NORMAL N 512 4096 4194304 9153024 9152424 3051008 3050708 0 N FLASHC1/
MOUNTED NORMAL N 512 4096 4194304 7992000 7812252 2664000 2574126 0 N RECOC1/

CREATING DISKGROUPS:
There are a number of ASM disk group attributes that you can set when creating your disk groups, but
the following are the most important:
 au_size: Set this to 4 MB.
 compatible.asm: Set this to the software version of your Grid Infrastructure home.
 compatible.rdbms: Set this to the software version of your RDBMS home.
 cell.smart_scan_capable: Set this to TRUE. If this attribute is set to FALSE, Smart Scan will be
disabled for segments that reside in the disk group.
 disk_repair_time: Leave this at the default of 3.6 hours unless you are performing maintenance on a
cell and know that your outage window will be greater than 3.6 hours.

EXAMPLE:

SQL> create diskgroup DATA_DG normal redundancy disk 'o/*/SDATA*' attribute 'compatible.rdbms' =
'11.2.0.4.0', 'compatible.asm' = '11.2.0.4.0', 'cell.smart_scan_capable' = 'TRUE', 'au_size' = '4M';

MONITORING EXADATA STORAGE SERVERS:


Exadata Storage Servers can be monitored using the following:
Metrics: provide a measure relating to some aspect of storage server status or performance.
Thresholds: are metric levels which, if crossed, automatically generate an alert notification.
Alerts: are automatically generated notifications of system events.

EXADATA METRICS AND ALERTS ARCHITECTURE:

METRICS:

CELLSRV periodically records important run-time properties, called metrics, for cell components such as
CPUs, cell disks, grid disks, flash cache, and IORM statistics. These metrics are recorded in memory.
Based on its own metric collection schedule, the Management Server (MS) gets the set of metric data
accumulated by CELLSRV. MS keeps a subset of the metric values in memory, and writes a history to an
internal disk-based repository every hour. This process is conceptually similar to database AWR
snapshots.

The retention period for metric and alert history entries is specified by the metricHistoryDays cell
attribute. You can modify this setting with the CellCLI ALTER CELL command. By default, it is seven days.
You can view the metric value history by using the CellCLI LIST METRICHISTORY command, and you can
view the current metric values by using the LIST METRICCURRENT command.
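
For example, the retention could be raised and metric values listed as follows (a sketch; the 14-day value, the CL_TEMP and CD_IO_RQ_R_SM metric names, and the timestamp are illustrative):

CellCLI> ALTER CELL metricHistoryDays='14'
CellCLI> LIST METRICCURRENT CL_TEMP DETAIL
CellCLI> LIST METRICHISTORY WHERE name = 'CD_IO_RQ_R_SM' AND collectionTime > '2015-10-01T00:00:00-04:00'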

OBJECT TYPE:

CL_ – cell
CG_ – consumer group
CD_ – cell disk
CT_ – category
GD_ – grid disk
N_ – interconnect network
FC_ – flash cache
PDB_ – pluggable database
DB_ – database
SIO_ – Smart IO

OPERATION:
IO_RQ – number of requests
IO_BY – number of MB
IO_TM – I/O latency
IO_WT – I/O wait time
IO_EL – I/O eligible
_R – read
_W – write
_SM – small
_LG – large
_ERRS_MIN – error count
_SEC – per second
_RQ – per request
_MEMUT – memory utilization
_FSUT – file system utilization

ALERTS:
In addition to collecting metrics, Exadata Storage Server can generate alerts. Alerts represent events of
importance occurring within the storage cell, typically indicating that storage cell functionality is either
compromised or in danger of failure. An administrator should investigate these alerts, because they
might require corrective or preventive action. MS generates an alert when it discovers a:

 Cell hardware issue


 Cell software or configuration issue
 CELLSRV internal error
 Metric that has exceeded a threshold defined in the cell

You can view previously generated alerts using the LIST ALERTHISTORY command. In addition, you can
configure the cell to automatically send an email and/or SNMP message to a designated set of
administrators.
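
Email and SNMP notification is configured per cell with ALTER CELL (a sketch; the mail server and addresses are placeholders, and ALTER CELL VALIDATE MAIL sends a test message):

CellCLI> ALTER CELL smtpServer='mail.example.com', smtpFromAddr='exa01celadm01@example.com', smtpToAddr='dba-team@example.com', notificationPolicy='critical,warning,clear', notificationMethod='mail,snmp'
CellCLI> ALTER CELL VALIDATE MAIL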

There are three types of alerts: informational, warning, and critical. Alerts are typically propagated to a
monitoring infrastructure, such as Oracle Enterprise Manager, for notification to storage administrators.
Examples of possible alerts are physical disk failure, disk read/write errors, cell
temperature exceeding the recommended value, Oracle Exadata Storage Server Software failure, and
excessive I/O latency. Metrics can be used to signal alerts using warning or critical threshold values;
when a metric value exceeds its threshold, an alert can be signaled.
Alerts are either stateful or stateless.

Stateful alerts represent observable cell states that can be subsequently retested to detect whether the
state has changed, so that a previously observed alert condition is no longer a problem.
Stateless alerts represent point-in-time events that do not represent a persistent condition; they simply
show that something has occurred.
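
A metric-based alert is defined by creating a threshold on the metric. The example below is a sketch with illustrative values: it raises a warning when large I/Os issued by the 'interactive' IORM category wait more than 1,000 ms per request and a critical alert above 2,000 ms, if the condition is observed in 2 of 5 observation periods:

CellCLI> CREATE THRESHOLD ct_io_wt_lg_rq.interactive warning=1000, critical=2000, comparison='>', occurrences=2, observation=5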

CellCLI> list alerthistory


3 2015-01-05T01:07:55-05:00 critical "RS-7445 [Serv MS leaking memory] [It will be
restarted] [] [] [] [] [] [] [] [] [] []"
4_1 2015-01-17T03:57:07-05:00 info "The HDD disk controller battery is performing a
learn cycle. Battery Serial Number : 5472 Battery Type : ibbu08 Battery Temperature : 30 C Full
Charge Capacity : 1412 mAh Relative Charge : 99% Ambient Temperature : 24 C"

4_2 2015-01-17T05:12:07-05:00 clear "All disk drives are in WriteBack caching mode.
Battery Serial Number : 5472 Battery Type : ibbu08 Battery Temperature : 31 C Full Charge
Capacity : 1411 mAh Relative Charge : 71% Ambient Temperature : 24 C"
5_1 2015-02-13T01:12:26-05:00 critical "Configuration check discovered the following
problems: USB Errors: [ERROR] Cell USB is not fixable. [ERROR] Multiple bootable USB devices found. "

CellCLI> list alerthistory 3 detail


name: 3
alertDescription: "RS-7445 [Serv MS leaking memory] [It will be restarted] [] [] [] [] [] [] [] [] [] []"
alertMessage: "RS-7445 [Serv MS leaking memory] [It will be restarted] [] [] [] [] [] [] [] [] [] []"
alertSequenceID: 3
alertShortName: ADR
alertType: Stateless
beginTime: 2015-01-05T01:07:55-05:00
endTime:
examinedBy:
notificationState: 1
sequenceBeginTime: 2015-01-05T01:07:55-05:00
severity: critical
alertAction: "Errors in file
/opt/oracle/cell/log/diag/asm/cell/exa01celadm03/trace/rstrc_13175_5.trc (incident=9). Please create
an incident package for incident 9 using ADRCI and upload the incident package to Oracle Support. This
can be done as shown below. From a shell session on cell dm01celadm03, enter the following
commands: $ cd /opt/oracle/cell/log $ adrci adrci> set home diag/asm/cell/dm01celadm03 adrci> ips
pack incident 9 in /tmp <<<adrci displays a message including the name of generated zip file>>> Add
this zip file as an attachment to an email message and send the message to Oracle Support. Finally,
remove the generated zip file from /tmp."

CellCLI> alter alerthistory 3 examinedBy="SATYA"


Alert 3 successfully altered

CellCLI> list alerthistory 5_1 detail


name: 5_1
alertDescription: "Configuration check discovered some problems"
alertMessage: "Configuration check discovered the following problems: USB Errors:
[ERROR] Cell USB is not fixable. [ERROR] Multiple bootable USB devices found. "
alertSequenceID: 5
alertShortName: Software
alertType: Stateful
beginTime: 2015-02-13T01:12:26-05:00
endTime: 2015-02-18T01:12:27-05:00
examinedBy:
metricObjectName: checkdev
notificationState: 1
sequenceBeginTime: 2015-02-13T01:12:26-05:00
severity: critical

alertAction: "Correct the configuration problems. Then run cellcli command: ALTER CELL
VALIDATE CONFIGURATION Verify that the new configuration is correct."

CellCLI> alter alerthistory 5_1 examinedBy="SATYA"


Alert 5_1 successfully altered

IPMI TOOL:

Intelligent Platform Management Interface (IPMI) is an open, industry-standard interface for the
management of server systems over a number of different types of networks. SNMP and IPMI perform
essentially the same function; however, there are two main differences:

 IPMI is more focused on server management. IPMI functionality includes field replaceable unit
(FRU) inventory reporting, logging of system events and system recovery (including system
resets, power on and power off).
 IPMI is associated with an architecture that allows administrators to remotely manage a system
in the absence of an operating system or other system management software. The monitored
system may be powered off, but the baseboard management controller (BMC) must be
connected to a power source and the monitoring medium, typically a local area network
connection. The BMC is a specialized microcontroller embedded in the server.

Inside Database Machine, IPMI support is built into Integrated Lights Out Manager (ILOM) on each
database server and Exadata Storage Server.

# ipmitool
# ipmitool -h --- Display usage help
# ipmitool help --- List the available commands
# ipmitool -H dm01celadm01-ilom -U root chassis power on
# ipmitool sel --- To show System Event Log
# ipmitool sel list --- To know the details of the System Event Log
# ipmitool sel list | grep ECC | cut -f1 -d : | sort -u
# ipmitool sensor
# ipmitool sensor list
# ipmitool sensor list | grep degree
# ipmitool sdr | grep -v ok
# ipmitool lan print
# ipmitool chassis status
# ipmitool power status
# ipmitool sunoem cli -- Run ILOM CLI commands from the host
# ipmitool sunoem cli "show /SYS/T_AMB value"
# ipmitool sunoem cli "show /SYS product_serial_number" -- To Print Product Serial Number
# ipmitool sunoem cli "show /SYS/MB/BIOS" -- To Print BIOS information

dcli -g all_group -l root "ipmitool sensor list | grep degrees"

[root@exa01dbadm02 ~]# ipmitool -I lanplus -H exa01dbadm02-ilom.exaware.com -U root -L
OPERATOR sel list last 100
Password:
294 | 08/28/2015 | 21:43:34 | Processor | IERR | Asserted
295 | 08/28/2015 | 21:43:34 | System Firmware Progress | Management controller initialization |
Asserted
296 | 08/28/2015 | 21:43:35 | Memory | Uncorrectable Error | Asserted | OEM Data-2 0x01 OEM
Data-3 0x01
297 | 08/28/2015 | 21:43:39 | Memory | Uncorrectable Error | Asserted | OEM Data-2 0x01 OEM
Data-3 0x01
298 | 08/28/2015 | 21:43:41 | Memory | Uncorrectable Error | Asserted | OEM Data-2 0x01 OEM
Data-3 0x01
299 | 08/28/2015 | 21:43:46 | System Firmware Progress | Memory initialization | Asserted
29a | 08/28/2015 | 21:44:07 | System Firmware Progress | Primary CPU initialization | Asserted
29b | 08/28/2015 | 21:44:08 | System Firmware Progress | Secondary CPU Initialization | Asserted
29c | 08/28/2015 | 21:44:16 | System Firmware Progress | PCI resource configuration | Asserted
29d | 08/28/2015 | 21:44:17 | System Firmware Progress | PCI resource configuration | Asserted
29e | 08/28/2015 | 21:44:17 | System Firmware Progress | PCI resource configuration | Asserted
29f | 08/28/2015 | 21:44:17 | System Firmware Progress | PCI resource configuration | Asserted
2a0 | 08/28/2015 | 21:44:20 | System Firmware Progress | Keyboard test | Asserted
2a1 | 08/28/2015 | 21:44:21 | System Firmware Progress | Video initialization | Asserted
2a2 | 08/28/2015 | 21:44:21 | System Firmware Progress | Option ROM initialization | Asserted
2a3 | 08/28/2015 | 21:44:28 | System Firmware Progress | Keyboard controller initialization | Asserted
2a4 | 08/28/2015 | 21:44:33 | System Firmware Progress | Option ROM initialization | Asserted
2a5 | 08/28/2015 | 21:44:55 | System Firmware Progress | Hard-disk initialization | Asserted
2a6 | 08/28/2015 | 21:44:55 | System Firmware Progress | Option ROM initialization | Asserted
2a7 | 08/28/2015 | 21:45:02 | System Firmware Progress | Option ROM initialization | Asserted
2a8 | 08/28/2015 | 21:45:18 | System Firmware Progress | System boot initiated | Asserted

SMART SCAN
WHAT IS SMART SCAN?

With traditional storage, all the database intelligence resides in the software on the database server. To
illustrate how SQL processing is performed in this architecture:

1. The client issues a SELECT statement with a predicate to filter a table and return only the rows of
interest to the user.
2. The database kernel maps this request to the file and extents containing the table.
3. The database kernel issues the I/O to read all the table blocks.
4. All the blocks for the table being queried are read into memory.
5. SQL processing is conducted against the data blocks searching for the rows that satisfy the
predicate.
6. The required rows are returned to the client.

Using Exadata, database operations are handled differently. Queries that perform table scans can be
processed within Exadata and return only the required subset of data to the database server. Row
filtering, column filtering, some join processing, and other functions can be performed within Exadata.
Exadata uses a special direct-read mechanism for Smart Scan processing.

1. The client issues a SELECT statement to return some rows of interest.

2. The database kernel determines that Exadata is available and constructs an iDB command
representing the SQL command and sends it to the Exadata cells. iDB is a unique Oracle data
transfer protocol that is used for Exadata storage communications.
3. The Exadata server software scans the data blocks to extract the relevant rows and columns which
satisfy the SQL command.
4. Exadata returns to the database instance an iDB message containing the requested rows and
columns of data. These results are not block images, so they are not stored in the buffer cache.
5. The database kernel consolidates the result sets from across all the Exadata cells. This is similar to
how the results from a parallel query operation are consolidated.
6. The rows are returned to the client.

EXAMPLE TO SHOW THE IMPACT OF SMART SCAN:


Now assume a 4800 gigabyte table is evenly spread across the 14 Exadata cells and a query is executed
which requires a full table scan. As is commonly the case, assume that the query returns a small set of
result records.

Without Smart Scan capabilities, each Exadata server behaves like a traditional storage server by
delivering database blocks to the database server.

Because the storage network is bandwidth-limited to 40 gigabits per second (roughly 5 gigabytes per
second), it is not possible for the Exadata cells to deliver all their power. In this case, each cell cannot
deliver more than about 0.357 gigabytes per second to the database (5 GB/s divided across 14 cells) and
it would take approximately 16 minutes to scan the whole table (4800 GB at 5 GB/s).

Now consider if Smart Scan is enabled for the same query. The same storage network bandwidth limit
applies; however, this time the entire 4800 GB is not transported across the storage network; only the
matching rows are transported back to the database server. So each Exadata cell can process its part of
the table at full disk-scan speed; that is, 1.8 GB per second. With 14 cells scanning in parallel
(approximately 25 GB per second in aggregate), the entire table scan would be completed in
approximately three minutes and ten seconds.

WHAT OPERATIONS CAN BE OFF-LOADED?
 Predicate filtering:
 Only the rows requested are returned to the database server rather than all the rows in a table.
 Column filtering:
 Only the columns requested are returned to the database server rather than all the columns in a
table.
 Join processing:
 Simple star join processing is performed within Exadata.
 Scans on encrypted data
 Scans on compressed data
 Backups:
 I/O for incremental backups is much more efficient because only changed blocks are returned to
the database server.
 Create/extend tablespace:
 Exadata formats database blocks.

SMART SCAN REQUIREMENTS:


 Query-specific requirements:

 Smart Scan is possible only for full segment scans; that is full table scans, fast full index scans
and fast full bitmap index scans.
 Smart Scan can only be used for direct-path reads
 Direct-path reads are automatically used for parallel queries
 Direct-path reads may be used for serial queries
 Not used by default for serial small table scans
 Use _serial_direct_read=TRUE to force direct-path reads

 Additional general requirements:

 Smart Scan must be enabled within the database. The CELL_OFFLOAD_PROCESSING initialization
parameter controls Smart Scan. The default value of the parameter is TRUE, meaning that Smart
Scan is enabled by default.
 Each segment being scanned must be on a disk group that is completely stored on Exadata cells.
The disk group must also have the following disk group attribute settings:
 'compatible.rdbms' = '11.2.0.0.0' (or later)
 'compatible.asm' = '11.2.0.0.0' (or later)
 'cell.smart_scan_capable' = 'TRUE'
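As a hedged sketch (run against an ASM instance; the disk group names are whatever exists in your environment), the relevant disk group attributes can be checked, and serial direct-path reads can be forced for testing, roughly as follows:

SQL> SELECT dg.name, a.name, a.value
     FROM   v$asm_diskgroup dg, v$asm_attribute a
     WHERE  dg.group_number = a.group_number
     AND    a.name IN ('compatible.rdbms', 'compatible.asm', 'cell.smart_scan_capable');

SQL> ALTER SESSION SET "_serial_direct_read" = TRUE;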

SITUATIONS THAT PREVENT SMART SCAN:
Smart Scan cannot be used in the following circumstances:

 A scan on a clustered table
 A scan on an index-organized table
 A fast full scan on a compressed index
 A fast full scan on a reverse key index
 The table has row-level dependency tracking enabled
 The ORA_ROWSCN pseudocolumn is being fetched
 The optimizer wants the scan to return rows in ROWID order
 The command is CREATE INDEX using NOSORT
 A LOB or LONG column is being selected or queried
 A SELECT ... VERSIONS flashback query is being executed
 More than 255 columns are referenced in the query
 The data is encrypted and cell-based decryption is disabled
 The query evaluates a predicate based on a virtual column

More than 255 columns are referenced in the query: This restriction only applies if the query involves
tables that are not compressed using Exadata Hybrid Columnar Compression. Queries on tables
compressed using Exadata Hybrid Columnar Compression can be offloaded even if they reference more
than 255 columns.

The data is encrypted and cell-based decryption is disabled: In order for Exadata Storage Server to
perform decryption, Oracle Database needs to send the decryption key to each cell. If there are security
concerns about keys being shipped across the storage network, cell-based decryption can be disabled by
setting the CELL_OFFLOAD_DECRYPTION initialization parameter to FALSE.

MONITORING SMART SCAN IN SQL EXECUTION PLANS:


The CELL_OFFLOAD_PROCESSING initialization parameter controls Smart Scan. The default value of the
parameter is TRUE which means that Smart Scan is enabled. If it is set to FALSE, Smart Scan is disabled
and the database uses Exadata storage to serve data blocks similar to traditional storage. To enable
Smart Scan for a particular SQL statement, use the OPT_PARAM hint as shown in the following example:

SELECT /*+ OPT_PARAM('cell_offload_processing' 'true') */ ...

The CELL_OFFLOAD_PLAN_DISPLAY initialization parameter determines whether the SQL EXPLAIN PLAN
statement displays the predicates that can be evaluated by Exadata Storage Server as STORAGE
predicates for a given SQL statement. The possible values are:

 AUTO instructs the SQL EXPLAIN PLAN statement to display the predicates that can be evaluated as
STORAGE only if a cell is present and if a table is on the cell. AUTO is the default setting.
 ALWAYS produces changes to the SQL EXPLAIN PLAN statement whether or not Exadata storage is
present or the table is on the cell. You can use this setting to identify statements that are candidates
for offloading before migrating to Exadata Database Machine.
 NEVER prevents any Exadata-related changes to the SQL EXPLAIN PLAN output. This may be
desirable, for example, if you have tools that process execution plan output and these tools have
not been updated to deal with the updated syntax, or when comparing plans from Database
Machine with plans from your previous system.

EXAMPLE OF EXECUTION PLAN WHERE SMART SCAN IS HAPPENING:

The “storage” clause shown under the “Predicate Information” section of the execution plan confirms
that Smart Scan is happening, as sketched in the example below.
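The following is an illustrative sketch only (the SALES table and the predicate are hypothetical, not taken from this environment). The key indicators are the TABLE ACCESS STORAGE FULL operation and the storage() entry in the predicate section:

SQL> SELECT COUNT(*) FROM sales WHERE amount_sold > 10000;

-------------------------------------------------------
| Id  | Operation                   | Name  | Rows    |
-------------------------------------------------------
|   0 | SELECT STATEMENT            |       |       1 |
|   1 |  SORT AGGREGATE             |       |       1 |
|*  2 |   TABLE ACCESS STORAGE FULL | SALES |   12000 |
-------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - storage("AMOUNT_SOLD">10000)
       filter("AMOUNT_SOLD">10000)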

EXAMPLE OF EXECUTION PLAN WHERE SMART SCAN IS NOT HAPPENING:

 We can also collect a 10046 trace of the query to check if smart scan is happening or not.
 Wait events like “cell smart table scan” or “cell smart index scan” will be seen in the 10046 trace if
smart scan is happening.
 Wait events like “cell singleblock physical read” or “cell multiblock physical read” will be seen in the
10046 trace if smart scan is not happening.

OTHER SITUATIONS AFFECTING SMART SCAN:

 Seeing STORAGE in the execution plan does not guarantee that the query is satisfied using Smart
Scan alone.
 Even when Smart Scan is indicated by the execution plan, other block I/O might also be used.
Example:
 If Exadata Storage Server is not sure that a block is current, the read for that block is transferred to
the buffer cache.
 If chained or migrated rows are detected, then additional non-Smart Scan block reads may be
required.
 If dynamic sampling is used, then the sampling IO will not use Smart Scan.
 If Exadata Storage Server CPU utilization is significantly greater than CPU utilization on the
database server, then Smart Scan may send additional data to the database server.
 If all the required data already resides in the database buffer cache, the buffer cache copy is
used and no disk I/O is performed.
 Statistics and wait events can be used to confirm what is happening.

Exadata Storage Server Wait Events and Smart Scan Statistics:

MIGRATED ROWS EXAMPLE:

In this example, the CUSTOMERS table has been updated in a way that resulted in row migration across
approximately 6.5% of the data blocks in the table. Now when the query is executed, the query timing,
the statistics and the wait events are close to the original values observed without any migrated rows.

However, there are still noticeable differences between the amount of data returned by Smart Scan and
the amount of physical interconnect I/O. This difference, along with the cell physical read wait events,
is a symptom of the row migration present in the CUSTOMERS table.

CONCURRENT TRANSACTION EXAMPLE:

This example shows exactly the same query as before; however, this time a batch process was updating
the CUSTOMERS table at the same time. The wait events confirm that Smart Scan is still used, but a large
number of cell single block physical reads are also required. The statistics quantify the effect.

Notice how the physical I/O over the interconnect rises from approximately 120 MB in the previous
example, to over 4800 MB in this case. Also note the increase in the query elapsed time and how it
correlates with the wait times.

COLUMN FILTERING EXAMPLE:

This example examines the effect of column filtering using two simple SQL queries. The top query selects
the entire customers table. The associated query execution plan shows that the table scan is offloaded
to Exadata Storage Server. However, because the query asks for the entire table to be returned, the
entire table must be transported across the storage network.

The bottom query selects just one column from the customers table. Note that the associated query
execution plan provides no explicit notification regarding column filtering. It does indicate that the
optimizer expects to process a smaller volume of data (29M bytes compared to 290M bytes) which can
be used to infer that column filtering will take place. The proof of column filtering can be seen from the
statistics associated with the query. This time the entire table is eligible for predicate offload and only
the data associated with the cust_email column is transported across the storage network.
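As a minimal sketch of the two queries described above (the table name is hypothetical), together with one way to observe the effect using the cumulative session statistic "cell physical IO interconnect bytes returned by smart scan":

SQL> SELECT * FROM customers;            -- every column is shipped back to the database server
SQL> SELECT cust_email FROM customers;   -- only the cust_email column is returned by the cells

SQL> SELECT name, value
     FROM   v$mystat s, v$statname n
     WHERE  n.statistic# = s.statistic#
     AND    name = 'cell physical IO interconnect bytes returned by smart scan';

Checking the statistic before and after each query shows a much smaller volume of interconnect traffic for the single-column query.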

STORAGE INDEXES
A storage index is a memory-based structure that reduces the amount of physical I/O performed in an
Exadata cell. The storage index keeps track of minimum and maximum column values and this
information is used to avoid useless I/Os.

For example, consider a table T1 which contains column B. Column B is tracked in the storage index, so it
is known that the first half of T1 contains values for column B ranging between 1 and 5. Likewise, it is
also known that the second half of T1 contains values for column B ranging between 3 and 8. Any query
on T1 looking for values of B less than 2 can quickly proceed without any I/O against the second part of
the table.

Given a favorable combination of data distribution and query predicates, a storage index could
drastically speed up a query by quickly skipping much of the I/O. For another query, the storage index
may provide little or no benefit. In any case, the ease of maintaining and querying the memory-based
storage index means that any I/O saved through its use effectively increases the overall I/O bandwidth
of the cell while consuming very few cell resources.

The storage space inside each cell disk is logically divided into 1 MB chunks called storage regions. The
boundaries of ASM allocation units (AUs) are aligned with the boundaries of storage regions. For each of
these storage regions, data distribution statistics are held in a memory structure called a region index.
Each region index contains distribution information for up to eight columns. The storage index is a
collection of the region indexes.

The storage statistics maintained in each region index represent the data distribution (minimum and
maximum values) of columns that are considered well clustered. Exadata Storage Server contains logic
that transparently determines which columns are clustered enough to be included in the region index.
Different parts of the same table can potentially have different column sets in their corresponding
region indexes.

The storage index works best when the following conditions are true:

• The data is roughly ordered so that the same column values are clustered together.
• The query has a predicate on a storage index column checking for =, <, > or some combination of
these.

It is important to note that the storage index works transparently with no user input. There is no need to
create, drop, or tune the storage index. The only way to influence the storage index is to load your
tables using presorted data.

Also, because the storage index is kept in memory, it disappears when the cell is rebooted. The first
queries that run after a cell is rebooted automatically cause the storage index to be rebuilt.

The storage index works for data types whose binary encoding is such that byte-wise binary lexical
comparison of two values of that data type is sufficient to determine the ordering of those two values.
This includes data types like NUMBER, DATE, and VARCHAR2. However, NLS data types are an example
of data types that are not included for storage index filtering.

PARTITION SIZE
Storage Indexes depend on Smart Scans, which depend on direct path reads. Oracle will generally use
serial direct path reads for large objects. However, when an object is partitioned, Oracle may fail to
recognize that the object is “large,” because Oracle looks at the size of each individual segment. This
may result in some partitions not being read via the Smart Scan mechanism and thus disabling any
Storage Indexes for that partition. When historical partitions are compressed, the problem becomes
even more noticeable, as the reduced size of the compressed partitions will be even less likely to trigger
the serial direct path reads. This issue can be worked around by not relying on the serial direct path read
algorithm and instead specifying a degree of parallelism for the object or using a hint to force the
desired behavior.

CONTROLLING STORAGE INDEXES:
There is not much you can do to control Storage Index behavior. However, the developers have built in a
few hidden parameters that provide some flexibility.

There are three database parameters that deal with Storage Indexes (that we’re aware of):

• _kcfis_storageidx_disabled (default is FALSE)
• _kcfis_storageidx_diag_mode (default is 0)
• _cell_storidx_mode (default is EVA)

None of these parameters are documented, so you should consult Oracle Support before making use of
any of them.

Setting _kcfis_storageidx_disabled to TRUE will disable storage indexes for reads. The existing storage
indexes will still be updated when values in a table are changed, even if this parameter is set to TRUE.

SYS@EXDB1> alter session set "_kcfis_storageidx_disabled"=true;

SYS@EXDB1> select count(*) from kso.skew3 where pk_col = 7000;

COUNT(*)

12

Elapsed: 00:00:13.74

SYS@EXDB1> alter session set "_kcfis_storageidx_disabled"=false;

SYS@EXDB1> select count(*) from kso.skew3 where pk_col = 7000;

COUNT(*)

12

Elapsed: 00:00:01.06

With Storage Indexes disabled, a count of only 12 rows had to be returned to the database server, but
the storage cells still had to read all the data to determine which rows to return. When Storage Indexes
were re-enabled by setting _KCFIS_STORAGEIDX_DISABLED to FALSE, the query with the WHERE clause
was executed again. This time the elapsed time was only about 1 second. While this performance
improvement seems extreme, it is relatively common when Storage Indexes are used.

MONITORING STORAGE INDEXES:
There is only one database statistic related to storage indexes.

The statistic, “cell physical IO bytes saved by storage index”, keeps track of the accumulated I/O that
has been avoided by the use of Storage Indexes. This statistic is exposed in v$sesstat and v$sysstat and
related views. Since the statistic is cumulative, like all statistics in v$sesstat, it must be checked before
and after a given SQL statement in order to determine whether Storage Indexes were used on that
particular statement. Here is an example:

SYS@EXDB1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and
name like '%storage%';
NAME VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index 0

SYS@EXDB1> select avg(pk_col) from kso.skew2 where col1 is null;


AVG(PK_COL)
-----------
32000001

SYS@EXDB1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and
name like '%storage%';
NAME VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index 3984949248

SYS@EXDB1> select avg(pk_col) from kso.skew2 where col1 is null;


AVG(PK_COL)
-----------
32000001

SYS@EXDB1> select name, value from v$mystat s, v$statname n where n.statistic# = s.statistic# and
name like '%storage%';
NAME VALUE
--------------------------------------------- ---------------
cell physical IO bytes saved by storage index 7969898496

The value for this statistic will be 0 until a SQL statement that uses a Storage Index has been executed in
the current session. In our example, the query used a Storage Index that eliminated about 4 billion bytes
of disk I/O. This is the amount of additional I/O that would have been necessary without Storage
Indexes. Note that v$mystat is a view that exposes cumulative statistics for your current session. So if
you run the statement a second time, the value should increase to twice the value it had after the first
execution. Of course, disconnecting from the session (by exiting SQL*Plus for example) resets most
statistics exposed by v$mystat, including this one, to 0.

EXADATA HYBRID COLUMNAR COMPRESSION
Using Exadata Hybrid Columnar Compression, data is organized into sets of rows called compression
units. Within a compression unit, data is organized by column and then compressed. The column
organization of data brings similar values close together, enhancing compression ratios. Each row is self-
contained within a compression unit.

The size of a compression unit is determined automatically by Oracle Database based on various factors
in order to deliver the most effective compression result while maintaining excellent query
performance. Although the diagram in the slide shows a compression unit with four data blocks, do not
assume that a compression unit always contains four blocks.

In addition to providing excellent compression, Exadata Hybrid Columnar Compression works in
conjunction with Smart Scan so that column projection and row filtering can be executed along with
decompression at the storage level to save CPU cycles on the database servers.

BASIC vs OLTP vs HCC:

BASIC COMPRESSION

This compression method is a base feature of Oracle Database 11g Enterprise Edition. It compresses
data only on direct path loads. Modifications force the data to be stored in an uncompressed format, as
do inserts that do not use the direct path load mechanism. Rows are still stored together in the normal
row-major form. The compression unit is a single Oracle block. BASIC is the default compression
method, from a syntax standpoint. For example, BASIC compression will be used if you issue the
following command:

CREATE TABLE … COMPRESS;

Basic compression was introduced in Oracle Database version 9i. This form of compression was also
referred to as DSS Compression in the past.

OLTP COMPRESSION

The OLTP compression method allows data to be compressed for all operations, not just direct path
loads. It is part of an extra-cost option called Advanced Compression and was introduced in Oracle
Database version 11g Release 1. The storage format is essentially the same as BASIC, using a symbol
table to replace repeating values. OLTP compression attempts to allow for future updates by leaving 10
percent free space in each block via the PCTFREE setting (BASIC compression uses a PCTFREE value of 0
percent). Therefore, tables compressed for OLTP will occupy slightly more space than if they were
compressed with BASIC (assuming direct path loads only and no updates). The syntax for enabling this
type of compression is

CREATE TABLE … COMPRESS FOR OLTP;

OLTP compression is important because it is the fallback method for tables that use HCC compression. In
other words, blocks will be stored using OLTP compression in cases where HCC cannot be used (non-
direct path loads for example). One important characteristic of OLTP compression is that updates and
non-direct path inserts are not compressed initially. Once a block becomes “full” it will be compressed.

HYBRID COLUMNAR COMPRESSION

HCC is only available for tables stored on Exadata storage. As with BASIC compression, data will only be
compressed in HCC format when it is loaded using direct path loads. Conventional inserts and updates
cause records to be stored in OLTP compressed format. In the case of updates, rows are migrated to
new blocks. These blocks are marked for OLTP compression, so when one of these new blocks is
sufficiently full; it will be compressed using the OLTP algorithm.

HCC provides four levels of compression: QUERY LOW, QUERY HIGH, ARCHIVE LOW, and ARCHIVE HIGH.
Note that the expected compression ratios published for these levels are very rough estimates and that
the actual compression ratio will vary greatly depending on the data that is being compressed.

Tables may be compressed with HCC using the following syntax:


CREATE TABLE ... COMPRESS FOR QUERY LOW;
CREATE TABLE ... COMPRESS FOR QUERY HIGH;
CREATE TABLE ... COMPRESS FOR ARCHIVE LOW;
CREATE TABLE ... COMPRESS FOR ARCHIVE HIGH;

You may also change a table’s compression attribute by using the ALTER TABLE statement. However, this
command has no effect on existing records unless you actually rebuild the segment using the MOVE
keyword. Without the MOVE keyword, the ALTER TABLE command merely notifies Oracle that future
direct path inserts should be stored using HCC.
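As a hedged sketch (the SALES table is hypothetical), existing data can be rebuilt in HCC format and the result checked as follows:

SQL> ALTER TABLE sales MOVE COMPRESS FOR QUERY HIGH;

SQL> SELECT table_name, compression, compress_for
     FROM   user_tables
     WHERE  table_name = 'SALES';

Keep in mind that ALTER TABLE ... MOVE marks the table's indexes UNUSABLE, so they must be rebuilt afterwards.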

RESTRICTIONS/CHALLENGES:

1 Moving Data to a non-Exadata Platform:

Probably the largest hurdle with using HCC has been moving the data to non-Exadata platforms. For
example, while RMAN and Data Guard both support the HCC block format and will happily restore data
to a non-Exadata environment, a database running on such an environment will not be able to do
anything with the data until it is decompressed. This can mean a lengthy delay before being able to
access the data in a case where a failover to a standby on a non-Exadata platform occurs. The same
issue holds true for doing an RMAN restore to a non-Exadata platform. The restore will work but the
data in HCC formatted blocks will not be accessible until the data has been moved into a non-HCC
format. This can be done with the ALTER TABLE MOVE NOCOMPRESS command, by the way.

In addition to the lengthy delay associated with decompressing data before being able to access it, there
is also the issue of space. If HCC is providing a 10× compression factor, you will need to have 10 times
the space you are currently using available on the target environment to handle the increased size of the
data. For these reasons, Data Guard is rarely set up with a standby on a non-Exadata platform.

2 Locking Issues

The Exadata documentation says that updating a single row of a table compressed with HCC locks the
entire compression unit containing the row. This can cause extreme contention issues for OLTP-type
systems. This is the main reason that HCC is not recommended for tables (or partitions) where the data
will be updated. Here’s a demonstration of the locking behavior:

KSO@SANDBOX1> update kso.skew_hcc3 set col1=col1 where pk_col = 27999409;

1 row updated.

SYS@SANDBOX1> select col1 from kso.skew_hcc3 where pk_col = 27999409 for update nowait;
select col1 from kso.skew_hcc3 where pk_col = 27999409 for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

-- Expected because this row has been updated by another process

SYS@SANDBOX1> select col1 from kso.skew_hcc3 where pk_col = 27999401 for update nowait;
select col1 from kso.skew_hcc3 where pk_col = 27999401 for update nowait
*
ERROR at line 1:
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

-- Not normal Oracle locking behavior

Clearly this behavior would be disastrous to many OLTP systems.

3 Single Row Access :

HCC is built for full table scan access. Decompression is a CPU-intensive task. Smart Scans can
distribute the decompression work to the CPUs on the storage cells. This makes the CPU-intensive task
much more palatable. However, Smart Scans only occur when Full Scans are performed. This means that
other access mechanisms, index access for example, must use the DB server CPUs to perform
decompression. This can put an enormous CPU load on DB servers in high volume OLTP-type systems.

In addition, since data for a single row is spread across multiple blocks in a CU, retrieving a complete row
causes the entire CU to be read. This can have a detrimental effect on the overall database efficiency for
systems that tend to access data using indexes, even if the access is read-only.

PARALLELISM
Exadata doesn’t have a special way of executing parallel operations that is not available on other
platforms running 11gR2. However, parallel processing is a key component of Exadata because efficient
handling of Data Warehouse workloads was a primary design goal for Exadata. In addition, because
Offloading/Smart Scan depends on direct path reads, which are used by parallel query slaves, parallel
operations take on a whole new importance. Traditionally, the use of parallel query has required careful
control of concurrency in order to maximize the use of available resources without overwhelming the
system.

PARALLELIZATION AT THE STORAGE TIER

Exadata has a lot of processing power at the storage layer. Regardless of whether you are using a V2
quarter rack or an X2-8 full rack, you still have more CPU resources available at the storage layer than
you have at the database layer. Since Smart Scans offload a lot of processing to the storage cells, every
query executed via Smart Scan is effectively parallelized across the CPUs on the storage cells. This type of
parallelization is completely independent of the database parallel processing capabilities.

So this kind of parallelization occurs even when the activity is driven by a single process on a single
database server. This introduces some interesting issues that should be considered with regard to
normal parallelization at the database tier. Since one of the primary jobs of a parallelized query is to
allow multiple processes to participate in the I/O operations, and since the I/O operations are already
spread across multiple processes, the degree of parallelism required by statements running on the
Exadata platform should be smaller than on other platforms.

Let us look at the three main new features related to parallelism introduced in 11gR2 and see which of
them are beneficial on Exadata:

1 AUTO DOP
It was designed to overcome the problems associated with the fact that there is rarely a single DOP
value that is appropriate for all queries touching a particular object. Prior to 11gR2, DOP could be
specified at the statement level via hints or at the object level via the DEGREE and INSTANCE settings.
Realistically, using hints at the statement level makes more sense in most situations for the reason just
mentioned. But it requires that the developers understand the
platform that the statements will be running on and the workload that the hardware will be supporting
at the time of execution. Getting the settings correct can be a tedious trial-and-error process and
unfortunately, DOP cannot be changed while a statement is running. Once it starts, your only options
are to let it complete or kill it and try again. This makes fine tuning in a “live” environment a painful
process.

Operation and configuration:

When Auto DOP is enabled, Oracle evaluates each statement to determine whether it should be run in
parallel and if so, what DOP should be used. Basically any statement that the optimizer concludes will
take longer than 10 seconds to run serially will be a candidate to run in parallel.

The 10-second threshold can be controlled by setting the PARALLEL_MIN_TIME_THRESHOLD parameter,
by the way. This decision is made regardless of whether any of the objects involved in the statement
have been decorated with a parallel degree setting or not.

Auto DOP is enabled by setting the PARALLEL_DEGREE_POLICY parameter to a value of AUTO or LIMITED.

The default setting for this parameter is MANUAL, which disables all three of the new 11gR2 parallel
features (Auto DOP, Parallel Statement Queuing, In-memory Parallel Execution). Unfortunately,
PARALLEL_DEGREE_POLICY is one of those parameters that control more than one thing. The following
list shows the effects of the various settings for this parameter.

MANUAL: If PARALLEL_DEGREE_POLICY is set to MANUAL, none of the new 11gR2 parallel features will
be enabled. Parallel processing will work as it did in previous versions. That is to say, statements will
only be parallelized if a hint is used or an object is decorated with a parallel setting.

LIMITED: If PARALLEL_DEGREE_POLICY is set to LIMITED, only Auto DOP is enabled while Parallel
Statement Queuing and In-memory Parallel Execution remain disabled. In addition, only statements
accessing objects that have been decorated with the default parallel setting will be considered for Auto
DOP calculation.

AUTO: If PARALLEL_DEGREE_POLICY is set to AUTO, all three of the new features are enabled.
Statements will be evaluated for parallel execution regardless of any parallel decoration at the object
level.
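A minimal sketch of enabling Auto DOP and adjusting the threshold (the 30-second value is only an illustration, not a recommendation):

SQL> ALTER SYSTEM SET parallel_degree_policy = AUTO SID='*' SCOPE=BOTH;
SQL> ALTER SYSTEM SET parallel_min_time_threshold = 30 SID='*' SCOPE=BOTH;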

2 PARALLEL STATEMENT QUEUING

This mechanism separates long running parallel queries from the rest of the workload. The mechanics
are pretty simple. Turn the feature on. Set a target number of parallel slaves using the
PARALLEL_SERVERS_TARGET parameter. Run stuff. If a statement that requires exceeding the target
tries to start, it will be queued until the required number of slaves becomes available. There are of
course many details to consider and other control mechanisms that can be applied to manage the
process.

The main parameter is PARALLEL_SERVERS_TARGET, which tells Oracle how many parallel server
processes to allow before it starts holding statements back in the queue. The default value for this
parameter is calculated as follows:

((4 × CPU_count) × parallel_threads_per_cpu) × active_instances

So on an Exadata X2-2 with a database that spans 4 RAC nodes, the default value would be calculated as:
((4 × 12)× 2) × 4 = 384

Oracle’s Database Resource Manager (DBRM) provides additional capability to control Parallel
Statement Queuing. Without DBRM, the parallel statement queue behaves strictly as a first-in, first-out
(FIFO) queue. DBRM provides several directive attributes that can be used to provide additional control
on a consumer group basis. Many of these controls were introduced in version 11.2.0.2
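As a hedged sketch, the target can be adjusted and queued statements observed roughly as follows (the value 128 is only an example; on 11.2.0.2 and later, queued statements appear in V$SQL_MONITOR with a status of QUEUED):

SQL> ALTER SYSTEM SET parallel_servers_target = 128 SID='*' SCOPE=BOTH;

SQL> SELECT sid, sql_id, sql_exec_id, sql_text
     FROM   v$sql_monitor
     WHERE  status = 'QUEUED';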

3 IN-MEMORY PARALLEL EXECUTION :

The In-Memory Parallel Execution feature takes a different approach. It attempts to make use of the
buffer cache for parallel queries. The feature is cluster-aware and is designed to spread the data across
the cluster nodes (that is, the RAC database servers). The data blocks are also affinitized to a single
node, reducing the amount of communication and data transfers between the nodes. The goal, of
course, is to speed up the parallel query by eliminating disk I/O. This can be a viable technique because
many systems now have very large amounts of memory, which of course can provide a significant speed
advantage over disk operations. There are some downsides to this approach though. The biggest
disadvantage with regard to Exadata is that all the Smart Scan optimizations are disabled by this feature.

SUMMARY:

Parallel execution of statements is important for maximizing throughput on the Exadata platform.
Oracle database 11g Release 2 includes several new features that make the parallel execution a more
controllable feature. This is especially important when using the platform with mixed workloads. The
new Auto DOP feature is designed to allow intelligent decisions about DOP to be made automatically
based on individual statements. In-memory Parallel Execution may not be as useful on Exadata
platforms as it is on non-Exadata platforms, because it disables the optimizations that come along with
Smart Scans.

Of the three new features, Parallel Statement Queuing is the most useful as it allows a mixture of
throughput-oriented work to co-exist with response-time–sensitive work. Integration with Oracle
Resource Manager further enhances the feature by providing a great deal of additional control over the
queuing.

IORM
Exadata Storage Server I/O Resource Management (IORM) allows workloads and databases to share I/O
resources automatically according to user-defined policies. To manage workloads within a database, you
can define intradatabase resource plans using the Database Resource Manager (DBRM), which has been
enhanced to work in conjunction with Exadata Storage Server. To manage workloads across multiple
databases, you can define interdatabase IORM plans.

For example, if a production database and a test database are sharing an Exadata cell, you can configure
resource plans that give priority to the production database. In this case, whenever the test database
load would affect the production database performance, IORM schedules the I/O requests such that
the production database I/O performance is not impacted. This means that the test database I/O
requests are queued until they can be issued without disturbing the production database I/O
performance.

I/O RESOURCE MANAGEMENT CONCEPTS:

Exadata Storage Server IORM extends the consumer group concept using categories. While consumer
groups represent collections of users within a database, categories represent collections of consumer
groups across all databases. Consider an example of two categories containing consumer groups across
two databases. You can manage I/O resources based on categories by creating a category plan.

For example, you can specify precedence to consumer groups in the Interactive category over consumer
groups in the Batch category for all the databases sharing an Exadata cell.

IORM Architecture

At a high level, IORM is implemented as follows. For each disk-based cell disk,
each database accessing the cell has one I/O queue per consumer group and three background I/O
queues. The background I/O queues correspond to high, medium, and low priority requests with
different I/O types mapped to each queue. If you do not set an intra-database resource plan, all non-
background I/O requests are grouped into a single consumer group called OTHER_GROUPS.

IORM only manages I/O queues for physical disks. IORM does not arbitrate requests to flash-based grid
disks or requests serviced by Exadata Smart Flash Cache.

Getting Started with IORM


Initially, the IORM plan for every cell is configured as follows:

 The IORM plan has a name of form <cell name>_IORMPLAN.


 The catPlan and dbPlan attributes are empty. These attributes define the interdatabase I/O resource
management plan and the category I/O resource management plan. Details regarding these plans
are provided later in this lesson.
 The IORM plan objective is set to off which stops IORM from managing I/O resources.
 The IORM plan status is set to active.
Before I/O resources can be managed by IORM, you need to ensure that the IORM plan is active and
that the IORM plan objective is set to a value other than off.

If the IORM plan status is set to inactive, it can be activated using the command:
CellCLI> alter iormplan active
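To have IORM actually manage I/O, the objective must also be set to something other than off. A minimal sketch (the cell group file used with dcli is a hypothetical name):

CellCLI> ALTER IORMPLAN objective = auto
CellCLI> LIST IORMPLAN DETAIL

# Or, from a database server, across all cells at once:
# dcli -g cell_group -l celladmin cellcli -e "ALTER IORMPLAN objective = auto"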

I/O RESOURCE MANAGEMENT PLANS EXAMPLE

The category, interdatabase, and intradatabase plans are used together by Exadata to allocate I/O
resources.

The category plan is first used to allocate resources among the categories. When a category is selected,
the inter-database plan is used to select a database; only databases that have consumer groups with the
selected category can be selected. Finally, the selected database’s intra-database plan is used to select
one of its consumer groups. The percentage of resource allocation represents the probability of making
a selection at each level.

Expressing this as a formula:

Pcgn = (cgn / sum(catcgs)) * db% * cat%

where:

 Pcgn is the probability of selecting consumer group n
 cgn is the resource allocation for consumer group n
 sum(catcgs) is the sum of the resource allocations for all consumer groups in the same category as
consumer group n and on the same database as consumer group n
 db% is the database allocation percentage in the interdatabase plan
 cat% is the category allocation percentage in the category plan
Continuing the example with the consumer group names abbreviated to CG1, CG2, and so on: notice that
although each consumer group allocation is expressed as a percentage within each database, IORM is
concerned with the ratio of consumer group allocations within each category and database.

For example, CG1 nominally receives 16.8% of I/O resources from IORM (15/(15+10)*70%*40%);
however, this does not change if the intradatabase plan allocations for CG1 and CG2 are doubled to 30%
and 20%, respectively. This is because the allocation to CG1 remains 50% greater than the allocation to
CG2. This behavior also explains why CG1 (16.8%) and CG3 (19.6%) have a similar allocation through
IORM even though CG3 belongs to the higher priority category (60% versus 40%) and has a much larger
intra-database plan allocation (35% versus 15%).

Note: ASM I/Os (for rebalance and so on) and I/Os issued by Oracle background processes are handled
separately and automatically by Exadata. For clarity, background I/Os are not shown in the example.

ENABLING INTRADATABASE RESOURCE MANAGEMENT
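A minimal sketch of enabling intradatabase resource management with DBRM (the plan and consumer group names below are hypothetical):

SQL> BEGIN
       DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
       DBMS_RESOURCE_MANAGER.CREATE_PLAN('DAYTIME', 'Example intradatabase plan');
       DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('INTERACTIVE', 'OLTP users');
       DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('DAYTIME', 'INTERACTIVE',
         'interactive work', mgmt_p1 => 70);
       DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('DAYTIME', 'OTHER_GROUPS',
         'everything else', mgmt_p1 => 30);
       DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
       DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
     END;
     /

SQL> ALTER SYSTEM SET resource_manager_plan = 'DAYTIME' SID='*' SCOPE=BOTH;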

INTERDATABASE AND CATEGORY PLANS
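A hedged sketch of category and interdatabase plans set on a cell (the database names prod and test and the category names are illustrative only):

CellCLI> ALTER IORMPLAN -
         catPlan=((name=interactive, level=1, allocation=70), -
                  (name=batch, level=1, allocation=30), -
                  (name=other, level=2, allocation=100)), -
         dbPlan=((name=prod, level=1, allocation=80), -
                 (name=test, level=1, allocation=20), -
                 (name=other, level=2, allocation=100))
CellCLI> LIST IORMPLAN DETAIL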

IORM WITH ORACLE DATABASE 12C

To manage PDBs within a CDB, you can configure a new CDB resource plan in the CDB root container. A
CDB resource plan interacts with IORM in essentially the same way as an intradatabase resource plan,
except that a non-CDB resource plan manages resources among consumer groups and a CDB resource
plan manages resources among PDBs.
Like intradatabase resource plans, the CDB resource plan manages database server CPU and Exadata
Storage Server I/O. Also, to enforce a CDB resource plan, you must set the cell IORM objective to a
setting other than basic. The recommended initial setting is auto.

IORM AND EXADATA STORAGE SERVER FLASH MEMORY


IORM only manages I/O queues for physical disks. IORM does not arbitrate requests to flash-based grid
disks or requests serviced by Exadata Smart Flash Cache.

However, commencing with Exadata Storage Server software release 11.2.2.3, IORM can be used to
specify if a database is allowed to use Exadata Smart Flash Cache. This allows flash cache to be reserved
for the most important databases, which is especially useful in environments that are used to
consolidate multiple databases.

Set flashCache=on in the interdatabase plan directive to allow the associated databases to use Exadata
Smart Flash Cache, and set flashCache=off to prevent databases from using Exadata Smart Flash Cache.

If an interdatabase plan directive does not contain the flashCache attribute, then flashCache=on is
assumed.

Similarly, the flashLog attribute can be set to specify whether or not databases can use Exadata Smart
Flash Log. If the flashLog attribute is not specified, then flashLog=on is assumed. Exadata Smart Flash log
requires Exadata Storage Server software release 11.2.2.4.0 or later.

CellCLI> ALTER IORMPLAN -
         dbPlan=((name=oltp, level=1, allocation=80, flashCache=on, flashLog=on), -
                 (name=dss, level=1, allocation=20, limit=50, flashCache=off, flashLog=on), -
                 (name=other, level=2, allocation=100, flashCache=off, flashLog=off))

DBFS
BULK DATA LOADING OVERVIEW
The recommended approach for bulk data loading into Database Machine relies on the external table
feature of Oracle Database. The data files used for bulk data loading can be in any format supported by
external tables. The process for creating the data files is outside the scope of this lesson and mostly
depends on the facilities available in the source system.

However the data files are created, the following should be taken into account in order to facilitate high-
performance parallel access to the data files while they are being loaded:

 When accessing large data files through external tables, where possible Oracle automatically divides
the files into 10 MB granules. These granules can be processed in separate parallel processing
threads. Oracle is unable to use this approach with compressed files or data read from a pipe or a
tape device.

 If granules cannot be used then each separate data file can be treated as a granule and the number
of files determines the maximum degree of parallelism that is available. You can manually divide a
large file into separate smaller files and use them to manually enable parallelism.
 If you are using multiple input data files in conjunction with a single external table, then you should
try to keep the data files similar in size. If the file sizes do vary significantly, then list them in order
from largest to smallest in the external table definition.

STAGING THE DATA FILES


It is recommended that you stage your data files inside Database Machine using Database File System.
DBFS is an Oracle Database feature that enables the database to be used as a high-performance POSIX-
compatible file system on Linux. Using the available space on internal database server disk drives for
staging data files is highly discouraged.

Inside DBFS, files are stored as SecureFiles LOBs. A set of PL/SQL procedures implements the file system
access primitives, such as open, close, create, and so on. The dbfs_client utility enables the mounting of
a DBFS file system as a mount point on Linux. It also provides the mapping from file system operations
to database operations. The dbfs_client utility runs completely in user space and interacts with the
kernel through the FUSE library infrastructure.
Note: ASM Cluster File System (ACFS) is not supported in conjunction with Exadata.

CONFIGURING THE STAGING AREA

While DBFS is fully functional if it is co-located with your target database, it is recommended to
configure DBFS in a separate staging database.
 Use DBCA to create a database based on the OLTP template
– Redo logs at least 8 GB
– 4 GB buffer cache
– 1 GB shared pool
– 8 KB or 16 KB block size
 Create a bigfile tablespace for DBFS storage
SQL> CREATE BIGFILE TABLESPACE DBFS DATAFILE '+DBFS_DG' SIZE 32G AUTOEXTEND ON NEXT
8G MAXSIZE 300G NOLOGGING ONLINE PERMANENT EXTENT MANAGEMENT LOCAL
AUTOALLOCATE SEGMENT SPACE MANAGEMENT AUTO;
 Create a DBFS user account
SQL> create user dbfs identified by dbfs quota unlimited on DBFS;
SQL> grant create session, create table, create procedure, dbfs_role to dbfs;

 Additional database server operating system configuration


– Add the Oracle software owner, or user that will mount the DBFS file system, to the fuse
group
# usermod -a -G fuse oracle

– As root, create /etc/fuse.conf containing the entry: user_allow_other

# echo "user_allow_other" > /etc/fuse.conf


# chmod 644 /etc/fuse.conf

– Create a mount point for DBFS with ownership and group permissions set to the Oracle
software owner, or user that will mount the DBFS file system

# mkdir /data
# chown oracle:dba /data

After the staging database is created and the required operating system configuration is completed, you
can create the DBFS store. Use the script located at
$ORACLE_HOME/rdbms/admin/dbfs_create_filesystem_advanced.sql. The script must be run by the
DBFS database user (created earlier in the configuration process). The script accepts numerous
parameters.

In the example below, <TS Name> represents the name of the tablespace you created to house the
DBFS store, and <FS Name> represents the name of the DBFS store, such as mydbfs for example. This
name is used later, after DBFS is mounted, to name the directory that appears under the DBFS mount.

$ cd $ORACLE_HOME/rdbms/admin
$ sqlplus dbfs/dbfs
SQL> @dbfs_create_filesystem_advanced.sql DBFS mydbfs nocompress nodeduplicate noencrypt non-
partition

$ nohup $ORACLE_HOME/bin/dbfs_client dbfs@<StagingDB> -o allow_other,direct_io /data < passwd.txt &

 Using DBFS
– Access DBFS through <DBFS Mount Point>/<FS Name>
– For example: /data/mydbfs
– Copy data files to DBFS using network file transfer methods such as FTP and NFS

LOADING THE TARGET DATABASE:


 Copy the CSV file into your DBFS staging area
 Create a directory object that points to your DBFS staging directory.
 Create an external table which references the data in your DBFS-staged CSV data file.
 Use a CREATE TABLE AS SELECT command to load the external table data contained in
the CSV file into a new table in your database.

Example for external table creation using a csv file which is staged in DBFS:
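The following is a hedged sketch, assuming a simple comma-separated file named sales.csv staged under /data/mydbfs (the table and column names are hypothetical):

SQL> CREATE DIRECTORY dbfs_stage AS '/data/mydbfs';

SQL> CREATE TABLE sales_ext (
       sale_id  NUMBER,
       cust_id  NUMBER,
       amount   NUMBER
     )
     ORGANIZATION EXTERNAL (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY dbfs_stage
       ACCESS PARAMETERS (
         RECORDS DELIMITED BY NEWLINE
         FIELDS TERMINATED BY ','
       )
       LOCATION ('sales.csv')
     )
     REJECT LIMIT UNLIMITED
     PARALLEL;

SQL> CREATE TABLE sales PARALLEL NOLOGGING
     AS SELECT * FROM sales_ext;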

Managing DBFS mounting via Oracle Clusterware
Download the script mount-dbfs.sh available in Doc Id : 1054431.1.

1. Edit or confirm the settings for the following variables in the script. Comments in the script will help
you to confirm the values for these variables.

– DBNAME
– MOUNT_POINT
– DBFS_USER
– ORACLE_HOME (should be the RDBMS ORACLE_HOME directory)
– LOGGER_FACILITY (used by syslog to log the messages/output from this script)
– MOUNT_OPTIONS
– DBFS_PASSWD (used only if WALLET=false)
– DBFS_PWDFILE_BASE (used only if WALLET=false)
– WALLET (must be true or false)
– TNS_ADMIN (used only if WALLET=true)
– DBFS_LOCAL_TNSALIAS

This script will internally invoke the dbfs_client command to mount the DBFS file system:

(nohup $DBFS_CLIENT ${DBFS_USER}@ -o $MOUNT_OPTIONS $MOUNT_POINT \
   < $DBFS_PWDFILE | $LOGGER -p ${LOGGER_FACILITY}.info 2>&1 &) &

After editing, copy the script (rename it if desired or needed) to the proper directory
(GI_HOME/crs/script) on all database nodes.

2. Add the CRS resource using below command :

crsctl add resource $RESNAME -type local_resource \
  -attr "ACTION_SCRIPT=/u01/app/11.2.0/grid/crs/script/mount-dbfs.sh, \
         CHECK_INTERVAL=30, RESTART_ATTEMPTS=10, \
         START_DEPENDENCIES='hard(ora.$DBNAMEL.db)pullup(ora.$DBNAMEL.db)', \
         STOP_DEPENDENCIES='hard(ora.$DBNAMEL.db)', \
         SCRIPT_TIMEOUT=300"

Now the "crsctl start resource" and "crsctl stop resource" commands can be used to mount and
unmount the DBFS file system, as shown below.
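For example, using the resource name that was registered above (shown here as the placeholder <RESNAME>):

(oracle)$ <GI_HOME>/bin/crsctl start resource <RESNAME>
(oracle)$ <GI_HOME>/bin/crsctl status resource <RESNAME> -t
(oracle)$ <GI_HOME>/bin/crsctl stop resource <RESNAME>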

CREATING AND MOUNTING MULTIPLE DBFS FILESYSTEMS
1. Create additional filesystems under same DBFS repository owner (database user)
– The additional filesystems will show as sub-directories which are the filesystem names given
during creation of the filesystem (second argument to the dbfs_create_filesystem_advanced
script).
– There is only one mount point for all filesystems created in this way.
– Only one mount-dbfs.sh script needs to be configured.
– All filesystems owned by the same DBFS repository owner will share the same mount options
(i.e. direct_io).
2. Create another DBFS repository owner (database user) and new filesystems under that owner.
– Can be in the same database with other DBFS repository owners or in a completely separate
database.
– Completely separate: can use different tablespaces (which could be in different diskgroups),
separate mount points, possibly different mount options (direct_io versus non-direct_io).
– One DBFS filesystem has no impact on others in terms of administration or dbfs_client
start/stop.
– Requires a new mount point to be created and used.
– Requires a second mount-dbfs.sh to be created and configured in Clusterware.
– Also supports having completely separate ORACLE_HOMEs with possibly different software
owner (Linux/Solaris) accounts managing the repository databases.

To configure option #1 above, follow these steps:

a. It is recommended (but optional) to create a new tablespace for the new DBFS filesystem you
are creating.
b. Connect to the DBFS repository as the current owner (dbfs_user is the example owner used in
this note) and then run the dbfs_create_filesystem_advanced script again using a different
filesystem name (the 2nd argument).
c. The filesystem will appear as another subdirectory just below the chosen mount point.

To configure option #2 above, it is same as creating a new user, tablespace and following the same
process to create a new DBFS.

REMOVING DBFS CONFIGURATION


1 Stop the dbfs_mount service in clusterware using the oracle account.

(oracle)$ <GI_HOME>/bin/crsctl stop resource <RESNAME>

2. Confirm that the resource is stopped and then remove the clusterware resource for dbfs_mount as
the oracle (or Grid Infrastructure owner) user.
(oracle)$ <GI_HOME>/bin/crsctl stat resource <RESNAME> –t
(oracle)$ <GI_HOME>/bin/crsctl delete resource <RESNAME>
3. Remove the custom action script that supported the resource and the /etc/fuse.conf file as the root
user.
(root)# dcli -g dbs_group -l root rm -f /u01/app/11.2.0/grid/crs/script/mount-dbfs.sh /etc/fuse.conf

4. Remove the mount point directory as the root user.
(root)# dcli -g dbs_group -l root rmdir /data

5. Modify the group memberships for the oracle user account


(root)# dcli -g dbs_group -l root usermod -G oinstall,dba,oper,asmdba oracle

6 The DBFS repository objects remain. You may either:


 Delete the DBFS repository database using DBCA once the steps above are completed.
 Remove the DBFS repository by connecting to the database as the repository owner using

SQL*Plus and running @?/rdbms/admin/dbfs_drop_filesystem <filesystem-name>


SQL> connect dbfs_user/dbfs_passwd
SQL> @?/rdbms/admin/dbfs_drop_filesystem <FS NAME>
SQL> connect / as sysdba
SQL> drop user dbfs_user cascade;

INSTANCE CAGING
Instance caging is an important tool for managing and limiting CPU utilization per database. Instance
caging is used to prevent runaway processes and heavy application loads from generating a very high
system load, resulting in an unstable system and poor performance for other databases.

To enable instance caging, do the following for each instance on the server:

1. Enable the Oracle Database Resource Manager by assigning a resource plan to the initialization
parameter “RESOURCE_MANAGER_PLAN”. The resource plan must have CPU directives to enable
instance caging. See Oracle Database Administrator’s Guide 11g Release 2 "Enabling Oracle
Database Resource Manager and Switching Plans" for instructions. If you are not planning on
managing workloads within a database, you can simply set RESOURCE_MANAGER_PLAN to
“DEFAULT_PLAN”.

SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'DEFAULT_PLAN' SID='*' SCOPE=BOTH;

2. Set the CPU_COUNT initialization parameter to the maximum number of CPUs the instance should
use at any time. By default, CPU_COUNT is set to the total number of CPUs on the server. For hyper-
threaded CPUs, CPU_COUNT includes CPU threads. CPU_COUNT is a dynamic parameter, so its value
can be altered at any time but it is best set at instance startup because the CPU_COUNT parameter
influences other Oracle parameters and internal structures (for example, PARALLEL_MAX_SERVERS,
buffer cache, and latch structure allocations).
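
For example (the value 4 is illustrative only), an instance can be caged to 4 CPUs across all of its RAC
instances with:

SQL> ALTER SYSTEM SET CPU_COUNT = 4 SID='*' SCOPE=BOTH;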

Following are the recommended guidelines for setting the CPU_COUNT parameter:

 For critical Hardware Pools, the total of the CPU_COUNT parameter values across all the databases should be
less than 75% of the total number of CPU cores on the server. This allocation should ensure that
databases are not starved of CPU resources. The remaining 25% of CPU resources are reserved for
other processes such as ASM and clusterware.

PARTITIONED APPROACH

If the sum of the CPU_COUNT values of all the databases instances on the target database server does
not exceed the number of CPUs on the server, then the server is partitioned. In this case, there should
be no contention for CPU resources between database instances. However, if one database instance
does not use its allocated CPU resources, then these CPU resources cannot be utilized by the other
database instances.

The advantage of the partitioned approach is that there is no CPU contention, but CPUs may be
underutilized. The partitioned approach is therefore recommended for mission-critical databases in
critical Hardware Pools.

The specific sizing recommendation is as follows: by limiting the sum of CPU_COUNT values to less than
100% of the total number of CPUs on the server, each instance's CPU allocation should be derived by
your sizing analysis and taking account maximum historical CPU usage.

sum(CPU_COUNT) <= 100% * Total CPUs
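
For example (illustrative numbers only), on a database server with 24 CPU cores hosting three database
instances, CPU_COUNT settings of 6, 6, and 5 give sum(CPU_COUNT) = 6 + 6 + 5 = 17 <= 24, so the server is
partitioned; this also stays below the 75% guideline for critical Hardware Pools and leaves headroom for
ASM and clusterware processes.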

OVER-SUBSCRIBED APPROACH

If the sum of the CPU_COUNT values of all the database instances on the target database server exceeds
the number of CPUs on the server, then the server is over-subscribed. In this case, if all databases are
heavily loaded at the same time, then there will be some contention for CPU resources and degraded
performance.

The advantage of the over-subscribed approach is better resource utilization, but with potential CPU
contention. The over-subscribed approach is therefore recommended for test or non-critical database
Hardware Pools whose peak periods do not coincide.

It is recommended to limit the sum of CPU_COUNT values to three times the number of CPUs.

sum(CPU_COUNT) <= 3 * Total CPUs
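
For example (again illustrative), on the same 24-core server, four non-critical instances each set to
CPU_COUNT = 12 give sum(CPU_COUNT) = 48 = 2 * 24, which is within the 3x guideline but will cause CPU
contention if all four databases reach their peak load at the same time.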

Once instance caging has been configured, you can use the scripts in MOS note 1362445.1 to monitor
the actual CPU usage of each instance. With this information, you can tune instance caging by adjusting
the CPU_COUNT value for each instance as needed. For very active Oracle RAC systems, you should
allocate at least 2 CPUs to each database instance so the background processes (for example, SMON,
PMON, LMS, and so on) continue to function efficiently. Also refer to MOS note 1340172.1 for the
recommended patches for instance caging.

HUGE PAGES
HugePages is a mechanism that allows the Linux kernel to utilize the multiple page size capabilities
of modern hardware. Linux uses pages as the basic unit of memory, where physical memory is partitioned
and accessed using the basic page unit. The default page size is 4096 bytes. HugePages allows large
amounts of memory to be utilized with a reduced overhead. Linux uses Translation Lookaside Buffers (TLBs)
in the CPU architecture. These buffers contain mappings of virtual memory to actual physical memory
addresses. So a system with a large page size provides a higher density of TLB entries, thus reducing
pagetable space.

Furthermore, each process uses a smaller private pagetable and those memory savings can grow
significantly as the process count grows.

HugePages is generally required if PageTables in /proc/meminfo is more than 2% of physical memory.


When set, HugePages should equal the sum of the shared memory segments used by all the database
instances. When all the database instances are running, the amount of shared memory being used can
be calculated by analyzing the output from the ipcs -m command. MOS note 401749.1 provides a script
which can be used to determine the amount of shared memory in use.
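
As a rough sketch (not the script from the MOS note), the shared memory currently in use can be summed
from ipcs -m and divided by the 2 MB huge page size to estimate vm.nr_hugepages; add a safety margin on
top of the result:

[root@exa01dbadm01 ~]# ipcs -m | awk '/^0x/ {sum += $5} END {printf "%d\n", sum/(2*1024*1024)}'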

MOS note 361468.1 describes how to set HugePages. Basically, you edit the file /etc/sysctl.conf, set the
vm.nr_hugepages parameter there, and reboot the server:
[root@exa01dbadm01 ~]# cat /etc/sysctl.conf | grep -i vm.nr_hugepages
vm.nr_hugepages=82000

After the system is rebooted, check the HugePages state in /proc/meminfo. For example:

[root@exa01dbadm01 ~]# cat /proc/meminfo | grep -i Hugepages


HugePages_Total: 82000
HugePages_Free: 6845
HugePages_Rsvd: 2688
HugePages_Surp: 0
Hugepagesize: 2048 kB

The values in the output will vary. To make sure that the configuration is valid, the HugePages_Free
value should be smaller than HugePages_Total and there should be HugePages_Rsvd pages. HugePages_Rsvd
counts free pages that are reserved for use (requested for an SGA, but not touched/mapped yet).

The sum of Hugepages_Free and HugePages_Rsvd may be smaller than your total combined SGA as
instances allocate pages dynamically and proactively as needed.

If you have Oracle Database 11g or later, the default database created uses the Automatic Memory
Management (AMM) feature which is incompatible with HugePages. Disable AMM before proceeding.
To disable, set the initialization parameters MEMORY_TARGET and MEMORY_MAX_TARGET to 0 (zero).
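
For example (the SGA and PGA values are placeholders; derive them from your own sizing), AMM can be
disabled and manual targets set as follows, followed by a restart of the instances:

SQL> ALTER SYSTEM SET MEMORY_TARGET=0 SCOPE=SPFILE SID='*';
SQL> ALTER SYSTEM RESET MEMORY_MAX_TARGET SCOPE=SPFILE SID='*';
SQL> ALTER SYSTEM SET SGA_TARGET=24G SCOPE=SPFILE SID='*';
SQL> ALTER SYSTEM SET PGA_AGGREGATE_TARGET=8G SCOPE=SPFILE SID='*';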

Starting in 11.2.0.2, setting the database initialization parameter USE_LARGE_PAGES=ONLY on all
instances prevents any instance from starting unless sufficient HugePages are available. NOTE: The value
to use for HugePages must be recalculated if the database parameter SGA_TARGET changes, or
whenever the number of database instances changes. HugePages can only be used for the SGA, so do not
over-allocate.
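
A minimal sketch of enforcing this (SCOPE=SPFILE is required because the parameter is not dynamic):

SQL> ALTER SYSTEM SET USE_LARGE_PAGES='ONLY' SCOPE=SPFILE SID='*';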

EXADATA MIGRATION
The slide depicts the four important phases to migrate your existing databases to Database Machine:

1. CAPACITY PLANNING:

The biggest Database Machine capacity planning challenge is understanding the difference between
your existing storage and Exadata storage. To determine the storage requirements, you must
understand the I/O characteristics of your current environment. Collect the size and throughput of your
current system. The key measures of I/O throughput are I/Os per second (IOPS) and megabytes per
second (MBPS). Use the system statistic physical I/O disk bytes to derive the current MBPS for the
system. Use the system statistics physical reads and physical writes to determine the current IOPS for
the system. These statistics are available in an Automatic Workload Repository (AWR) report.
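
As an illustrative sketch (a one-off sample of cumulative values; in practice, difference two AWR snapshots
from DBA_HIST_SYSSTAT over a representative peak interval), the relevant statistics can be examined with:

SQL> SELECT name, value FROM v$sysstat
     WHERE name IN ('physical reads', 'physical writes')
        OR name LIKE 'physical%total bytes';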

After you understand the capacity of your current system, you can determine the appropriate Database
Machine storage configuration. It is important to size both for performance and capacity. Use the
Database Machine performance and capacity metrics published at http://www.oracle.com/exadata to
assist with sizing.

Also, remember that it may be possible to achieve much greater effective throughput by making
efficient use of Exadata Smart Flash Cache and Exadata Hybrid Columnar Compression. The precise
impact of these technologies should be verified by testing.

It is also important to consider failures when planning capacity. Exadata cell and disk failures are
transparently tolerated using Automatic Storage Management (ASM) redundancy.

However, it is best practice to ensure that post-failure I/O capacity is sufficient to meet the redundancy
requirements and performance service levels.

2. CHOOSING THE RIGHT MIGRATION PATH:

 Determine what to migrate:


 Avoid methods that migrate what you will discard.

 Consider the configuration of the source system:


 Source Oracle Database version and platform matters.
 Target system is fixed: 11.2, ASM, and little endian.

 Weigh up the costs and favor methods that facilitate best practices:
 Implementing best practices is important in the long term because your future
performance depends on it.
 ASM AU size of 4 MB can be set only at disk group creation.
 Database extent sizes are set at extent allocation.

Before choosing a migration approach, you should clearly define what you want to migrate. This will
help you to avoid wasted effort, such as migrating data that is not required. Clearly defining the scope of
the migration also helps you to identify the source systems. You need to understand the source systems
because their composition may limit the available migration options.

For example, Database Machine is a little-endian platform, so if you are migrating from a big-endian
platform, some physical migration approaches are not feasible. Also, the use of database features, such
as materialized views or object data types, may impose restrictions on some migration methods.

3. MIGRATION STRATEGIES:
Once you have a good understanding what Exadata is, and how it works, you are ready to start thinking
about how you are going to get your database moved.
Migration strategies fall into two general categories:
 Logical migration
 Physical migration

Logical migration involves extracting the data from one database and loading it into another.
Physical migration refers to lifting the database, block by block, from one database server and moving it
to another.

LOGICAL MIGRATION:

Using a logical approach, you can change the database extent size and other physical characteristics of
your database, such as the database character set, which is not possible by using a physical migration
approach.

DATA PUMP:

 If a suitable maintenance window can be accommodated and if the size of the database is not
prohibitively large, use Data Pump to move the data in bulk from the legacy system to Database
Machine. Data Pump is easy to use and provides broad support across different platforms and
database versions.
 Remember that HCC is also supported in Data Pump.
 Also note that when using impdp with database links, Data Pump performs conventional INSERT AS
SELECT operations, not direct path inserts. This means that the inserts will be much slower, generate large
amounts of redo and undo, and can possibly run out of undo space. More importantly, conventional path IAS
does not compress data with HCC compression (it resorts to regular OLTP compression instead if
HCC is enabled for the table).

LOGICAL STANDBY:

 If your application service-level agreements permit little or no downtime, you can use an Oracle
Data Guard logical standby database to replicate the database on Database Machine, and track and
merge the changes while the source database continues to run. After the configuration is
established, the logical standby database (on Database Machine) can be switched to assume the
role of primary database and the original source database can be decommissioned.

 The procedure documented in My Oracle Support note 737460.1 can be used to change physical
storage attributes of the database, such as segment extent sizes. Refer to My Oracle Support note
1085687.1 for information about heterogeneous platform support (and associated limitations) in
the same Data Guard configuration.

STREAMS:
 If your application service-level agreements permit little or no down time, you can also use Oracle
Streams to propagate the data, and to track and merge the changes while the source database
continues to run. For more information about this approach, see Appendix D in Oracle Streams
Concepts and Administration 11g Release 2 (11.2). Streams provides broader source database
platform support than the logical standby database approach.

Note:
Both logical standby database and Streams have additional considerations, which may preclude
their use. Firstly, both approaches are unable to natively handle all Oracle Database data types.
Although there are methods to overcome this limitation, the extra effort required to implement and
maintain these methods should not be overlooked. Also, both approaches will fail to duplicate
NOLOGGING operations that are conducted on the primary database.

PHYSICAL MIGRATION:
Physical database migration, as the name implies, is the process of creating a block-for-block copy of the
source database (or parts of the database) and moving it to Exadata. Physical migration is a much
simpler process than some of the logical migration strategies discussed earlier in this chapter.

As you might expect, it does not allow for any changes to be made to the target database, other than
choosing not to migrate some unnecessary tablespaces. This means that you will not be able to modify
extent sizes for tables and indexes, alter your indexing strategy, implement partitioning, or apply HCC
table compression. All these tasks must be done post-migration.

However, physical migration is the fastest way to migrate your database to Exadata. For all physical
migration strategies, except Transportable Tablespaces (TTS), the new Exadata database starts out as a
single-instance database. Post-migration steps are needed to register the database and all its instances
with Cluster Ready Services (Grid Infrastructure).

ASM ONLINE MIGRATION:
This method is applicable only if your database is already using ASM and you do not need to adjust the
ASM AU size. To use this method, you must also be able to connect your current database storage to
Database Machine and migrate the database instances to Database Machine. After migrating the
database instances, migrating the data is very simple; simply add new Exadata-based grid disks to your
ASM disk groups and drop existing disks from your ASM disk groups.
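
A hedged sketch of the final disk swap (the disk group name, cell discovery pattern, legacy disk names, and
rebalance power are examples only):

SQL> ALTER DISKGROUP DATA
     ADD DISK 'o/192.168.10.*/DATA*'
     DROP DISK DATA_LEGACY_0000, DATA_LEGACY_0001
     REBALANCE POWER 11;

ASM rebalances the data onto the Exadata grid disks online; the legacy storage can be disconnected once
the rebalance completes (monitor V$ASM_OPERATION).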

PHYSICAL STANDBY DATABASE:


Create a physical standby database on Database Machine and perform a Data Guard switchover to
migrate the database. See My Oracle Support note 413484.1 for information about heterogeneous
platform support. If the source database version is earlier than 11.2, you will need to use the transient
logical rolling database upgrade feature. My Oracle Support note 413484.1 also contains additional
information about this feature.

TRANSPORTABLE DATABASE:
Use the Transportable Database feature to migrate the entire database. To use this method in
conjunction with Database Machine, the source database must be on a little-endian platform.
See the Platform Migration Using Transportable Database white paper at
http://www.oracle.com/us/solutions/maa-wp-10gr2-platformmigrationtdb-131164.pdf for more
information about Transportable Database.

TRANSPORTABLE TABLESPACES:

Use the Transportable Tablespaces feature to migrate tablespaces from your current system to a new
database hosted on Database Machine. This is the only physical database migration method that
provides broad platform support and supports migration from earlier Oracle Database versions.

REDUCING DOWN TIME FOR MIGRATION BY USING TRANSPORTABLE TABLESPACES:

As mentioned previously, Transportable Tablespaces facilitate the physical migration approach that
provides the broadest support for different source platforms and database versions. When you use
Transportable Tablespaces to migrate data between systems that have different endian formats, the
amount of down time required can be substantial because of the file conversion process that must take
place, and the length of this process is directly proportional to the size of the data set being moved.

Oracle has released a new capability called Cross Platform Incremental Backup, which when used in
conjunction with Transportable Tablespaces, significantly reduces the amount of down time required to
migrate data to Exadata Database Machine.

The above slide shows an outline of the new process in comparison with traditional migration using
Transportable Tablespaces. For complete details about this process, refer to My Oracle Support note
1389592.1.
Note that Cross Platform Incremental Backup does not affect the amount of time it takes to perform
other migration actions such as metadata export and import. Therefore, databases that have very large
amounts of metadata will see limited benefit if the migration time is dominated by metadata
operations, not data file transfer and conversion.

Also, note that the Cross Platform Incremental Backup functionality is delivered as a patch that requires
Exadata Database Machine running Linux with at least Oracle Database version 11.2.0.2 and Bundle
Patch 12 for Exadata installed.

OTHER APPROACHES:

 Data Integration Tools


 Oracle Golden-Gate
 Oracle Data Integrator
 Custom Code
 Query over database link
 PL/SQL routines
 Hybrid Approaches
 For example, use Transportable Tablespaces to migrate data from the current production
database to a staging database outside Database Machine, and then use Data Pump to
unload data from the staging database and load it into Database Machine.

4. POST MIGRATION TASKS :


Consider the following after migrating databases to Database Machine:

 One of ASM’s core functions is to ensure that data is evenly distributed across all disks in a disk
group. This happens automatically. However, occasionally, a disk group may become
imbalanced due to uncommon errors, such as a failed rebalance. It is, therefore, an operational
best practice to check disk groups on a regular basis and run a manual rebalance if necessary. A
script is available to check disk group balance in My Oracle Support note 367445.1. Also,
Enterprise Manager Grid Control displays an alert if a disk group becomes unbalanced beyond a
customizable threshold.
 The superior scan rates available from Database Machine make it possible that indexes,
previously required for good performance, are no longer required. You should assess execution
plans that use indexes to see if they would run acceptably with Smart Scans. To determine if
queries would perform acceptably without an index, you can make the index invisible to the
optimizer. An invisible index is maintained by DML operations, but it is not used by the optimizer
for queries. To make an index invisible, use the following command:
ALTER INDEX <index_name> INVISIBLE;
 After you perform the preceding tasks, you can configure I/O Resource Management (IORM).

EXADATA BACKUP & RESTORE
COMPONENTS NEED TO BE BACKED UP:

1. Compute Servers
2. Storage Servers
3. InfiniBand Switches

1. COMPUTE SERVERS (Db Servers):

WHEN TO BACK UP:

A backup should be made before and after every significant change to the software on the database
server. Example, a backup should be made before and after the following procedures:

 Application of operating system patches


 Application of Oracle patches
 Reconfiguration of significant operating parameters
 Installation or reconfiguration of significant non Oracle software

WHERE TO BACK UP:

You can use available space on an NFS share mounted on the compute server; 150 GB of space is adequate.

HOW TO BACK UP:

No specific tools are required. The Logical Volume Manager (LVM) snapshot feature is used to back up
the compute node software.

BACKING UP THE COMPUTE SERVERS USING LVM SNAPSHOTS

Use available capacity in your compute node’s local SAS disks and logical volume management features
to perform backups of your compute node software.

Display available space in your current compute node volume groups

[root@exa01dbadm02 ~]# vgdisplay | egrep '(VGExaDb|Alloc PE|Free PE)'


VG Name VGExaDb
Alloc PE / Size 47104 / 184.00 GB
Free PE / Size 381205 / 1.45 TB

[root@exa01dbadm02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G 16G 13G 57% /
/dev/sda1 496M 40M 431M 9% /boot
/dev/mapper/VGExaDb-LVDbOra1
99G 13G 81G 14% /u01
tmpfs 252G 267M 252G 1% /dev/shm

The /u01 file system on the Exadata Compute Nodes is created on a 99 GB logical volume. The
underlying volume group for this logical volume is used for both the /u01 and root file system, as well as
a swap partition, and is 1.45 TB in size. This means that you will typically have a large amount of
unallocated disk storage to use for either extending the default volumes or creating new logical
volumes.

Create an additional logical volume to store your backups

Since we want to back up our root and /u01 file systems, which currently use 184 GB of space from the
VGExaDb volume group, we will create a new logical volume in this volume group. In this example, we’re
calling this volume backup:

[root@exa01dbadm02 ~]# lvcreate -L 184G -n /dev/VGExaDb/backup


Logical volume "backup" created

When your backup logical volume is created, format it with mkfs.ext3 using a 4 KB block size with the
following command:

[root@exa01dbadm02 ~]# mkfs.ext3 -m 0 -b 4096 /dev/VGExaDb/backup


mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
24117248 inodes, 48234496 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
1472 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872

Writing inode tables: done


Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 21 mounts or


180 days, whichever comes first. Use tune2fs -c or -i to override.

Create a file system on your backup logical volume and mount the file system

After your file system is built on your backup logical volume, create a directory to mount it to, mount
the file system, and validate its capacity:

[root@exa01dbadm02 ~]# mkdir -p /mnt/backup


[root@exa01dbadm02 ~]# mount /dev/VGExaDb/backup /mnt/backup
[root@exa01dbadm02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G 16G 13G 57% /
/dev/sda1 496M 40M 431M 9% /boot
/dev/mapper/VGExaDb-LVDbOra1
99G 13G 81G 14% /u01
tmpfs 252G 267M 252G 1% /dev/shm
/dev/mapper/VGExaDb-backup
182G 188M 181G 1% /mnt/backup

To prepare for your LVM snapshots, query your current logical volume names and note the logical
volumes used for your / and /u01 file systems.
In the following output, these are /dev/VGExaDb/LVDbSys1 and /dev/VGExaDb/LVDbOra1, respectively:

[root@exa01dbadm02 ~]# lvdisplay | grep "LV Name"


LV Name /dev/VGExaDb/LVDbSys1
LV Name /dev/VGExaDb/LVDbSys2
LV Name /dev/VGExaDb/LVDbSwap1
LV Name /dev/VGExaDb/LVDbOra1
LV Name /dev/VGExaDb/backup

Create LVM snapshots

Now, create LVM snapshots on the /dev/VGExaDb/LVDbSys1 and /dev/VGExaDb/LVDbOra1 volumes


using lvcreate with the -s option and sizes of 1 GB and 5 GB, respectively, naming them root_snap and
u01_snap:

[root@exa01dbadm02 ~]# lvcreate -L1G -s -n root_snap /dev/VGExaDb/LVDbSys1


Logical volume "root_snap" created
[root@exa01dbadm02 ~]# lvcreate -L5G -s -n u01_snap /dev/VGExaDb/LVDbOra1
Logical volume "u01_snap" created
[root@exa01dbadm02 ~]# lvdisplay | grep "LV Name"
LV Name /dev/VGExaDb/LVDbSys1
LV Name /dev/VGExaDb/LVDbSys2
LV Name /dev/VGExaDb/LVDbSwap1
LV Name /dev/VGExaDb/LVDbOra1
LV Name /dev/VGExaDb/backup
LV Name /dev/VGExaDb/root_snap
LV Name /dev/VGExaDb/u01_snap

You can confirm your snapshot volume devices have been created by listing the contents in
/dev/VGExaDb:

[root@exa01dbadm02 ~]# ls /dev/VGExaDb/*snap*


/dev/VGExaDb/root_snap /dev/VGExaDb/u01_snap

Mount your LVM snapshot volume.

Create directories to mount first, and then use mount to mount the snapshot volumes:

root@exa01dbadm02 ~]# mkdir -p /mnt/snap/root


root@exa01dbadm02 ~]# mkdir -p /mnt/snap/u01
root@exa01dbadm02 ~]# mount /dev/VGExaDb/root_snap /mnt/snap/root
root@exa01dbadm02 ~]# mount /dev/VGExaDb/u01_snap /mnt/snap/u01
[root@exa01dbadm02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G 16G 13G 56% /
/dev/sda1 496M 40M 431M 9% /boot
/dev/mapper/VGExaDb-LVDbOra1
99G 13G 81G 14% /u01
tmpfs 252G 267M 252G 1% /dev/shm
/dev/mapper/VGExaDb-backup
182G 188M 181G 1% /mnt/backup
/dev/mapper/VGExaDb-root_snap
30G 16G 13G 56% /mnt/snap/root
/dev/mapper/VGExaDb-u01_snap
99G 13G 81G 14% /mnt/snap/u01

Notice that the available and used capacity is the same for each set of snapshot volumes with respect to
their source volumes.

Back up your file systems from the LVM snapshot.

You can back up the contents of your system using the point-in-time snapshots of your / and /u01
volumes. You can use whichever backup mechanism you prefer; here we use the backup logical volume and
/mnt/backup file system created earlier. In the following code, we use tar to create our backups from the
snapshot volumes:

[root@exa01dbadm02 ~]# cd /mnt/snap


[root@exa01dbadm02 snap]# tar -pjcvf /mnt/backup/exa01dbadm02.t.bz2 * /boot \
--exclude /mnt/backup/exa01dbadm02.t.bz2 > /tmp/exa01dbadm02.stdout 2> /tmp/exa01dbadm02.stderr

Check the /tmp/exa01dbadm02.stderr file for any significant errors. Errors about tar failing to open
sockets, and other similar errors, can be ignored.

Unmount the snapshot file systems


When your backups are complete, unmount the snapshot file systems and destroy your snapshot
volumes:
[root@exa01dbadm02 snap]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G 16G 13G 57% /
/dev/sda1 496M 40M 431M 9% /boot
/dev/mapper/VGExaDb-LVDbOra1
99G 13G 81G 14% /u01
tmpfs 252G 267M 252G 1% /dev/shm
/dev/mapper/VGExaDb-backup
182G 16G 166G 9% /mnt/backup
/dev/mapper/VGExaDb-root_snap
30G 16G 13G 56% /mnt/snap/root
/dev/mapper/VGExaDb-u01_snap
99G 13G 81G 14% /mnt/snap/u01

[root@exa01dbadm02 snap]# cd /mnt/snap/u01/


[root@exa01dbadm02 u01]# ls
app crashfiles lost+found
[root@exa01dbadm02 u01]# cd
[root@exa01dbadm02 ~]# umount /mnt/snap/root
[root@exa01dbadm02 ~]# umount /mnt/snap/u01
[root@exa01dbadm02 ~]# rm -rf /mnt/snap
[root@exa01dbadm02 ~]# lvremove /dev/VGExaDb/root_snap
Do you really want to remove active logical volume root_snap? [y/n]: y
Logical volume "root_snap" successfully removed
[root@exa01dbadm02 ~]# lvremove /dev/VGExaDb/u01_snap
Do you really want to remove active logical volume u01_snap? [y/n]: y
Logical volume "u01_snap" successfully removed

[root@exa01dbadm02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
30G 16G 13G 57% /
/dev/sda1 496M 40M 431M 9% /boot
/dev/mapper/VGExaDb-LVDbOra1
99G 13G 81G 14% /u01
tmpfs 252G 267M 252G 1% /dev/shm
/dev/mapper/VGExaDb-backup
182G 16G 166G 9% /mnt/backup
[root@exa01dbadm02 ~]#

2. STORAGE SERVERS (Cell Servers):

WHEN TO BACK UP:

Oracle automatically performs backups of the operating system and cell software on each Exadata
Storage Server. The contents of the system volumes are automatically backed up and require no Oracle
DMA intervention or operational processes.

WHERE TO BACK UP:

The CeLLBoot USB Flash drive is installed on every Exadata Storage Server and is used to store a
bootable backup image of storage cell, complete with latest valid configurations.

Oracle maintains copies of the latest cell boot images and Cell Server software in
/opt/oracle.cellos/iso/lastGoodConfig.

HOW TO BACK UP:

Oracle assumes responsibility for backing up the critical files of the storage cells to an internal USB drive
called the CELLBOOT USB Flash Drive.

BACKING UP THE STORAGE SERVERS

Oracle automatically performs backups of the operating system and cell software on each Exadata
Storage Server. The contents of the system volumes are automatically backed up and require no Oracle
DMA intervention or operational processes.

Oracle assumes responsibility for backing up the critical files of the storage cells to an internal USB drive
called the CELLBOOT USB Flash Drive. You can validate the contents of your /opt/oracle.cellos/iso and
CELLBOOT USB Flash Drive as detailed below:

[root@exa01celadm01 ~]# cd /opt/oracle.cellos/iso


[root@exa01celadm01 iso]# ls

boot.cat isolinux.cfg
boot.msg lastGoodConfig
cellbits memtest
image.id splash.lss
imgboot.lst trans.tbl
initrd-2.6.39-400.128.17.el5uek.img vmlinuz
initrd.img vmlinuz-2.6.39-400.128.17.el5uek
isolinux.bin

DISPLAYING THE CONTENTS OF CELLBOOT USB FLASH DRIVE

1. Log in to one of your storage servers as root and run fdisk -l to find your internal USB drive
partition. At the time of this writing, the size of the internal USB drive is 4009 MB.

[root@exa01celadm01 ~]# fdisk -l 2>/dev/null

Disk /dev/sdm: 4009 MB, 4009754624 bytes


126 heads, 22 sectors/track, 2825 cylinders
Units = cylinders of 2772 * 512 = 1419264 bytes

Device Boot Start End Blocks Id System


/dev/sdm1 1 2824 3914053 83 Linux

2. Create or validate a directory to mount the /dev/sdm1 partition to. Typically, this would be
/mnt/usb, but for purposes of validating the contents of the CELLBOOT USB Flash Drive, you can
mount this to a directory of your choice. The output below shows we have a /mnt/usb directory,
which is currently not mounted.

[root@exa01celadm01 ~]# ls /mnt


dev usb usb.image.info usb.make.cellboot usb.qa usb.saved.cellos

[root@exa01celadm01 ~]# mount


/dev/md5 on / type ext3 (rw,usrquota,grpquota)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md7 on /opt/oracle type ext3 (rw,nodev)
/dev/md4 on /boot type ext3 (rw,nodev)
/dev/md11 on /var/log/oracle type ext3 (rw,nodev)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
[root@exa01celadm01 ~]#

_______________________________________________________________________________
NOTE: Oracle uses the /mnt/usb directory to perform its automatic CELLBOOT USB Flash Drive backups.

3. Now, mount your USB drive and validate:

[root@exa01celadm01 ~]# mount /dev/sdm1 /mnt/usb


[root@exa01celadm01 ~]# df -k /mnt/usb
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdm1 3852548 1685232 1971616 47% /mnt/usb

4. You can check the contents of the CELLBOOT USB Flash Drive as listed below.

[root@exa01celadm01 ~]# ls /mnt/usb
boot.cat isolinux.cfg
boot.msg kernel.ver
cellbits lastGoodConfig
grub log
I_am_CELLBOOT_usb lost+found
image.id memtest
imgboot.lst splash.lss
initrd-2.6.32-300.19.1.el5uek.img trans.tbl
initrd-2.6.39-400.128.17.el5uek.img vmlinuz
initrd.img vmlinuz-2.6.32-300.19.1.el5uek
isolinux.bin vmlinuz-2.6.39-400.128.17.el5uek
[root@exa01celadm01 ~]#

Oracle automatically backs up the system volumes on the internal CELLBOOT USB Flash Drive. This drive
is typically built on partition /dev/sdm1, which can be mounted to /mnt/usb or a mount directory of
your choice. Oracle will boot to this CELLBOOT USB Flash Drive in the event of loss or corruption of the
system volume partitions.

CREATING A CELL BOOT IMAGE ON AN EXTERNAL USB DRIVE

Oracle provides a utility to create a bootable rescue image for your storage cells using an external USB
drive. To create your external USB bootable recovery image:

 First, locate or purchase an unformatted USB drive and put it into either of the empty USB slots in
the front of your storage server. The front panel in your Exadata storage cell has two USB slots;
make sure you only have one external device plugged in.
 The device should appear as /dev/sdad; log in as root and run the following fdisk command to
validate this:

[root@exa01celadm01 ~]# fdisk -l /dev/sdad


Disk /dev/sdad: 8166 MB, 8166703104 bytes
224 heads, 63 sectors/track, 1130 cylinders
Units = cylinders of 14112 * 512 = 7225344 bytes
Device Boot Start End Blocks Id System
/dev/sdad1 1 1131 7975268 c W95 FAT32 (LBA)

1. Create a partition on your USB drive, create an EXT3 file system on your partition, and label your
volume

[root@exa01celadm01 ~]# mkfs -t ext3 /dev/sdad1


--- output truncated…..
This filesystem will be automatically checked every 37 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

2. Label the USB drive as CELLBOOTEXTERNAL

[root@exa01celadm01 ~]# e2label /dev/sdad1 CELLBOOTEXTERNAL


[root@exa01celadm01 ~]# e2label /dev/sdad1
CELLBOOTEXTERNAL

3. Create a CELLBOOT image on the external USB drive. The make_cellboot_usb script will create a
boot image for your storage cell, identical to that contained on the internal CELLBOOT USB drive.

[root@exa01celadm01 ~]# cd /opt/oracle.SupportTools


[root@exa01celadm01 oracle.SupportTools]# ./make_cellboot_usb -verbose -force
--- output truncated…..
succeeded
Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub>
[root@exa01celadm01 oracle.SupportTools]#

When finished, unmount your USB drive, remove it from your storage cell, and keep it in a safe place.

CHECK THE CONTENTS OF THE EXTERNAL USB DRIVE

 Unmount the internal USB

[root@exa01celadm01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md5 9.9G 5.6G 3.9G 60% /
tmpfs 48G 0 48G 0% /dev/shm
/dev/md7 3.0G 802M 2.1G 28% /opt/oracle
/dev/md4 114M 30M 79M 28% /boot
/dev/md11 5.0G 245M 4.5G 6% /var/log/oracle
/dev/sdm1 3.7G 1.7G 1.9G 47% /mnt/usb
[root@exa01celadm01 ~]# umount /mnt/usb

 Mount the External usb as /mnt/usb & check the contents

[root@exa01celadm01 ~]# mount /dev/sdad1 /mnt/usb


[root@exa01celadm01 ~]# ls /mnt/usb
boot.cat                             isolinux.cfg
boot.msg                             kernel.ver
cellbits                             lastGoodConfig
grub                                 lost+found
I_am_CELLBOOT_usb                    memtest
image.id                             splash.lss
imgboot.lst                          trans.tbl
initrd-2.6.32-300.19.1.el5uek.img    vmlinuz
initrd-2.6.39-400.128.17.el5uek.img  vmlinuz-2.6.32-300.19.1.el5uek
initrd.img                           vmlinuz-2.6.39-400.128.17.el5uek
isolinux.bin

[root@exa01celadm01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md5 9.9G 5.6G 3.9G 60% /
tmpfs 48G 0 48G 0% /dev/shm
/dev/md7 3.0G 802M 2.1G 28% /opt/oracle
/dev/md4 114M 30M 79M 28% /boot
/dev/md11 5.0G 245M 4.5G 6% /var/log/oracle
/dev/sdad1 7.5G 1.7G 5.4G 24% /mnt/usb

 Unmount the External usb & mount the internal usb again

[root@exa01celadm01 ~]# umount /mnt/usb


[root@exa01celadm01 ~]# mount /dev/sdm1 /mnt/usb
[root@exa01celadm01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md5 9.9G 5.6G 3.9G 60% /
tmpfs 48G 0 48G 0% /dev/shm
/dev/md7 3.0G 802M 2.1G 28% /opt/oracle
/dev/md4 114M 30M 79M 28% /boot
/dev/md11 5.0G 245M 4.5G 6% /var/log/oracle
/dev/sdm1 3.7G 1.7G 1.9G 47% /mnt/usb

3. INFINIBAND SWITCHES (IB Switches):

WHEN TO BACK UP:

 Before and After the Firmware Patch or Upgradation


 Add, Delete or Change to any Network

WHERE TO BACK UP:

When the backup is complete, a config_backup.xml file will be downloaded to the browser or file
transfer location. Once your configuration is downloaded, you can back it up using the backup method
or software of your choice.

HOW TO BACK UP:

Backing up InfiniBand switches is accomplished by using the ILOM web interface.

Additionally, periodically back up your OpenSM configuration file, /etc/opensm/opensm.conf.
OpenSM is InfiniBand’s subnet manager software.

BACKING UP THE INFINIBAND SWITCHES

1. Logon to IB switch ILOM interface as root user:

When the backup is complete, a config_backup.xml file will be downloaded to your browser or file
transfer location. Once your configuration is downloaded, you can back it up using the backup method
or software of your choice.

2. In addition to backing up your InfiniBand switch configuration via the ILOM, you should also plan on
periodically backing up your OpenSM configuration file, /etc/opensm/opensm.conf.

OpenSM is InfiniBand’s subnet manager software.

login as: root


root@10.xxx.xx.205's password:
Last login: Fri Jan 23 09:12:46 2015 from 65.xx.xxx.130
[root@exa01sw-iba01 ~]# cd /etc/opensm/
[root@exa01sw-iba01 opensm]# ls -ltr
total 24
-rw-r--r-- 1 root root 1872 Aug 28 2013 qos-policy.conf
-rw-r--r-- 1 root root 83 Aug 28 2013 osm-remote-sm.conf
-rw-r--r-- 1 root root 84 Aug 28 2013 osm-port-trust.conf
-rw-r--r-- 1 root root 9662 Aug 28 2013 opensmdefault.conf
-rw-r--r-- 1 root root 9663 Sep 19 12:27 opensm.conf
[root@exa01sw-iba01 opensm]#

3. Copy both files /etc/opensm/opensm.conf & config_backup.xml to a secure location.
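
For example (the backup host and destination path are hypothetical), the switch-side file can be copied off
with scp, and the downloaded config_backup.xml archived to the same location from wherever it was saved:

[root@exa01sw-iba01 opensm]# scp /etc/opensm/opensm.conf admin@backuphost:/backups/ib/exa01sw-iba01/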

RESTORING DATABASE SERVER FROM BACKUP:
1. Prepare an NFS server to host the backup archive mybackup.tar.bz2. The NFS server must be accessible
by IP address. For example, on an NFS server with the IP address nfs_ip, where the directory /export
is exported as an NFS mount, put the mybackup.tar.bz2 file in the /export directory.

2. Attach the /opt/oracle.SupportTools/diagnostics.iso from any healthy database server as virtual


media to the ILOM of the database server to be restored.
The following is an example of how to set up a virtual CD-ROM using the ILOM interface:
a. Copy the diagnostics.iso file to a directory on the machine using the ILOM interface.
b. Log in to the ILOM web interface.
c. Select Remote Console from the Remote Control tab. This will start the console.
d. Select the Devices menu.
e. Select the CD-ROM image option.
f. Navigate to the location of the diagnostics.iso file.
g. Open the diagnostics.iso file.
h. Select Host Control from the Remote Control tab.
i. Select CDROM as the next boot device from the list of values.
j. Click Save. When the system is booted, the diagnostic.iso image is used.
3. Restart the system from the iso by choosing the CD-ROM as boot device during startup.
4. Answer as follows when prompted by the system. The responses are shown in bold.
Choose from following by typing letter in '()':
(e)nter interactive diagnostics shell. Must use credentials from Oracle support to login (reboot or
power cycle to exit the shell),
(r)estore system from NFS backup archive,
Select:r
Are you sure (y/n) [n]:y

The backup file could be created either from LVM or non-LVM based compute node.
Versions below 11.2.1.3.1 and 11.2.2.1.0 or higher do not support LVM based partitioning.
Use LVM based scheme. (y/n):y

Enter path to the backup file on the NFS server in format:


<ip_address_of_the_NFS_share>:/<path>/<archive_file>

For example, 10.10.10.10:/export/operating_system.tar.bz2


NFS line:nfs_ip:/export/mybackup.tar.bz2
IP Address of this host:<IP address of the DB host>
Netmask of this host:<netmask for the above IP address>
Default gateway:<Gateway for the above IP address>

When the recovery completes, the login screen appears.


5. Log in as the root user. You will need the password. Contact Oracle Support Services for the
password.
6. Detach the diagnostics.iso.
7. Use the reboot command to restart the system. The restoration process is complete.

BARE METAL RESTORE PROCEDURE:
When you do not have a snapshot based backup to restore, then use this procedure to recover the
compute node. This procedure can be done using a USB or using ISO.
Below are the high level steps for BMR using ISO image.

1. Remove the problematic node from the cluster by running the commands from surviving compute
node.
2. Download the correct ISO image for your DB machine from Oracle eDelivery --
https://edelivery.oracle.com/
3. Copy the downloaded file to the surviving compute node.
4. unzip and prepare the ISO for Bare Metal Restore :
# unzip V36291-01.zip
# tar -pxvf computeImageMaker_11.2.3.2.1_LINUX.X64_130109-1.x86_64.tar

Remove the original zip & tarball after extracting as this is no longer required.
# rm V36291-01.zip
#rm computeImageMaker_11.2.3.2.1_LINUX.X64_130109-1.x86_64.tar
# rm README.txt
# cd dl360/
# ./makeImageMedia.sh image.iso
5. Copy the above generated image.iso to the machine from which ILOM console will be launched.
6. Log into ILOM via web and enable remote console
a. Attach the ISO image to the CD ROM
b. Connect to the ILOM via web interface. Go to Remote Control tab, then Host Control tab. From
the Next Boot Device list, select CDROM. The next time the server is rebooted, it will use the
attached ISO image. This setting is valid for one boot only, after which the default BIOS boot order
applies again.
c. Reboot the box and let the process pick the ISO image and start the re-image process
d. The system boots and should detect the ISO image media. Allow the system to boot.
i. The first phase of the imaging process identifies any BIOS or Firmware that is out of date,
and upgrades the components to the expected level for the Image Maker. If any component
must be upgraded (or downgraded), then the system is automatically rebooted.
ii. The second phase of the imaging process will install the factory image on to the
replacement database server.
e. Reboot the server again to boot from the hard disk.
f. Next you will be asked to provide below information during the boot process itself to configure
the network settings:
Name servers
Time zone (for example, America/Chicago)
NTP servers
IP Address information for Management Network
IP Address information for Client Access Network
IP Address information for Infiniband Network
The canonical host name
The default gateway

7. Once the server is up, create the necessary users and groups at the OS level and configure the
important files such as cellinit.ora, cellip.ora, and so on.
8. Add the node back to the cluster.
Refer to the attachment in Doc Id: 1084360.1 for the full procedure.

STORAGE SERVER RESCUE:


There is no need to take a regular backup of the Exadata cell nodes, because the OS and Exadata S/W
are stored in first two disks using S/W RAID. So, even if one disk fails, the node can survive using the
other disk.
Even if both the disks fail at the same time, there is already an internal USB attached to the cell node,
which can be used to recover the OS and Exadata S/W.

USB RESCUE:
Below are the steps for performing a USB rescue if both the system disks are corrupted:

1. Connect to the Exadata Storage Server by using the ILOM console.


2. Boot the cell, and as soon as you see the Oracle Exadata splash screen, press any key on the
keyboard.
3. From the boot options list, select the last option,
CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode. Then press Enter.

4. Select the rescue option, and proceed with the rescue.


5. At the end of the rescue process, ensure that the cell boots from the system disks.
6. Reconfigure the cell.

The rescue procedure does not destroy the contents of the data disks or the contents of the data
partitions on the system disks unless you explicitly choose to do so during the rescue procedure.

HEALTH CHECKS
EXACHK:
EXACHK IS A UTILITY THAT:
 Collects data regarding Database Machine component versions and best practices
 Enables administrators to check their environment against supported version levels and best
practices
 Is pre-installed on new Exadata Database Machines
 Is available from My Oracle Support note 1070954.1
 Should be executed periodically as a regular part of Database Machine monitoring and maintenance
 Does not alter any Database Machine configuration settings
 Is lightweight and has minimal impact on the system

EXACHK IS NOT:
 A continuous monitoring tool
 A replacement for Enterprise Manager

Exachk collects data regarding key software, hardware, and firmware versions and configuration best
practices that are specific to Oracle Database Machine.
The output assists Database Machine administrators to periodically review and cross-reference the
current data for key Database Machine components against supported version levels and
recommended best practices.

Exachk is preinstalled on new Exadata Database Machines. The latest updated version of Exachk is
available for download from My Oracle Support note 1070954.1. It can be executed as desired and
should be executed regularly as part of the maintenance program for an Oracle Database Machine.
It will also offer to configure SSH user equivalence if it is not configured.

Execution Models:
Beginning with 12.1.0.2.1, there are two execution models available for Exachk.
1. Execute as "root" Userid
In this model, the "root" Userid launches a complete Exachk run. The Exachk process running under the
"root" Userid uses the "su" command to execute commands as one of the lower privileged owners of
the RDBMS or grid homes. This approach has advantages in environments with role separation or more
restrictive security models.

NOTE: Beginning with version 12.1.0.2.2, it is recommended to execute Exachk as the "root" userid.
2. Execute as RDBMS or GRID Home Owner Userid
In this model, Exachk is launched from either the RDBMS or GRID home owner Userid. The Userid that
launched Exachk must be able to switch to the "root" Userid to run the "root" level checks. This can be
accomplished by providing the "root" Userid password, setting up "sudo", or by pre-configured
password less SSH connectivity. This model requires multiple runs in role separated environments, and
more restrictive security requirements may not permit the required "root" Userid access.

INSTALLATION & EXECUTION:

 As the chosen installation owner userid, transfer the exachk bundle from My Oracle Support note
1070954.1 to “/opt/oracle.SupportTools/exachk”
 Unzip the bundle :
unzip exachk_121024_bundle.zip
 Unzip exachk.zip.
 Verify md5sums of key files using the commands and values in the provided md5sums.txt file.
 Ensure that the exachk script is executable:
chmod +x exachk
 Run exachk (see the example after this list):
– Follow the prompts, and read and understand all the messages.
– Supply the requested passwords; otherwise, some checks are skipped.
 Review the report:
– Review the summary to identify areas for further investigation.
– Review the details for recommendations and further information.
 The following usage considerations should also be noted:

– It is recommended that Exachk should be staged and operated from a local file system on a
single database server to deliver the best performance.
– To maximize the number of checks that are performed, execute Exachk when the Grid
infrastructure services and at least one database are up and running.
– Although Exachk is a minimal impact tool, it is best practice to execute it during times of least
load on the system.
– To avoid possible problems associated with running the tool from terminal sessions on a
network-attached computer, consider running the tool using VNC so that if there is a network
interruption, the tool will continue to execute to completion.
– If the execution of the tool fails for some reason, it can be rerun from the beginning; Exachk
does not resume from the point of failure.
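
As a simple illustration (prompts and output omitted; the staging directory is an example), a full run as the
root userid might look like this, where the -a option performs all checks:

[root@exa01dbadm01 ~]# cd /opt/oracle.SupportTools/exachk
[root@exa01dbadm01 exachk]# ./exachk -a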

EXACHK OUTPUT:

Exachk produces an HTML report of findings, with the most important exceptions listed first by
component. The report is contained in a date-stamped zip file that is produced during each Exachk run.
Typically, administrators will transfer the Exachk output ZIP file to a desktop computer and expand it
there in order to review the Exachk findings.

The Exachk HTML report contains a summary of findings, along with relevant details. The detailed
entries typically describe the issues in detail and provide recommendations for remediation. A link to
documentation or a link to a note in My Oracle Support may also be included to provide further
information, especially if the recommendations are not straightforward.

EXAMPLE:
Status  Type                   Message                                                  Status On            Details

FAIL    Storage Server Check   One or more storage server has non-test stateless        exa01celadm03        View
                               alerts with null "examinedby" fields.

FAIL    Storage Server Check   Storage Server alerts are not configured to be sent      All Storage Servers  View
                               via email.

FAIL    Storage Server Check   One or more storage servers have stateful alerts that    All Storage Servers  View
                               have not been cleared.

There are two types of alerts maintained in the alerthistory of a
storage server, stateful and stateless.

A stateful alert is usually associated with a transient condition,
and it will clear itself after that transient condition is
corrected. These alerts age out of the alerthistory after 7 days
(default time) once they are set to clear. Stateful alerts are
not examined by this check.

A stateless alert is not cleared automatically. They will not age
out of the alerthistory until the alert is examined and the
"examinedby" field set manually to a non-null value, typically the
name of the person who reviewed the stateless alert and corrected
or otherwise acted upon the information provided.

Recommendation: The corrective action for each stateless alert is found in the
"alertaction" field. The listing below contains the name,
alerttype, severity, alertmessage, and alertaction fields for each
alert found with "examinedby" set to null. Follow the
recommendations in the "alertaction" field and when the issue is
resolved, manually set the "examinedby" field with a command
similar to the following (celladmin userid, cellcli utility):

CellCLI> alter alerthistory 1640 examinedby="jdoe"

Alert 1640 successfully altered

Where jdoe is the name of the person who verified the cause of the
stateless alert no longer exists, and the number is the name of
the stateless alert. Note that double quotes are used around the
value to be set, but not the name of the stateless alert.

Needs attention on: exa01celadm03

Passed on: exa01celadm02, exa01celadm01

SUNDIAG:
The sundiag report is used to collect the diagnostic information required for disk failures on the Exadata
storage servers, and it can also be used on database nodes or storage servers for other hardware issues.
The sundiag.sh script must be executed as root on the Exadata storage server having disk problems and,
for other hardware issues, on the affected database node or storage server.
Sundiag.sh is included in the Exadata base image in /opt/oracle.SupportTools; however, if the image is
older, it may not contain the latest sundiag.sh version.
The latest version of the script can be downloaded from Doc Id 761868.1.

USAGE:
[root@exa01dbadm01 ~]# /opt/oracle.SupportTools/sundiag.sh -h
Oracle Exadata Database Machine - Diagnostics Collection Tool
Version: 1.5.1_20140521
Usage: /opt/oracle.SupportTools/sundiag.sh [osw] [ilom | snapshot]
osw - Copy ExaWatcher or OSWatcher log files (Can be several 100MB's)
ilom - User level ILOM data gathering option via ipmitool, in place of separately using root login to
get ILOM snapshot over the network.
snapshot - Collects node ILOM snapshot- requires host root password for ILOM to send snapshot data
over the network.

FOR GATHERING OS WATCHER DATA ALONGSIDE SUNDIAG:


# /opt/oracle.SupportTools/sundiag.sh osw
Execution will create a date stamped tar.bz2 file in /tmp/sundiag_/tar.bz2 including OS Watcher archive
logs. These logs may be very large.
Upload this file to the Service Request.

FOR GATHERING ILOM DATA ALONGSIDE SUNDIAG:


# /opt/oracle.SupportTools/sundiag.sh snapshot
Execution will create a date stamped tar.bz2 file in /tmp/sundiag_/tar.bz2 which includes running an
ILOM snapshot. In order to collect a snapshot, the host (not ILOM) 'root' password is required to
facilitate network transfer of the snapshot into the /tmp directory. This is the preferred method of
ILOM data collection.

FOR GATHERING SUNDIAG DATA ACROSS A WHOLE RACK:


For gathering sundiag.sh outputs across a whole rack using dcli, the outputs may end up with the same
tarball name which will overwrite each other upon unzipping. To avoid this, use the following from
DB01:
1. [root@exadb01 ~]# cd /opt/oracle.SupportTools/onecommand (or wherever the all_group file is with
the list of the rack hostnames)
2. [root@exadb01 onecommand]# dcli -g all_group -l root /opt/oracle.SupportTools/sundiag.sh 2>&1
<this will take up to about 2 minutes>
3. Verify there is output in /tmp on each node:
[root@exadb01 onecommand]# dcli -g all_group -l root --serial 'ls -l /tmp/sundiag* '

4. Sort them by hostname into directories, as they will likely mostly have the same filename with the
same date stamp:
[root@exadb01 onecommand]# for H in `cat all_group`; do mkdir /root/rack-sundiag/$H ; scp -p
$H:/tmp/sundiag*.tar.bz2 /root/rack-sundiag/$H ; done

5. [root@exadb01 onecommand]# cd /root/rack-sundiag

6. [root@exadb01 ~]# ls exa*


exacel01:
sundiag_2011_05_24_10_11.tar.bz2
exacel02:
sundiag_2011_05_24_10_11.tar.bz2
...
exadb08:
sundiag_2011_05_24_10_11.tar.bz2

7. [root@exadb01 ~]# tar jcvf exa_rack_sundiag_oracle.tar.bz2 exa*


exacel01/
exacel01/sundiag_2011_05_24_10_11.tar.bz2
exacel02/
exacel02/sundiag_2011_05_24_10_11.tar.bz2
...
exadb08/
exadb08/sundiag_2011_05_24_10_11.tar.bz2

8. [root@exadb01 ~]# ls -l exa_rack_sundiag_oracle.tar.bz2


-rw-r--r-- 1 root root 3636112 May 24 10:21 exa_rack_oracle.tar.bz2

Upload this file to the Service Request.

VALIDATE THE SCRIPT IS RUNNING THE NEW VERSION:

EXAWATCHER:

ExaWatcher replaces OSWatcher in Exadata software versions 11.2.3.3 and up.


ExaWatcher resides on both the Exadata database servers (or compute nodes) and the storage cell servers
under “/opt/oracle.ExaWatcher/”

The ExaWatcher utility is started automatically during boot time. If you need to manually stop and start
ExaWatcher you can do the following:

TO STOP EXAWATCHER:
# ./ExaWatcher.sh -stop

TO START EXAWATCHER:
# /opt/oracle.cellos/vldrun -script oswatcher

TO VERIFY EXAWATCHER IS RUNNING:


# ps -ef | grep -i ExaWatcher

MAINTENANCE:

The log file retention that ExaWatcher uses is different from what OSWatcher used in the past. ExaWatcher
Cleanup is a maintenance module used to keep the space used by ExaWatcher under a certain amount.
It checks the current size of the space used by ExaWatcher; if it is more than the set limit (3 GB for
database servers and 600 MB for cells), old data files are deleted.
The space management logic is found in the script /opt/oracle.ExaWatcher/ExaWatcherCleanup.sh.
ExaWatcherCleanup.sh is responsible for:
 Cleaning up older ExaWatcher logs when the quota falls below 80%.
 It also adjusts the logic to free 20% of the space limit.

Options when needing to adjust the quota for ExaWatcher logs:


1) One-time cleanup for ExaWatcher logs:

2) Permanently change the quota on the *DB Node* only as no configuration changes may be made
to the storage cells:

EXAWATCHER MANUAL FILE COLLECTION


The GetExaWatcherResults.sh script can be used to manually collect files in ExaWatcher. The location of
the script is in /opt/oracle.ExaWatcher/GetExaWatcherResults.sh.

EXAMPLES OF MANUAL COLLECTIONS:


1. To collect from/to a certain date and time:
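
A hedged sketch of such a collection (the --from/--to flag names and the timestamp format are assumptions
based on the script's help output; confirm with the help menu mentioned at the end of this section):

# /opt/oracle.ExaWatcher/GetExaWatcherResults.sh --from 01/23/2015_09:00:00 --to 01/23/2015_17:00:00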

Results are written to location: “/opt/oracle.ExaWatcher/archive/ExtractedResults/”


2. To collect for a time range. In this case, we are collecting for 4 hrs before and after 1300:

The default archive directory is /opt/oracle.ExaWatcher/archive/ExtractedResults; however, you
can change this using the [-d|--archivedir] flag.
Example of changing the default archive location to /tmp/ExaWatcherArchive:

Results are now written to “/tmp/ExaWatcherArchive/ExtractedResults” instead of
“/opt/oracle.ExaWatcher/archive/ExtractedResults”.

Additional information about ExaWatcher may be found in the help menu:

OPTIMIZING DATABASE PERFORMANCE
Areas for special consideration:

 Flash memory usage


 Compression usage
 Index usage
 ASM allocation unit size
 Minimum extent size
 Exadata specific system statistics

FLASH MEMORY USAGE:

Each Exadata Storage Server X4-2 contains 3.2 TB of high-performance flash memory. Primary uses include:

 Exadata Smart Flash Cache:


– Speeds up access to frequently accessed data
– Uses most of the available flash memory by default
– Can be managed automatically for maximum efficiency
– Users can provide optional hints to influence caching priorities.
– Administrators can disable Smart Flash Cache for specific databases.
– Can be configured in write-through mode or write-back mode
– Beneficial for OLTP and Data Warehouse workloads

 Exadata Smart Flash Log:


– Small (512 MB) high-performance temporary store for redo log records
– Managed automatically by Exadata Storage Server software
– Administrators can disable Smart Flash Log for specific databases.

INFLUENCING CACHING PRIORITIES


 Users can influence caching priorities using the CELL_FLASH_CACHE storage attribute:

– DEFAULT uses Smart Flash Cache normally.


– KEEP uses Smart Flash Cache more aggressively.
– DEFAULT objects cannot steal cache space from KEEP objects.
– KEEP objects can consume up to 80% of the cache.
– Unused KEEP object data is periodically flushed to disk.
– NONE specifies that Smart Flash Cache is not used.
EXAMPLES:

SQL> CREATE TABLE calldetail ( ... ) STORAGE (CELL_FLASH_CACHE KEEP);

SQL> ALTER TABLE calldetail STORAGE (CELL_FLASH_CACHE NONE);
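
To confirm the current setting for a table, the data dictionary can be queried; a minimal sketch, assuming
the CELL_FLASH_CACHE column exposed by DBA_TABLES on Exadata-aware releases:

SQL> SELECT table_name, cell_flash_cache FROM dba_tables WHERE table_name = 'CALLDETAIL';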

CHOOSING THE FLASH CACHE MODE:

When Smart Flash Cache operates in write-through mode, writes are directed to disk through 512 MB of
battery-backed cache on the Exadata Storage Server disk controller. Because of this, write I/O latency is
typically not a performance issue. As a result, write-through flash cache offers excellent performance for
most applications.

For extremely write-intensive applications, the write volume can flood the disk controller cache,
rendering it effectively useless. Write-back flash cache provides a solution for write intensive
applications. When Smart Flash Cache operates in write-back mode, writes go directly to flash and data
is only written to disk when it is aged out of the cache. As a result, the most commonly read and written
data is maintained in Smart Flash Cache while less accessed data is maintained on disk. Note that
applications that are not bottlenecked on writes will see little or no benefit from the extra write
throughput enabled by write-back Smart Flash Cache. Also, in write-back mode all mirror copies of data
are written to Smart Flash Cache, which effectively reduces the cache size when compared to write-
through mode.

Because of the different characteristics of each cache mode, it is recommended to use write-through
mode by default and only enable write-back mode when write I/Os are observed as a performance
bottleneck. The best way to determine a write bottleneck is to look for free buffer waits in the database
wait event statistics. Administrators can also check Exadata Storage Server metrics for high disk I/O
latencies and a large percentage of writes.
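
As a quick sketch, the current cache mode can be listed across all cells with dcli, and the free buffer waits
symptom can be checked from the database (cell_group is a file listing the storage cells, as used elsewhere
in this document):

# dcli -g cell_group -l root cellcli -e "list cell attributes name,flashCacheMode"
SQL> SELECT event, total_waits, time_waited FROM v$system_event WHERE event = 'free buffer waits';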

COMPRESSION USAGE:

In addition to the basic and OLTP compression modes provided by Oracle Database, Exadata Storage
Server provides Exadata Hybrid Columnar Compression. Exadata Hybrid Columnar Compression
technology uses a modified form of columnar storage instead of row-major storage. Sets of rows are
stored in an internal structure called a compression unit. Within each compression unit, the values for
each column are stored together along with metadata that maps the values to the rows. Compression is
achieved by replacing repeating values with smaller symbolic references. Because a compression unit is
much larger than an Oracle block, and because column organization brings similar values together,
Exadata Hybrid Columnar Compression can deliver much better compression ratios than both basic and
OLTP compression. The best rates of compression are achieved using direct path loads.
Exadata Hybrid Columnar Compression provides a choice of compression modes to achieve the proper
trade-off between disk usage and CPU overhead:

 Warehouse compression: This type of compression is optimized for query performance, and is
intended for DSS and data warehouse applications.
 Online archival compression: This type of compression is optimized for maximum compression
ratios, and is intended for historical data and data that do not change.

Exadata Hybrid Columnar Compression supports DML operations on compressed data.


However, updated rows and rows added using conventional path inserts are placed into single-block
compression units which yield a lower compression ratio than direct-path loads.
In addition, updates and deletes on tables using Exadata Hybrid Columnar Compression require the
entire compression unit to be locked, which may impact concurrency.

Finally, updates to rows using Exadata Hybrid Columnar Compression cause ROWIDs to change. As a
result, Exadata Hybrid Columnar Compression is recommended for situations where data changes are
infrequent or where data sets are reloaded rather than substantially changed.

In conclusion, Exadata Hybrid Columnar Compression makes effective use of Exadata Storage Server
hardware to deliver the highest levels of compression for data in an Oracle database. It is best suited to
cases where the data is not subject to substantial change. For transactional data sets, you should
consider OLTP compression instead of Exadata Hybrid Columnar Compression. In all cases, you should
be aware of the relative merits and overheads associated with each compression type in order to
choose the best approach for your situation.
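
For illustration only (the table definition below is hypothetical), the two Exadata Hybrid Columnar
Compression modes are specified with the COMPRESS FOR clause; a direct path load or ALTER TABLE ... MOVE is
needed to obtain the full compression benefit:

SQL> CREATE TABLE sales_hist (sale_id NUMBER, sale_date DATE, amount NUMBER) COMPRESS FOR QUERY HIGH;
SQL> ALTER TABLE sales_hist MOVE COMPRESS FOR ARCHIVE HIGH;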

INDEX USAGE:
Some queries that require indexes when using conventional storage will perform acceptably without
indexes using Exadata Database Machine. Review your queries that use indexes to determine if they
would run acceptably using Smart Scan.

To test if queries run acceptably without an index, you can make the index invisible to the optimizer. An
invisible index still exists and is maintained by DML operations, but it is not used by the optimizer for
queries. To make an index invisible, use the following command:

ALTER INDEX <index_name> INVISIBLE

Removing unnecessary indexes aids the performance of DML operations by removing index related I/Os
and maintenance operations such as index rebalancing. Removing unnecessary indexes also saves
storage space.
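
A minimal sketch of this test, using a hypothetical index name:

SQL> ALTER INDEX sales_cust_ix INVISIBLE;
-- run and time the relevant queries; Smart Scan can now be chosen by the optimizer
SQL> ALTER SESSION SET optimizer_use_invisible_indexes = TRUE;   -- optionally re-enable invisible indexes for this session only
SQL> ALTER INDEX sales_cust_ix VISIBLE;                          -- revert if performance is not acceptable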

ASM ALLOCATION UNIT SIZE:

 By default, ASM uses an allocation unit (AU) size of 1 MB.


 For Exadata storage, the recommended AU size is 4 MB.
 AU size must be set when a disk group is created.
 AU size cannot be altered after a disk group is created.
 AU size is set using the AU_SIZE disk group attribute.
Example:

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY DISK 'o/*/data_CD*' ATTRIBUTE


'compatible.rdbms' = '11.2.0.0.0', 'compatible.asm' = '11.2.0.0.0', 'cell.smart_scan_capable' = 'TRUE',
'au_size' = '4M';

MINIMUM EXTENT SIZE:


Extent size is managed automatically in locally managed tablespaces. This option automatically increases
the size of the extent depending on segment size, available free space in the tablespace, and other
factors. By default, the extent size starts at 64 KB and increases in increments of 1 MB, 8 MB, or 64 MB.

Generally speaking, it is recommended that large segments should be defined with larger than default
initial extents to minimize the needless proliferation of small extents in the database. For Database
Machine, the recommendation is to have initial extents of 8 MB for large segments. This setting
complements the recommendation to set the ASM allocation unit size to 4 MB. The following options
can be used to influence the database to allocate large extents:

 The INITIAL storage parameter will set the size of the first extent for a newly created object. It can
be set specifically for individual objects or it can be set at the tablespace level for locally managed
tablespaces. For large objects, set INITIAL to 8 MB.
 Prior to Oracle Database release 11.2.0.2, the CELL_PARTITION_LARGE_EXTENTS initialization
parameter can be used to set the INITIAL storage parameter to 8 MB for all new segments in a
partitioned object. The default and recommended setting is TRUE. Commencing with Oracle
Database release 11.2.0.2, the parameter is hidden and internalized. The internal setting is TRUE
and Oracle does not recommend changing it.

Examples:

SQL> CREATE TABLE t1(col1 NUMBER(6), col2 VARCHAR2(10)) STORAGE (INITIAL 8M MAXSIZE 1G );
SQL> CREATE BIGFILE TABLESPACE ts1 DATAFILE '+DATA' SIZE 100G DEFAULT STORAGE (INITIAL 8M
NEXT 8M );

EXADATA SPECIFIC SYSTEM STATISTICS:


New Exadata specific system statistics are introduced commencing with Oracle Database version
11.2.0.2 bundle patch 18 and 11.2.0.3 bundle patch 8.

Gathering Exadata specific system statistics ensures that the optimizer is aware of the specific
performance characteristics associated with Exadata Database Machine, which allows it to choose
optimal execution plans.

In addition to setting various system statistics relating to system CPU speed and I/O performance,
gathering system statistics in Exadata mode sets the multi block read count (MBRC) statistic correctly for
Exadata. Prior to this enhancement, the optimizer assumed an MBRC value of 8 blocks, or 64 KB
assuming an 8 KB database block size. For Exadata, the true I/O size used during scans is based on the
ASM allocation unit (AU) size, which is at least 1 MB. This difference causes the optimizer to incorrectly
judge the cost of scan operations on Exadata, which can lead to an alternative execution plan being
chosen when in fact a scan would be the optimal approach.
It is recommended to gather Exadata specific system statistics for all new databases on Exadata
Database Machine. For existing databases, administrators are advised to consider current application
performance before making any changes. If the application performance is stable and good then no
change is required. If there is evidence to suggest that current application performance is impacted by
suboptimal plans, then thoroughly test your system with the updated system statistics before finalizing
the change.

SQL> exec dbms_stats.gather_system_stats('EXADATA');
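
The gathered values can be reviewed afterwards from the data dictionary; for example:

SQL> SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';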

BEST PRACTICES
ASM INSTANCE INITIALIZATION PARAMETERS
Experience and testing has shown that certain ASM initialization parameters should be set at specific
values. These are the best practice values set at deployment time. By setting these ASM initialization
parameters as recommended, known problems may be avoided and performance maximized. The
parameters are specific to the ASM instances. The impact of setting these parameters is minimal.

If the ASM initialization parameters are not set as recommended, a variety of issues may be
encountered, depending upon which initialization parameter is not set as recommended, and the actual
set value.

cluster_interconnects:
Colon delimited Bondib* IP addresses

SQL> show parameter cluster_interconnects


NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
Cluster_interconnects string 192.xx.xx.1:192.xx.xx.2

asm_power_limit:

Exadata default asm_power_limit =4 to mitigate application performance impact during ASM rebalance
and resync. Evaluate application performance impact before using a higher ASM_POWER_LIMIT. The
highest power that should be used with Exadata is 64.

Changing the default asm_power_limit is not the same as changing the power for an actively running
rebalance. Setting asm_power_limit=0 disables ASM rebalance operations. In normal operation,
asm_power_limit should not be set to 0.

Processes=1024
sga_target=2048M
pga_aggregate_target=400M
memory_target=0
memory_max_target=0
The proper way to implement the memory related parameters is as follows. This is important as it works
around an issue where memory_target remains set despite setting it to 0.

alter system set sga_target=2048M sid='*' scope=spfile;


alter system set pga_aggregate_target=400M sid='*' scope=spfile;
alter system set memory_target=0 sid='*' scope=spfile;
alter system set memory_max_target=0 sid='*' scope=spfile;
alter system reset memory_max_target sid='*' scope=spfile;

ASM DISKGROUP ATTRIBUTES:

appliance.mode:
TRUE for diskgroups with compatible.asm=12.1.0.2 or higher. Enables fixed partnering which ensures
consistent ASM rebalance timings. If appliance.mode is changed, a rebalance is required for it to take
effect.
Note: Fixed partnering is not used with version 12.1.0.1.

compatible.asm = Grid Infrastructure software version in use

compatible.rdbms = the lowest RDBMS software version in use by databases that will access the disk group
cell.smart_scan_capable = TRUE
au_size = 4M

disk_repair_time:
Disk_repair_time indicates how long a disk will remain offline in ASM before being dropped. The primary
use for this attribute on Exadata is for when a disk has been inadvertently pulled out of its slot. 3.6h is
the default setting so we are just making sure it hasn't been changed.

failgroup_repair_time:
Failgroup_repair_time indicates how long a cell will remain offline in ASM before being dropped. The
primary use for this attribute on Exadata is for when a cell goes offline during unplanned outages or
planned maintenance. 24.0h is the default setting so we are just making sure it hasn't been changed

DATABASE INITIALIZATION PARAMETERS:

cluster_interconnects:
Colon delimited Bondib* IP addresses
alter system set cluster_interconnects = '192.xx.xx.1:192.xx.xx.2' scope=spfile sid='instance1' ;
alter system set cluster_interconnects = '192.xx.xx.3:192.xx.xx.4' scope=spfile sid='instance2' ;

compatible:
Compatible should be set to the current RDBMS software version in use out to the fourth digit (ex:
11.2.0.4) to enable all available features in that release.
log_buffer=128M
Ensures adequate buffer space for new LGWR transport

db_files=1024:
The default value (200) is typically too small for most customers.

USE_LARGE_PAGES = ONLY
This ensures the entire SGA is stored in hugepages (Linux-based systems only).
Benefits: memory savings and reduced paging and swapping.
Prerequisites: the operating system hugepages setting must be correctly configured and must be adjusted
whenever a database instance is added or dropped, or whenever SGA sizes change.
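
A minimal sketch of setting the parameter and checking the operating system hugepages configuration on Linux:

SQL> alter system set use_large_pages=ONLY scope=spfile sid='*';
# grep Huge /proc/meminfo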

sga_target> 0
Setting the parameter sga_target to a value greater than zero to enable Automatic Shared Memory
Management (ASMM) is the recommended best practice. ASMM enables the use of hugepages, also a
best practice, whereas Automatic Memory Management (AMM) does not.

DB_BLOCK_CHECKING
Initially db_block_checking is set to off due to potential performance impact. Block checking typically
causes 1% to 10% overhead, but for update and insert intensive applications (such as Redo Apply at a
standby database) the overhead can be much higher. OLTP compressed tables also require additional
checks that can result in higher overhead depending on the frequency of updates to those tables.
Workload specific testing is required to assess whether the performance overhead is acceptable.

If performance concerns prevent setting DB_BLOCK_CHECKING to either FULL or MEDIUM at a primary


database, then it becomes even more important to enable this at the standby database. This protects
the standby database from logical corruption that would be undetected at the primary database.

alter system set db_block_checking=MEDIUM scope=both sid='*';

DB_BLOCK_CHECKSUM
alter system set db_block_checksum=TYPICAL scope=both sid='*';

DB_LOST_WRITE_PROTECT
alter system set db_lost_write_protect=TYPICAL scope=both sid='*';

db_create_file_dest=DATA diskgroup
"DATA" diskgroup is recommended, but may be set to any valid diskgroup that is not the diskgroup
specified by db_recovery_file_dest.

db_recovery_file_dest=RECO diskgroup
"RECO" diskgroup is recommended, but may be set to any valid diskgroup that is not the diskgroup
specified by db_create_file_dest.
db_recovery_file_dest_size= RECO diskgroup size
Check to ensure the size is <= 90% of the RECO diskgroup size

OLTP INSTANCE DATABASE INITIALIZATION PARAMETERS

parallel_max_servers=240
Check to ensure not more than the recommended value. Setting this higher than this recommended
value can deplete memory and impact performance.

sga_target=24G
Check to ensure not higher than the recommended value.

Enabling Automatic Shared Memory Management by setting SGA_TARGET and disabling Automatic
Memory Management by setting MEMORY_TARGET=0 is generic database practice for any database
server with 4 GB or more physical memory.

pga_aggregate_target=1/12th of Total Memory

Check to ensure not higher than the recommended value.

NEW FEATURES INTRODUCED IN 11.2.3.3.0


AUTOMATIC HARD DISK SCRUBBING AND REPAIR:

When data is read, checks are performed to validate its logical consistency. If a corruption is detected on
NORMAL or HIGH redundancy disk groups, ASM can automatically recover by reading the mirrored data
copies. One weakness with this approach is that corruption of seldom-accessed data could go unnoticed
in the system for a long time between reads. Also, because reads are normally directed to the
primary data copy, corruption of secondary copies can go unnoticed while the primary copy is available.
This can leave the system vulnerable to data loss if the primary copy also fails.

Exadata release 11.2.3.3.0, introduces a proactive hard disk scrubbing capability that minimizes the
possibility of data loss due to latent corruptions on storage server hard disk drives.

Scrubbing works by periodically scanning each disk, every two weeks by default. So that scrubbing does
not interfere with system performance, the scrubbing read I/Os are issued when the disk is idle. If a
corruption is detected, the following occurs:

 If the data is dirty in write-back Smart Flash Cache, no further action is taken because the corruption
will be overwritten when the cached copy is flushed.

 If the data is clean in Exadata Smart Flash Cache, the corruption is repaired by using the cached
data.

 If a clean data copy is not in Exadata Smart Flash Cache, the corruption is repaired by using ASM to
read an uncorrupted copy of the data.

Configuration options:

Setting the next start time:


CellCLI> ALTER CELL hardDiskScrubStartTime="<Timestamp>"

Setting the scrubbing interval:


CellCLI> ALTER CELL hardDiskScrubInterval = [ daily | weekly | biweekly | none ]
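
The current scrubbing settings can be confirmed with CellCLI, either per cell or across all cells with dcli;
for example:

CellCLI> LIST CELL ATTRIBUTES hardDiskScrubInterval, hardDiskScrubStartTime
# dcli -g cell_group -l root cellcli -e "list cell attributes name,hardDiskScrubInterval"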

FAST FILE INITIALIZATION:

Without Exadata, creating a database file requires the database to format and write the entire file.
However, in a newly created file, each block contains only a few dozen bytes of metadata.

The Exadata fast file creation feature allows the database to transmit the metadata to Exadata Storage
Server. CELLSRV then creates and writes the blocks in the file based on the metadata. For file creation
operations, Exadata fast file creation almost eliminates resource consumption on the database server
and I/O bandwidth consumption in the storage network. However, the Exadata Storage Server must still
format and write the entire file.

With Exadata release 11.2.3.3.0, a further optimization is introduced. CELLSRV no longer writes the file
to disk immediately. Instead, CELLSRV persists only the metadata. Later, CELLSRV uses the metadata to
construct empty blocks on-the-fly when they are first referenced by the database. Using this
optimization, the I/Os required to write empty formatted blocks are avoided and file creation on
Exadata is much faster, especially for very large files. For example, creating a 1 TB data file on a full rack
Database Machine with release 11.2.3.3.0 takes approximately 90 seconds using fast file initialization.
Creating the same file on earlier releases takes approximately 220 seconds.

Requirements:
 Exadata Smart Flash Cache must be in write-back mode:
– Cell attribute flashCacheMode=WriteBack
 Smart Scan must be enabled:
– Database parameter CELL_OFFLOAD_PROCESSING=TRUE
 The ASM AU size must be 4MB or larger:
– ASM disk group attribute au_size=4M
 The file size must be 4MB or larger
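
A short verification sketch for the Smart Scan and AU size prerequisites, run from the database and ASM
instances (the flash cache mode check was shown earlier in the Smart Flash Cache section):

SQL> show parameter cell_offload_processing
SQL> SELECT name, allocation_unit_size FROM v$asm_diskgroup;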

NEW FEATURES INTRODUCED IN 12.1.1.1.0


SUPPORT FOR MIXED DATABASE VERSIONS:

Exadata Storage Server version 12.1.1.1.0 contains separate offload servers for each major database
release so that it can fully support all offload operations. Offload requests coming from Oracle Database
11g Release 2 (11.2) are handled automatically by a release 11g offload server, and offload requests
coming from Oracle Database 12c Release 1 (12.1) database are handled automatically by a 12c offload
server. The offload servers are automatically started and stopped in conjunction with CELLSRV, so there
are no additional configuration or maintenance procedures associated with them.
CELL TO CELL DATA TRANSFER:

In earlier releases, Exadata Cells did not directly communicate to each other. Any data movement
between the cells was done through the database servers. Data was read from the source cell into
database server memory, and then written out to the destination cell.

Starting with Exadata Storage Server version 12.1.1.1.0, database server processes can offload data
transfer. That is, a database server can instruct the destination cell to read the data directly from the
source cell, and write the data to its local storage. This reduces the amount of data transferred across
the fabric by half, reducing InfiniBand bandwidth consumption, and memory requirements on the
database server.

Oracle Automatic Storage Management (Oracle ASM) resynchronization, resilver, and rebalance
operations use this feature to offload data movement. This provides improved bandwidth utilization at
the InfiniBand fabric level and improved memory utilization in the Oracle ASM instances.

The minimum software requirement for this feature is Oracle Database 12c Release 1 (12.1) or later, and
Exadata Storage Server version 12.1.1.1.0 or later. No additional configuration is needed to use this
feature.

DBMCLI utility
What is DBMCLI in Exadata?

Starting with Exadata Storage Server release 12.1.2.1.0, a new command-line interface called DBMCLI is
introduced for the database servers.

 DBMCLI is the database machine command-line interface used to administer the Exadata database servers.
 It runs on each database server and, by default, is pre-installed on each database server and on each
virtualized machine as shipped.
 The DBMCLI utility is included in Exadata image release 12.1.2.1.0 and later.

Using DBMCLI:

 Configure ASR, capacity-on-demand, and Infrastructure as a Service
 Configure database server e-mail alerts
 Configure, manage, and monitor database servers
 Start and stop the server
 Manage server configuration information
 Enable or disable the server
 It can be started through SSH by executing the DBMCLI command
 Used to monitor database server metrics
 It uses database server OS authentication, so no login parameters are required (similar to CELLCLI)

SYNTAX:

$dbmcli [port_number] [-n] [-m] [-xml] [-v | -vv | -vvv] [-x] [-e command]

port_number - Specifies the HTTP port number of the database server. If not specified, the port from the
cellinit.ora file is used; if it is not in cellinit.ora, then port number 8888 is used.
-n - Execute the DBMCLI command in non-interactive mode
-m - Run DBMCLI in monitor (read-only) mode
-xml - Display output in XML format (can be used for OEM)
-v, -vv, -vvv - Set the log level: fine, finer, and finest
-x - Suppress the banner
-e - Execute a specific DBMCLI command
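
For example, typical DBMCLI invocations from a database server look like the following:

# dbmcli -e list dbserver detail
# dbmcli -e list alerthistory
# dbmcli -e list metriccurrent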

USERS:

DBMCLI has two users

dbmadmin - Used for administration purpose


dbmmonitor - Used for monitoring purpose

WHAT IS HARD IN ORACLE EXADATA?
HARDWARE ASSISTED RESILIENT DATA (HARD):
HARD prevents data corruptions from being written to the storage disks. The Oracle database validates and
adds protection information to data before sending it to the storage server. Whenever data is sent to the
storage server, it is first checked for corruption; if corrupted data is found, it is stopped from being
written to disk. Previously, on non-Exadata hardware, it was not possible to prevent corrupted data from
being written to storage, which in many cases was a major cause of database corruption; Oracle has
eliminated this on Exadata through the storage server software introduced in the Exadata storage cells.
There is nothing to set at the database or storage level, as HARD handles corruption checking
transparently, including during ASM rebalance.

EXADATA DICTIONARY VIEWS


The following Exadata-related views can help you obtain Exadata statistics.

Get Cell Definition:


V$CELL
GV$CELL
Exadata cell effectiveness and statistics:
V$SYSSTAT
V$MYSTAT
V$SEGMENT_STATISTICS
V$SQL

Cell Performance Statistics:


V$CELL_STATE

Display Exadata Cell Threads:


V$CELL_THREAD_HISTORY

Backup Related View:


V$BACKUP_DATAFILE
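
As an illustrative sketch, the cell list and a few smart scan effectiveness statistics can be queried as
follows:

SQL> SELECT cell_path, cell_hashval FROM v$cell;
SQL> SELECT name, ROUND(value/1024/1024) AS mb
     FROM v$sysstat
     WHERE name IN ('cell physical IO bytes eligible for predicate offload',
                    'cell physical IO interconnect bytes returned by smart scan');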

EXADATA PATCHING
There are three categories of software that must be maintained in Database Machine:

 Software and firmware on the Exadata Storage Servers


 Software and firmware on the database servers
 Software and firmware for other components
 Compatibility between the different software needs to be maintained.
 Patches and updates are rolling in nature wherever possible.
 Key information is maintained in My Oracle Support note 888828.1.

PATCHING UTILITIES AVAILABLE:


What is patched with which utility?

 The compute node has two parts:

 RDBMS and GI binaries are patched using the opatch utility.
 The OS and firmware of the various components are upgraded using the “dbnodeupdate.sh” script.
 Oracle has recently added support for patching compute nodes with patchmgr as well, but
dbnodeupdate.sh is still predominantly used.
 Storage nodes and IB switches are patched using the “patchmgr” utility.

PATCH APPLICATION FOR EXADATA CELLS:


Software updates are applied to Exadata Storage Servers using the patchmgr utility. The patchmgr utility
installs and updates all software and firmware needed on Exadata Storage Servers. The patchmgr utility
is included with the software release that is downloaded from My Oracle Support.

The patchmgr utility performs the software update on all Exadata Cells in the configuration. The utility
supports rolling and non-rolling software update methods. The utility can send e-mail messages about
patch completion, and the status of rolling and non-rolling patch application.

The following are the commands used:
PRECHECK:
./patchmgr -cells cell_group -patch_check_prereq [-rolling] [-ignore_alerts] \
[-smtp_from "addr" -smtp_to "addr1 addr2 addr3 ..."]
PATCH:
./patchmgr -cells cell_group -patch [-rolling] [-ignore_alerts] [-unkey] \
[-smtp_from "addr" -smtp_to "addr1 addr2 addr3 ..."]

The patchmgr utility must be launched as the root user from a database server in the rack that
has root user SSH equivalence set up to the root user on all cells to be patched.
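
If root SSH equivalence to the cells is not yet in place, it can usually be pushed with dcli's -k option
(you are prompted once for each cell's root password):

# dcli -g cell_group -l root -k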

UNDERSTANDING ROLLING UPDATES:

The rolling update method is also known as no deployment-wide downtime.

 Benefits of this method:


 Does not require any database downtime.
 If there is a problem, then only one cell is affected.

 Considerations for this method:


 Takes much longer than the non-rolling update. Cells are processed one at a time, one after the
other. The minimum time it takes to apply the patch is approximately the number of cells
multiplied by the time it takes to patch one cell in non-rolling updates. A single cell update in the
non-rolling case takes approximately 1 hour.

 The time can be significantly longer when the update is done as a rolling update while a significant
load is running on the deployment. Such load conditions add to the time spent inactivating and re-
activating the grid disks on the cells as they are patched.

 Oracle ASM repair timeout needs to be increased during the patching to avoid Oracle ASM
dropping the grid disks on the cell. Re-adding these grid disks is a time consuming manual
operation. Ensure that disk_repair_time on the disk groups is set to a minimum of the default
value of 3.6 hours.

 The following is a high-level description about applying this patch to Exadata Cells using a rolling
update. These steps are performed automatically by the patchmgr utility.
1. Inactivate all the grid disks on one cell that are eligible to be inactivated. The eligible grid disks
have their attribute asmdeactivationoutcome set to Yes.
2. Confirm that the inactivated disks are OFFLINE.
3. Patch the cell.
4. After the cell reboots and comes up correctly, activate the grid disks that were inactivated.
5. Confirm that all the grid disks activated in step 4 are ONLINE.
6. Move to the next cell to be patched and repeat the steps.

The patchmgr utility performs the preceding steps one cell at a time.

UNDERSTANDING NON-ROLLING UPDATES

The non-rolling update method is also known as deployment-wide downtime. The patch is applied to all
cells in parallel when you choose to have deployment-wide downtime.

 Benefits of this method:


 All cells are done in parallel. Therefore, the patch time is almost constant no matter how many
cells there are to patch.
 There is no time spent trying to inactivate and re-activate grid disks.
 Considerations for this method:
 No cells are available during the patching process. All databases using the cell services must
remain shut down for the duration of the patch.
 When a cell encounters problems during patching, the entire deployment is unavailable until
resolution of the problem on that one cell.
Sample imageinfo before patching:

Active image version: 11.2.3.2.1.101105


Active image activated: 2012-11-06 21:52:08 -0700
Active image status: success
Active system partition on device: /dev/md5 << This is root(/) FS having the OS
Active software partition on device: /dev/md7 << This is /opt/oracle having exadata s/w

In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1


Cell boot usb version: 11.2.3.2.1.101105

Inactive image version: 11.2.2.4.2


Inactive image activated: 2010-08-28 20:01:30 -0700
Inactive image status: success
Inactive system partition on device: /dev/md6
Inactive software partition on device: /dev/md8
Rollback to the inactive partitions: Possible

Sample imageinfo after patching:

Active image version: 12.1.2.1.2.150617.1


Active image activated: 2015-06-17 16:54:43 -0700
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8 << Inactive partitions became active after patching
In partition rollback: Impossible

Cell boot usb partition: /dev/sdm1


Cell boot usb version: 12.1.2.1.2.150617.1

Inactive image version: 11.2.3.2.1.101105


Inactive image activated: 2012-11-06 21:52:08 -0700
Inactive image status: success
Inactive system partition on device: /dev/md5
Inactive software partition on device: /dev/md7
Rollback to the inactive partitions: Possible

As you can see from the above output, patchmgr has installed the new version into the inactive
partitions (/dev/md6,/dev/md8) and then switched the active and inactive partitions. Also, note that the
software is upgraded on the USB as well.
ROLLING BACK SUCCESSFULLY PATCHED CELLS:
You can only roll back successfully patched Exadata Cells. Cells with incomplete or failed patching
cannot be rolled back. Perform the rollback using the following command:

./patchmgr -cells cell_group -rollback [-rolling] [-ignore_alerts] \


[-smtp_from "from_email_address"] [-smtp_to "to_email_address1 ….”]

PATCH APPLICATION FOR EXADATA COMPUTE NODES:


The script “dbnodeupdate.sh” is used for updating Compute Nodes.
The utility performs the following tasks:

 Uses the built-in dbserver_backup.sh script to perform a backup of the file system hosting the
operating system before updating the server when the database servers use a Logical Volume
Manager (LVM) configuration.
 Automates all preparation, update, and validation steps including stopping the databases and the
Grid Infrastructure stack, stopping Oracle Enterprise Manager Cloud Control agents, and unmounting remote
network mounts that might exist.
 Applies Oracle best practices and workarounds for the latest known issues.
 Verifies that the update was successful, relinks the Oracle binaries, and starts the Oracle stack.

The dbnodeupdate.sh utility is available from My Oracle Support note 1553103.1. The utility supports all
hardware generations, Oracle Exadata database servers running Oracle Virtual Server (dom0) and
Exadata Virtual Machines (domU). The dbnodeupdate.sh utility also supports updates from Oracle Linux
5 to Oracle Linux 6.

TWO DIFFERENT APPROACHES OF dbnodeupdate.sh:

1. Using Separate server to hold the repository


 The first option is to build a local ULN mirror on a separate Linux server. The mirror is a
separate Linux server that holds the downloaded channel data with the Oracle Exadata release
from ULN that you want to update to. The database servers connect to the local mirror
(repository) to retrieve updates.
 Do not use a database server as the local ULN mirror.
 This method is preferable when there is a large number of database servers to update and the DB
servers have customized software that requires updates from other ULN channels.
2. Using a zipped ISO repository on each database server
 The second option is to download the ISO of Oracle Exadata channel from MOS in compressed
file format, copy it to every database server and specify the location of the compressed file to
the dbnodeupdate.sh utility.
 This method is useful when there is no separate server available to be used as repository server
and the DB servers do not have customized software that requires updates from additional ULN
channels.

FOR PRE-REQ CHECKS ONLY:
Example using ISO: ./dbnodeupdate.sh -u -l /u01/p16432033_112321_Linux-x86-64.zip -v

FOR PATCHING USING HTTP AND ISO:

./dbnodeupdate.sh -u -l http://myrepo/yum/EXADATA/dbserver/12.1.2.1.0/base/x86_64/
./dbnodeupdate.sh -u -l p20170913_121210_Linux-x86-64.zip

After the node boots, it is recommended to run dbnodeupdate.sh again with the "-c" flag to relink the
Oracle Homes.

./dbnodeupdate.sh -c

dbserver_backup.sh script:

For backing up the Exadata database server OS before updates, the dbserver_backup.sh script is used by
dbnodeupdate.sh. By default, the dbserver_backup.sh script is executed for each upgrade. When executed
(either manually or via dbnodeupdate.sh), the script creates a small snapshot of the 'active' sys LVM.
The active sys LVM is the primary LVM that your current OS image is running on. For example:
[root@mynode~]#imageinfo

Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
Image version: 11.2.3.3.0.131014.1
Image activated: 2014-01-13 13:20:52 -0700
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1

In the above example the active LVM is /dev/mapper/VGExaDb-LVDbSys1. The snapshot is created to provide a
consistent view of the root filesystem while the backup is made. After the snapshot is created, it is
mounted by the same script and its contents are copied over to the inactive LVM. On LVM-enabled systems
there are always two 'sys' LVMs, "VGExaDb-LVDbSys1" and "VGExaDb-LVDbSys2". VGExaDb-LVDbSys2 is created
automatically (on an LVM-enabled system) if it does not yet exist. For the example above, the inactive LVM
is VGExaDb-LVDbSys2.

One of the first steps the dbnodeupdate.sh script performs when executed is making a backup with this
script. If you want to shorten your downtime by making this backup before the start of your planned
maintenance window, you have two options: either execute the dbserver_backup.sh script yourself, or use
dbnodeupdate.sh with the "-b" flag to make a backup only beforehand.

For backup only:


Example: ./dbnodeupdate.sh -u -l /u01/p16432033_112321_Linux-x86-64.zip -b

When you then have the downtime for planned maintenance and already have the backup, you can let
dbnodeupdate.sh skip the backup using the "-n" flag.
When the update proceeds, the current active LVM remains the active LVM. This is different from what
happens on the cells, where the active partition becomes inactive with an update.

ROLLING BACK OF COMPUTE NODE IMAGES:

Typically you will only switch the active sys LVMs when a rollback needs to be done on a database server,
for example when an upgrade from 11.2.3.3.0 to 12.1.1.1.0 needs to be rolled back. What happens then is
nothing more than switching the filesystem labels of the sys LVMs, updating grub (the bootloader), and
restoring the /boot directory (backed up earlier by dbnodeupdate.sh). The next boot will then have the
previously inactive LVM as the active LVM. The command to roll back is:
./dbnodeupdate.sh -r
Execute "dbnodeupdate.sh -c" after completion of the rollback.
ORCHESTRATING DATABASE SERVERS UPDATES (DBNODEUPDATE) WITH PATCHMGR:
Starting with Exadata release 12.1.2.2.0, Oracle Exadata database servers (releases later than 11.2.2.4.2),
Oracle Exadata Virtual Server nodes (dom0), and Oracle Exadata Virtual Machines (domU) can be
updated, rolled back, and backed up using patchmgr. You can still run dbnodeupdate.sh in standalone
mode, but using patchmgr enables you to run a single command to update multiple nodes; you do not
need to run dbnodeupdate.sh separately on each node.
Patchmgr can update the nodes in a rolling or non-rolling fashion. Similar to updating cells, you create a
file that specifies a list of database servers (including domU and dom0) to be updated. Patchmgr will
then update the specified nodes in a rolling or non-rolling fashion. The default is non-rolling, which is
similar to cell patching. Rollbacks are done in a rolling fashion.
For patchmgr to do this orchestration, you run it from a database node that will not be updated itself.
That node needs to be updated later by running patchmgr from a node that has already been updated
or by running dbnodeupdate.sh standalone on that node itself.

The -dbnode_precheck option simulates the update without actually doing the update.
This is to catch and resolve any errors in advance of the actual update.

USAGE:
# ./patchmgr -dbnode dbnode_list -dbnode_precheck -dbnode_loc patch_file_name
-dbnode_version version

The -dbnode_upgrade option performs the actual upgrade of the database node(s). Note that if you
need to update all database nodes in a system, you will have to run the command from a driving node,
which cannot be one of the database nodes being updated.

USAGE:
# ./patchmgr -dbnode dbnode_list -dbnode_upgrade -dbnode_loc patch_file_name
-dbnode_version version
Example of a rolling update using yum http repository:

# ./patchmgr -dbnode dbnode_list -dbnode_upgrade -dbnode_loc \
http://yum-repo/yum/ol6/EXADATA/dbserver/12.1.2.1.0/base/x86_64/ \
-dbnode_version 12.1.2.2.0.date_stamp -rolling

Example of a non-rolling update using zipped yum ISO repository:

# ./patchmgr -dbnode dbnode_list -dbnode_upgrade -dbnode_loc ./repo.zip \
-dbnode_version 12.1.2.2.0.date_stamp

The -dbnode_rollback option switches the active and inactive LVMs, restores the /boot contents, reinstalls
grub, and reboots the database nodes, reversing the effects of the upgrade. Note that this is only
available on LVM-enabled systems.

# ./patchmgr -dbnode dbnode_list -dbnode_rollback

PATCH APPLICATION FOR INFINIBAND SWITCHES:

The patchmgr utility is used to upgrade and downgrade the InfiniBand switches. The minimum switch
firmware release that can use the patchmgr utility is release 1.3.3-2. Switch firmware is upgraded in a
rolling manner. If a spine switch is present in the rack, then the spine switch is upgraded first. If a spine
switch is not in the rack, then upgrade the switch that is running the subnet manager. If the subnet
manager is not running on the switches, then perform the upgrade in any order.

1. Log in as the root user to a database server on Oracle Exadata Database Machine that has root user
SSH access to the switches. The database server must be on the same InfiniBand network as the
switches.

2. Download the appropriate patch file to the database server. Refer to My Oracle Support note
888828.1 for patch information.

3. Uncompress the patch files. The files are uncompressed to the patch_release.date directory.

4. Create a file listing the InfiniBand switches that need to be updated, with one switch per line. The
following is an example of the file:
# cat ibswitches.lst
myibswitch-01
myibswitch-02

5. If the current switch release is 1.3.3-2, then set the environment variable
EXADATA_IMAGE_IBSWITCH_ROLLBACK_VERSION to 1.3.3-2 using the following command:
# export EXADATA_IMAGE_IBSWITCH_ROLLBACK_VERSION=1.3.3-2

6. Change to the patch_release.date directory.

7. Run the prerequisite checks using the following command:


# ./patchmgr -ibswitches ibswitches.lst -upgrade -ibswitch_precheck [-force][-unkey]

8. Upgrade the switches using the following command:
# ./patchmgr -ibswitches ibswitches.lst -upgrade [-force] [-unkey]

DOWNGRADING THE SWITCH SOFTWARE

The only included downgrade is to release 2.1.5-1. Use the following commands to downgrade the
firmware:

# ./patchmgr -ibswitches ibswitches.lst -downgrade -ibswitch_precheck [-force] [-unkey]


# ./patchmgr -ibswitches ibswitches.lst -downgrade [-force] [-unkey]

What is QFSDP:

As a convenience for downloading Exadata software updates, Oracle is providing the Quarterly Full Stack
Download Patch (QFSDP). The QFSDP contains the complete collection of current software patches, and
will be released quarterly, aligned with the database Critical Patch Update (CPU) and Patch Set Update
(PSU) releases.
Patches supplied in QFSDP are installed by following the component-specific README file, and will
remain available for individual download.

QFSDP releases contain the latest software for the following components:

 Infrastructure
o Exadata Storage Server
o InfiniBand Switch
o Power Distribution Unit
 Database
o Oracle Database and Grid Infrastructure DBBP, PSU, or QDPE
o Oracle JavaVM PSU (as of Oct 2014)
o OPatch
o OPlan
 Systems Management
o EM Agent
o EM OMS
o EM Plug-ins

Each component contains the following list of patches:

1. Infrastructure:
a. Exadata Storage Server contains updates to firmware, operating system, patchmgr plugins and
latest Exadata Storage Server software.

--12.1.1.1.2/
|-ExadataDatabaseServer/
|-p20849523_121112_Linux-x86-64.zip
|-README.html
|-ExadataStorageServer_InfiniBandSwitch/
|-p20699031_121112_Linux-x86-64.zip
|-README.html

b. ExadataDBNodeUpdate - contains updates to 'DB Node Update Utility' (5.150701)

c. SunRackIIPDUMeteringUnitFirmware -updates the Sun Rack II PDU Metering Unit Firmware(1.06)

Note: The Sun Rack II PDU Metering Unit Firmware (1.06) version has not changed since January 2014; hence
it is not necessary to update the Sun Rack II PDU Metering Unit Firmware if it is already at the current
level (1.06).

2. Database :

Contains the latest bundle patches/PSUs for GI/RDBMS versions such as 12.1.0.2, 12.1.0.1, 11.2.0.4, and
11.2.0.3. Also contains the latest OPatch and OPlan.

3. Systems Management :

Note : Our recommendation is to upgrade to the latest EM framework (12.1.0.4) with the latest released
Exadata and DB plugins (12.1.0.6 or later). Also, refer to My Oracle Support Note 1323298.1 to get to the
latest required patches on top of the latest releases.

EXADATA WAIT EVENTS

Understanding Oracle Exadata Storage Server Software Wait Events

Oracle uses a different set of wait events for disk I/O to Oracle Exadata Storage Server than disk I/O to
conventional storage because the wait events that are designed for Oracle Exadata Storage Server
directly show the cell and grid disk being accessed. This information is more useful for performance and
diagnostics purposes than the database file# and block# information provided by wait events for
conventional storage. Information about wait events is displayed in V$ dynamic performance views.

Note: The V$ACTIVE_SESSION_HISTORY view can be used to analyze wait events. This view shows
what has happened, when a query was run, and how it ran. It also shows which events the query had to
wait on.

This section contains:


 Monitoring Wait Events for Oracle Exadata Storage Server Software
 Using V$SESSION_WAIT to Monitor Sessions
 Using V$SYSTEM_EVENT to Monitor Wait Events
 Using V$SESSION_EVENT to Monitor Events by Sessions

The following are the Exadata-specific wait events that are useful for monitoring a cell.

cell interconnect retransmit during physical read:
This wait event appears during retransmission for an I/O of a single-block or multiblock read. The cell
hash number in the P1 column in the V$SESSION_WAIT view is the same cell identified for cell single block
physical read and cell multiblock physical read. The P2 column contains the subnet number to the cell, and
the P3 column contains the number of bytes processed during the I/O read operation.

cell list of blocks physical read:
This wait event is equivalent to db file parallel read for a cell. The P1, P2, and P3 columns in the
V$SESSION_WAIT view for this event identify the cell hash number, disk hash number, and the number of
blocks processed during the I/O read operation.

cell multiblock physical read:
This wait event is equivalent to db file scattered read for a cell. The P1, P2, and P3 columns in the
V$SESSION_WAIT view for this event identify the cell hash number, disk hash number, and the total number
of bytes processed during the I/O read operation.

cell single block physical read:
This wait event is equivalent to db file sequential read for a cell. The P1, P2, and P3 columns in the
V$SESSION_WAIT view for this event identify the cell hash number, disk hash number, and the number of
bytes processed during the I/O read operation.

cell smart file creation:
This wait event appears when the database is waiting for the completion of a file creation on a cell. The
cell hash number in the P1 column in the V$SESSION_WAIT view for this event should help identify a slow
cell compared to the rest of the cells.

cell smart incremental backup:
This wait event appears when the database is waiting for the completion of an incremental backup on a
cell. The cell hash number in the P1 column in the V$SESSION_WAIT view for this event should help identify
a slow cell when compared to the rest of the cells.

cell smart index scan:
This wait event appears when the database is waiting for index or index-organized table (IOT) fast full
scans. The cell hash number in the P1 column in the V$SESSION_WAIT view for this event should help
identify a slow cell when compared to the rest of the cells.

cell smart restore from backup:
This wait event appears when the database is waiting for the completion of a file initialization for
restore from backup on a cell. The cell hash number in the P1 column in the V$SESSION_WAIT view for this
event should help identify a slow cell when compared to the rest of the cells.

cell smart table scan:
This wait event appears when the database is waiting for table scans to complete on a cell. The cell hash
number in the P1 column in the V$SESSION_WAIT view for this event should help identify a slow cell when
compared to the rest of the cells.

cell statistics gather:
This wait event appears when a select is done on the V$CELL_STATE, V$CELL_THREAD_HISTORY, or
V$CELL_REQUEST_TOTALS tables. During the select, data from the cells and any wait events are shown in this
wait event.

If a cell hash number or disk hash number is associated with these wait events, then the value can be
joined with the CELL_HASHVAL column of V$CELL and the HASH_VALUE column of V$ASM_DISK to help
identify slow cells or disks.

Using V$SESSION_WAIT to Monitor Sessions


The V$SESSION_WAIT view displays the current or last wait for each session.
The following example shows how to query the V$SESSION_WAIT view.

The second SELECT query displays the cell path and disk name.

Example: Using the V$SESSION_WAIT View

SELECT w.event, w.p1, w.p2, w.p3 FROM V$SESSION_WAIT w, V$EVENT_NAME e


WHERE e.name LIKE 'cell%' AND e.wait_class_id = w.wait_class_id;

SELECT w.event, c.cell_path, d.name, w.p3 FROM V$SESSION_WAIT w,


V$EVENT_NAME e, V$ASM_DISK d, V$CELL c
WHERE e.name LIKE 'cell%' AND e.wait_class_id = w.wait_class_id
AND w.p1 = c.cell_hashval AND w.p2 = d.hash_value;

Using V$SYSTEM_EVENT to Monitor Wait Events


The V$SYSTEM_EVENT view displays information about the number of total waits for an event.
The following example shows how to query the V$SYSTEM_EVENT view.
Example: Using the V$SYSTEM_EVENT View

SELECT s.event FROM V$SYSTEM_EVENT s, V$EVENT_NAME e


WHERE e.name LIKE 'cell%' AND e.event_id = s.event_id;

Using V$SESSION_EVENT to Monitor Events by Sessions


The V$SESSION_EVENT view displays information about waits for an event by a session.
The following example shows how to query the V$SESSION_EVENT view.

Example: Using the V$SESSION_EVENT view

SELECT s.event FROM V$SESSION_EVENT s, V$EVENT_NAME e


WHERE e.name LIKE 'cell%' AND e.event_id = s.event_id;

Cell smart table scan:

The cell smart table scan event is what Oracle uses to account for time spent waiting for Full Table scans
that are Offloaded. It is the most important new event on the Exadata platform. Its presence or absence
can be used to verify whether a statement benefited from Offloading or not. This event replaces the
direct path read event in most cases on Exadata. With normal direct path reads, data is returned directly
to the PGA of the requesting process on the database server (either the user’s shadow process or a
parallel slave process). Blocks are not returned to the buffer cache.

Cell smart index scan:

Time is clocked to the cell smart index scan event when index fast full scans are performed and offloaded.
This event is analogous to cell smart table scan, except that the object being scanned is an
index. It replaces the direct path read event and returns data directly to the PGA of the requesting
index. It replaces the direct path read event and returns data directly to the PGA of the requesting
process as opposed to the buffer cache. This event does not show up very often on the systems we have
observed, probably for several reasons:

 Exadata is quite good at doing full table scans and so the tendency is to eliminate a lot of indexes
when moving to the platform.
 Direct path reads are not done as often on index scans as they are on table scans. One of the
important changes to Oracle 11.2 is the aggressiveness with which it does direct path reads on serial
table scans. This enhancement was probably pushed forward specifically in order to allow Exadata
to do more Smart Full Table Scans, but regardless of that, without this feature, only parallel table
scans would be able to take advantage of Smart Scans. The same enhancement applies to index fast
full scans. That is, they can also be done via serial direct path reads. However, the algorithm
controlling when they happen appears to be less likely to use this technique with indexes (probably
because the indexes are usually much smaller than tables).
 In addition, only fast full scans of indexes are eligible for Smart Scans (range scans and full scans are
not eligible).

Cell single block physical read:

This event is equivalent to the db file sequential read event used on non-Exadata platforms. Single block
reads are used most often for index access paths (both the index block reads and the table block reads
via row-ids from the index lookups). They can also be used for a wide variety of other operations where
it makes sense to read a single block.
Exadata provides a large amount of flash cache on each storage cell (several terabytes per cell on current
models). For that reason, physical reads (both multi-block and single-block) are considerably faster than
on most disk-based storage systems.

Cell multiblock physical read:

This is another renamed event. It is equivalent to the less clearly named db file scattered read event.
On non-Exadata platforms, Oracle Database 11gR2 still uses the db file scattered read event whenever it
issues a contiguous multi-block read to the operating system.

This event is generally used with Full Table Scans and Fast Full Index scans, although it can be used with
many other operations. The new name on the Exadata platform is much more descriptive than the older
name. This wait event is not nearly as prevalent on Exadata platforms as on non-Exadata platforms,
because Exadata handles many full scan operations with Smart Scans that have their own wait events
(cell smart table scan and cell smart index scan). The cell multiblock physical read event on Exadata
platforms is used for serial Full Scan operations on tables that are below the threshold for serial direct
path reads. That is to say, you will see this event used most often on Full Scans of relatively small tables.
It is also used for Fast Full Index Scans that are not executed with direct path reads.

Cell smart file creation:

Exadata has an optimization technique that allows the storage cells to do the initialization of blocks
when a data file is created or extended. This occurs when a tablespace is created or a data file is
manually added to a tablespace. However, it can also occur when a data file is automatically extended
during DML operations. This event replaces the “Datafile init write” event that is still used on non-
Exadata platforms.

Cell smart incremental backup:

This event is used to measure time spent waiting on RMAN when doing an Incremental Level 1 backup.
Exadata optimizes incremental backups by offloading much of the processing to the storage tier. This
new wait event was added to account for time spent waiting on the optimized incremental backup
processing that is offloaded to the storage cells.

Cell smart restore from backup:

This event is used to measure time spent waiting on RMAN when doing a restore. Exadata optimizes
RMAN restores by offloading processing to the storage cells.

DIFFERENCES OF EXADATA MACHINE X5-2 AND X4-2

The following are the major differences and new features of the Exadata X5-2 machine compared to the
X4-2 machine.

 The Solaris operating system is no longer supported on X5
 Exadata X5 supports OVM (Oracle VM)
 The high performance disk configuration has been replaced by Extreme Flash
 Each Extreme Flash storage server contains 8 x 1.6 TB PCI flash drives
 The disk controller battery is eliminated in favor of flash-based cache protection
 CPU capacity increased to 288 cores per full rack (2 x 18 cores per database server)
 Memory can be upgraded to up to 768 GB per database server

EXADATA MONITORING
WHAT TO MONITOR:

 Exadata Storage Servers


 Database servers
 InfiniBand network
 Sun Power Distribution Units
 Cisco Catalyst Ethernet switch
 KVM

MONITORING TOOLS & UTILITIES:

Exachk: Checks whether the Exadata configuration has been done according to best practices (see the
example after this list).
ExaWatcher: Monitors server resources and provides historical resource consumption details when required.
ASR: Auto Service Request is used to monitor Exadata hardware; it automatically creates service
requests with Oracle Support.
ILOM: Used to monitor the physical hardware components.
OEM: Used to monitor Exadata end to end. With OEM you can monitor every component of Exadata; it is
useful for monitoring hardware as well as software.
CLI: Manual monitoring using the command line and/or scripts.
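
For example, a full exachk run is typically launched from the first database server; the -a option
(perform all checks) is assumed here, and the available flags may differ between exachk versions:

# ./exachk -a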


OEM SCHEMATIC VIEW:

STORAGE SERVER MONITORING
An Exadata Storage Server is monitored as a single target in Enterprise Manager Cloud Control, which
covers the following components:
 Hardware
 Operating system
 Exadata Storage Server software

Exadata Storage Servers are independent units that are each identified as separate targets in EM12c.
However, storage servers are grouped together under the system dashboard for an Oracle Exadata
Database Machine, so they are monitored together as a group.

Metrics
There are two different types of metrics related to Exadata Storage Server: storage server metrics and
Enterprise Manager (EM) metrics. In most cases there is a one-to-one mapping between the two.
The Exadata Storage Server Management Server (MS) collects, computes, and manages storage server
metrics. These storage server metrics are then gathered by the database server EM Agent and presented to
the user in EM12c as EM metrics.

Alerts
All Exadata Storage Server alerts are delivered by the storage server to EM12c using Simple Network
Management Protocol (SNMP) as traps. The communication between the Exadata Storage Server and
EM12c is done through the Database server.
There are two types of server alerts that come from Exadata Storage Server:
• For Integrated Lights Out Manager (ILOM)-monitored hardware components, ILOM reports a failure
or threshold exceeded condition as an SNMP trap, which is received by MS. MS processes the trap,
creates an alert for the storage server, and delivers the alert via SNMP to EM12c.
• For MS-monitored hardware and software components, MS processes a failure or threshold exceeded
condition for these components, creates an alert, and delivers the alert via SNMP. From an end-user
perspective there is no difference between these two kinds of alerts. An alert message contains
corrective action to perform to resolve the alert.

THIRD PARTY MONITORING TOOLS:
It is not permissible to install any additional software, including third-party monitoring agents, on Exadata Storage Servers.

CELLCLI COMMANDS:
The following checks should return no output on a healthy system; only the first command is expected to produce output (the temperature status).

COMMAND                                                                          EXPECTED RESULT
list cell attributes temperatureStatus                                           OUTPUT
LIST LUN where status not= 'normal'                                              NO OUTPUT
LIST CELLDISK WHERE status!=normal ATTRIBUTES name                               NO OUTPUT
list physicaldisk where status not = normal                                      NO OUTPUT
LIST GRIDDISK where asmDeactivationOutcome not= yes                              NO OUTPUT
list griddisk attributes status | grep inactive                                  NO OUTPUT
LIST GRIDDISK ATTRIBUTES name, asmModeStatus where asmModeStatus not= 'online'   NO OUTPUT
list griddisk where errorCount > 0                                               NO OUTPUT
LIST FLASHCACHE where status not= normal                                         NO OUTPUT
LIST ALERTHISTORY WHERE begintime > '2015-04-01T00:00:00-04:00'                  NO OUTPUT

These commands can be executed against all storage servers using dcli from the first database node as the root user, for example from the /opt/oracle.SupportTools/onecommand/ directory, where the cell_group file listing all cells normally resides.

[root@exa01dbadm01 onecommand]# dcli -g cell_group -l root cellcli -e list cell attributes temperatureStatus
exa01celadm01: normal
exa01celadm02: normal
exa01celadm03: normal
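To run several of the health checks from the table above against every cell in one pass, they can be wrapped in dcli in the same way (a minimal sketch; the inner double quotes keep the WHERE clause and the > comparison from being interpreted by the remote shell):

[root@exa01dbadm01 onecommand]# dcli -g cell_group -l root 'cellcli -e "list physicaldisk where status != normal"'
[root@exa01dbadm01 onecommand]# dcli -g cell_group -l root 'cellcli -e "list griddisk where errorCount > 0"'
[root@exa01dbadm01 onecommand]# dcli -g cell_group -l root 'cellcli -e "list flashcache where status != normal"'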

CELLSRVSTAT:
[root@exa01celadm01 ~]# cellsrvstat -h
LRM-00101: Message 101 not found; No message file for product=ORACORE, facility=LRM
Usage:
cellsrvstat [-stat_group=<group name>,<group name>,]
[-offload_group_name=<offload_group_name>,]
[-database_name=<database_name>,]
[-stat=<stat name>,<stat name>,] [-interval=<interval>]
[-count=<count>] [-table] [-short] [-list]

[root@exa01celadm01 ~]# cellsrvstat -list


Statistic Groups:
io Input/Output related stats
mem Memory related stats
exec Execution related stats
net Network related stats
smartio SmartIO related stats
flashcache FlashCache related stats
health Cellsrv health/events related stats
offload Offload server related stats
database Database related stats
ffi FFI related stats
lio LinuxBlockIO related stats

#dcli -g cell_group -l root 'cellsrvstat -stat_group=io -interval=30 -count=2' > /tmp/cellstats.txt

DATABASE SERVER MONITORING


Database server monitoring is divided into the following sections:
 Hardware
 Operating system
 Oracle Grid Infrastructure
 Oracle Database
 Enterprise Manager Agent

Linux: top, mpstat, vmstat, iostat, fdisk, ustat, sar, sysinfo
Exadata: dcli
ASM: asmcmd, asmca
Clusterware: crsctl, srvctl
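For example, the overall health of the clusterware stack, a database, and the ASM disk groups can be checked from any database server with these tools (the database name PROD is an assumption, and the owning OS users depend on whether role separation is used):

[grid@exa01dbadm01 ~]$ crsctl check cluster -all
[oracle@exa01dbadm01 ~]$ srvctl status database -d PROD
[grid@exa01dbadm01 ~]$ asmcmd lsdg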

Hardware:
The Integrated Lights Out Manager (ILOM) monitors availability and sensor state, using preset thresholds, for the hardware components of the database server, such as the system motherboard, processors, memory, power supplies, fans, and network interface controllers.

There are no Exadata-specific thresholds to set for database server hardware monitoring. The failure conditions and threshold settings for the hardware sensor readings monitored by ILOM are preset in ILOM and are sufficient for the level of monitoring required on Exadata.
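The same information can be pulled ad hoc from the ILOM command-line interface over SSH; the ILOM hostname below is an example:

[root@exa01dbadm01 ~]# ssh root@exa01dbadm01-ilom
-> show faulty
-> show /SP/logs/event/list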

Alerts
To view the history of alerts that have been generated by ILOM, navigate to the target home page, expand the Sensor Alert section under All Metrics, and review the metrics for each sensor. A history of each sensor state is available for up to 31 days.

INFINIBAND MONITORING:
InfiniBand network monitoring is divided into three areas: InfiniBand switches, InfiniBand ports, and the InfiniBand fabric.
InfiniBand monitoring can be done in the following ways:
Switch diagnostics: monitor the switch itself.
Ex: run the full environment test, show unhealthy environment sensors, show the temperatures
InfiniBand diagnostics: monitor the InfiniBand fabric and devices.
Ex: show the status of the InfiniBand network, query the basic status of an InfiniBand device, list all local ports/connectors
Subnet Manager administration: make changes to the Subnet Manager.
Ex: enable or disable the local Subnet Manager, set the subnet prefix for the local Subnet Manager
Configuration management: make changes to the configuration.
Ex: enable or disable a switch port, create or delete an IPoIB interface
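In addition to the switch-resident commands shown in the next section, the standard InfiniBand diagnostic utilities installed on the database servers give a quick fabric overview (these are generic OFED tools, not Exadata-specific):

[root@exa01dbadm01 ~]# ibswitches        # list all switches on the fabric
[root@exa01dbadm01 ~]# ibhosts           # list all host channel adapters
[root@exa01dbadm01 ~]# iblinkinfo        # show link state and speed for every port
[root@exa01dbadm01 ~]# ibqueryerrors     # report ports with non-zero error counters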

OEM MONITORING:

IMPORTANT COMMANDS:
The showunhealthy command should produce the output "OK - No unhealthy sensors". If the output differs, run the env_test command to get detailed status information for all switch sensors. Note that showunhealthy still reports "OK - No unhealthy sensors" when a power supply is offline, so the power supplies must be checked separately.

[root@exa01sw-iba01 ~]# showunhealthy


OK - No unhealthy sensors

[root@exa01sw-iba01 ~]# ibstatus


Infiniband device 'is4_0' port 0 status:
default gid: fe80:0000:0000:0000:0010:e040:6cec:a0a0
base lid: 0x18
sm lid: 0xa
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)

[root@exa01dbadm01 ibdiagtools]# pwd


/opt/oracle.SupportTools/ibdiagtools
[root@exa01dbadm01 ibdiagtools]# ./verify-topology -h
Usage: ./verify-topology [-v|--verbose] [-r|--reuse (cached maps)] [-m|--mapfile]

[-ibn|--ibnetdiscover (specify location of ibnetdiscover output)]
[-ibh|--ibhosts (specify location of ibhosts output)]
[-ibs|--ibswitches (specify location of ibswitches output)]
[-t|--topology [torus | quarterrack |flex ] default is fattree]
[-a|--additional [interconnected_quarterrack]
[-factory|--factory non-exadata machines are treated as error]
[-ssc|--ssc to test ssc on fake hardware as if on t4-4]
[-t5ssc|--t5ssc to test ssc on fake hardware as if on t5-8]
[-m6ssc|--m6ssc to test ssc on fake hardware as if on m6-32]

Please note that halfrack is now redundant. Checks for Half Racks are now done by default.
-t quarterrack
option is needed to be used only if testing on a stand alone quarterrack
-a interconnected_quarterrack
option is to be used only when testing on large multi-rack setups
-t fattree
option is the default option and not required to be specified
-t flex
option is to be used only on system using flex configuration
Please note that -ibn -ibh -ibs options must have -r specified

Example : perl ./verify-topology


Example : ././verify-topology -t quarterrack
Example : ././verify-topology -t torus
Example : ././verify-topology -a interconnected_quarterrack
Example : ././verify-topology -t flex
--------- Some Important properties of the fattree cabling topology--------------
(1) Every internal switch must be connected to every external switch
(2) No 2 external switches must be connected to each other
-------------------------------------------------------------------------------
Please note that switch guid can be determined by logging in to a switch and trying either of these
commands, depending on availability -
>module-firmware show OR >opensm

[root@exa01dbadm01 ibdiagtools]# ./verify-topology -t fattree

[ DB Machine Infiniband Cabling Topology Verification Tool ]


[Version IBD VER 2.d ]
External non-Exadata-image nodes found:
...will check for ZFS if on SSC - else ignore

Found 2 leaf, 2 spine, 0 top spine switches


Check if all hosts have 2 HCAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............

[ERROR] Leaf Switch 10.xx.xx.205 with GUID 0xxxxxxxxxxxxxxxx0 has fewer than 4 compute nodes links.
It has 2 links (14B 13B ) to compute nodes

[ERROR] Leaf Switch 10.xx.xx.205 with GUID 0xxxxxxxxxxxxxxxx0 has fewer than 7 links to storage cells.
It has 3 links (17A 17B 16B )to storage cells

[ERROR] Leaf Switch 10.xx.xx.206 with GUID 0xxxxxxxxxxxxxxxx0 has fewer than 4 compute nodes links.
It has 2 links ( 14B 13B) to compute nodes

[ERROR] Leaf Switch 10.xx.xx.206 with GUID 0xxxxxxxxxxxxxxxx0 has fewer than 7 links to storage cells.
It has 3 links ( 17A 17B 16B)to storage cells

[ERROR] 2 switches did not meet this requirement

Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]


Spine switch check: Any inter spine switch links..................
[ERROR] Spine switches 10.xx.xx.179 (xxxxxxxxxxxxx) & leaf:1 (xxxxxxxxxxxxx) should not be connected

Spine switch check: Any inter top-spine switch links..............[SUCCESS]

Spine switch check: Correct number of spine-leaf links............[SUCCESS]


Leaf switch check: Inter-leaf link check..........................
[ERROR] Leaf switches 10.xx.xx.205 (xxxxxxxxxxxxx) & 10.xx.xx.206 (xxxxxxxxxxxxx) have 7 links between
them. They should have 3 links instead.

Leaf switch check: Correct number of leaf-spine links.............[SUCCESS]

The errors above appear to be reported because the default fat-tree topology check was run against what is actually a quarter-rack cabling configuration; rerunning the tool with the matching -t quarterrack option passes:

[root@exa01dbadm01 ibdiagtools]# ./verify-topology -t quarterrack

[ DB Machine Infiniband Cabling Topology Verification Tool ]


[Version IBD VER 2.d ]
External non-Exadata-image nodes found:
...will check for ZFS if on SSC - else ignore

--------------- Quarter Rack Exadata V2 Cabling Check---------

Check if all hosts have 2 CAs to different switches..................[SUCCESS]


Leaf switch check: cardinality and even distribution.................[SUCCESS]
Check if each rack has an valid internal ring........................[SUCCESS]

[root@exa01sw-iba01 ~]# perfquery


# Port counters: Lid 24 port 0
PortSelect:......................0
CounterSelect:...................0x1b01
SymbolErrors:....................0
LinkRecovers:....................0
LinkDowned:......................0
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................0
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................4294967294
RcvData:.........................4294967295
XmtPkts:.........................1027564564
RcvPkts:.........................4294967295

[root@exa01sw-iba01 ~]# env_test


Environment test started:
Starting Environment Daemon test:
Environment daemon running
Environment Daemon test returned OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.27 V
Measured 3.3V Standby = 3.39 V
Measured 12V = 11.90 V
Measured 5V = 5.02 V
Measured VBAT = 3.09 V
Measured 2.5V = 2.50 V
Measured 1.8V = 1.78 V
Measured I4 1.2V = 1.22 V
Voltage test returned OK
Starting PSU test:
PSU 0 present OK
PSU 1 present OK
PSU test returned OK
Starting Temperature test:
Back temperature 29
Front temperature 31
SP temperature 50
Switch temperature 52, maxtemperature 56
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 12208
Fan 2 running at rpm 12317
Fan 3 running at rpm 12208
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting Onboard ibdevice test:
Switch OK
All Internal ibdevices OK
Onboard ibdevice test returned OK
Starting SSD test:
SSD test returned OK
Environment test PASSED

IB SWITCH TOPOLOGIES:

PDU MONITORING:

PDUs can be monitored through their built-in HTML (web) interface and through OEM.


Monitor the Enhanced PDU (HTML Interface):

SNMP TRAP:
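In addition to sending traps, the PDU's SNMP agent can be polled from a monitoring host using the standard net-snmp tools; a minimal sketch, assuming the agent has been enabled in the PDU's web interface and that the hostname and the public community string below are placeholders:

$ snmpget -v2c -c public exa01-pdua0 sysDescr.0
$ snmpwalk -v2c -c public exa01-pdua0 system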

Useful Oracle Exadata Metalink Notes:
Doc ID Document Description
-------- -------------------------------
888828.1 Exadata Database Machine and Exadata Storage Server Supported Versions
1270094.1 Exadata Critical Issues
1353073.1 Exadata Diagnostics Collection Guide
1187674.1 Master Note for Oracle Exadata Database Machine and Oracle Exadata Storage Server
1483344.1 Exadata Platinum Customer Outage Classifications and Restoration Action Plans
1571965.1 Maximizing Availability with Engineered Systems - Exadata
1262380.1 Exadata Testing Practices and Patching Explained
1306814.1 Oracle Software Patching with OPLAN
1110675.1 Oracle Exadata Database Machine Monitoring
1070954.1 Oracle Exadata Database Machine exachk or HealthCheck
1094934.1 Best Practices for Data Warehousing on the Database Machine
1269706.1 Best Practices for OLTP Applications on the Database Machine
1071221.1 Oracle Sun Database Machine Backup and Recovery Best Practices
1054431.1 Configuring DBFS on Oracle Exadata Database Machine
1084360.1 Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment (Linux)
1339769.1 Master Note for Oracle Database Resource Manager
960510.1 Data Guard Transport Considerations on Oracle Database Machine
1551288.1 Understanding ASM Capacity and Reservation of Free Space in Exadata
401749.1 Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
1009715.1 Integrated Lights Out Manager (ILOM) CLI Quick Reference
1317159.1 Changing IP Addresses on Exadata Database Machine
1244344.1 Exadata Starter Kit
1537407.1 Requirements and Restrictions When Using Oracle Database 12c on Exadata Database Machine
1459611.1 How to Calculate USABLE_FILE_MB / REQUIRED_MIRROR_FREE_MB
361468.1 HugePages on Oracle Linux 64-bit
761868.1 Oracle Exadata Diagnostic Information Required for Disk Failures and Some Other Hardware Issues
10386736 Documentation for Exadata 11.2 & 12.1
