ASM Troubleshooting

Yahoo!
August 2009
Kevin Moore
Technical Lead, Advanced Customer Services
ASM L & L Topics

1. ASM init.ora Parameters
2. ASM Alert log messages
3. Yahoo! Alert Log & messages
4. ASM data Gathering
5. Troubleshooting Scenarios
6. Instance Events
7. Instance Tracing
8. ASM Rebalancing operations
9. ASM Extent management
10. Performance Considerations
11. ASM Templates
12. Background Processes
13. ASM Views
14. ASMCMD Commands
15. New 11g Commands
16. ASM MySupport Documents

Advanced Customer Services


ASM initSID.ora
• ##############################################################################
• # Copyright (c) 1991, 2001, 2002 by Oracle Corporation
• ##############################################################################

• ###########################################
• # Cluster Database
• ###########################################
• cluster_database=true

• ###########################################
• # Miscellaneous
• ###########################################
• diagnostic_dest=/home/oracle
• instance_type=asm

• ###########################################
• # Pools
• ###########################################
• large_pool_size=12M

• asm_diskgroups='DATA'

• +ASM2.instance_number=2
• +ASM1.instance_number=1

ASM Alert Log
• Mon Aug 24 15:14:10 2009
• Starting ORACLE instance (normal)
• LICENSE_MAX_SESSION = 0
• LICENSE_SESSIONS_WARNING = 0
• Interface type 1 eth0 192.168.1.0 configured from OCR for use as a cluster interconnect
• Interface type 1 eth1 67.0.0.0 configured from OCR for use as a public interface
• Picked latch-free SCN scheme 2
• Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/oracle/product/11.1.0/rdbms/dbs/arch
• Autotune of undo retention is turned on.
• LICENSE_MAX_USERS = 0
• SYS auditing is disabled
• Starting up ORACLE RDBMS Version: 11.1.0.6.0.
• Using parameter settings in server-side pfile /home/oracle/oracle/product/11.1.0/rdbms/dbs/init+ASM1.ora
• System parameters with non-default values:
• large_pool_size = 12M
• instance_type = "asm"
• cluster_database = TRUE
• instance_number =1
• asm_diskgroups = "DATA"
• diagnostic_dest = "/home/oracle"
• Cluster communication is configured to use the following interface(s) for this instance
• 192.168.1.161
• cluster interconnect IPC version:Oracle UDP/IP (generic)
• IPC Vendor 1 proto 2

Yahoo! ASM Alert Log
• Sun May 4 00:19:05 2008
• kjbdomatt send to node 0 * One line for each node *
• kjbdomatt send to node 1
• kjbdomatt send to node 2
• NOTE: F1X0 found on disk 0 fcn 0.0
• NOTE: cache opening disk 1 of grp 2: DISK116 label:DISK116 * One line for each disk *
• NOTE: cache opening disk 2 of grp 2: DISK117 label:DISK117
• NOTE: attached to recovery domain 2
• Sun May 4 00:19:14 2008
• NOTE: recovering COD for group 1/0x8ccb7277 (DATA) * Metadata for tracking long-running transactions *
• SUCCESS: completed COD recovery for group 1/0x8ccb7277 (DATA)
• Sun May 4 00:19:14 2008
• NOTE: opening chunk 14 at fcn 0.0 ABA
• NOTE: seq=2 blk=0
• Sun May 4 00:19:14 2008
• NOTE: cache mounting group 2/0x8CDB7278 (TEMP) succeeded
• SUCCESS: diskgroup TEMP was mounted
• Sun May 4 00:19:17 2008
• NOTE: recovering COD for group 2/0x8cdb7278 (TEMP)
• SUCCESS: completed COD recovery for group 2/0x8cdb7278 (TEMP)
• NOTE: enlarging ACD for group 1/0x8ccb7277 (DATA)
• Sun May 4 00:21:10 2008
• SUCCESS: ACD enlarged for group 1/0x8ccb7277 (DATA) * Metadata REDO *
• NOTE: enlarging ACD for group 2/0x8cdb7278 (TEMP)
• SUCCESS: ACD enlarged for group 2/0x8cdb7278 (TEMP)
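When several alert logs have to be checked, the mount results shown above can be pulled out with a few lines of script. The following is a minimal Python sketch, not an Oracle tool: the function name and the regular expression are illustrative assumptions based only on the message format visible on this slide, and other releases may word the message differently.

```python
import re

# Matches lines such as "SUCCESS: diskgroup TEMP was mounted".
# The pattern is an assumption derived from the excerpt on this slide.
MOUNTED_RE = re.compile(r"SUCCESS: diskgroup (\S+) was mounted")


def mounted_diskgroups(alert_log_lines):
    """Return disk group names reported as mounted, in order of appearance."""
    names = []
    for line in alert_log_lines:
        match = MOUNTED_RE.search(line)
        if match:
            names.append(match.group(1))
    return names


# Sample lines copied from the alert log excerpt above.
sample = [
    "NOTE: cache mounting group 2/0x8CDB7278 (TEMP) succeeded",
    "SUCCESS: diskgroup TEMP was mounted",
    "NOTE: recovering COD for group 2/0x8cdb7278 (TEMP)",
]
print(mounted_diskgroups(sample))
```

Comparing the returned names against the asm_diskgroups parameter is one way to spot a group that failed to mount.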

ASM Data gathering
• Please gather all files from the ASM bdump and udump directories
covering the specified time frame of the problem - be sure to include
alert logs for ALL ASM instances.
• For Hang/Performance issues, please gather System state dumps from
ASM instances
• Please use the script below for querying ASM views, and provide the
spooled output (each instance).
set newpage none
set feedback off
set heading off
set termout off
column grp format 99
column disk format 99999
column lxn format 999
column flg format 999
column chk format 999
spool asm
select group_number as grp, name, state, type, total_mb, free_mb from v$asm_diskgroup;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
select group_kfdat, number_kfdat, aunum_kfdat, v_kfdat, fnum_kfdat, i_kfdat, xnum_kfdat, raw_kfdat from x$kfdat;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
select grp, disk, NUMBER_KFDPARTNER, PARITY_KFDPARTNER, ACTIVE_KFDPARTNER from x$kfdpartner;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
select group_kffxp as grp, number_kffxp as num, incarn_kffxp as incarn, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP as lxn, DISK_KFFXP as
disk, AU_KFFXP, FLAGS_KFFXP as flg, CHK_KFFXP as chk from x$kffxp;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
set linesize 1500
select GROUP_NUMBER, DISK_NUMBER, INCARNATION, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE, LIBRARY,
TOTAL_MB, FREE_MB, NAME, FAILGROUP, LABEL, PATH, CREATE_DATE, MOUNT_DATE, READS, WRITES, READ_ERRS,
WRITE_ERRS, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN from v$asm_disk;
spool off
exit

ASM Troubleshooting Scenarios
• ASM space issues

1. ASM level errors

• ORA-15041
• ORA-15047

2. RDBMS level errors when storage is on ASM

3. Inconsistencies between the perceived and the actual available space

4. Inconsistencies between V$ASM_DISKGROUP and X$ views

Note #351117.1 - Information to gather when diagnosing ASM space issues contains
scripts for collecting specific ASM information

ASM Troubleshooting Scenarios
• ASM Disk Missing

1. Use OS utilities to determine which disk cannot be found

Running truss or strace against the RBAL process while selecting * from v$asm_disk often reveals errors on the device path being accessed

SESSION #1

strace -f -o /tmp/rbal.trc -p <OS pid of RBAL process>
<OR>
truss -ef -o /tmp/rbal.out -p <OS pid of RBAL process>

SESSION #2

select * from v$asm_disk;

SESSION #3

tail -f /tmp/rbal.trc

Examine the trace file (rbal.trc from strace, rbal.out from truss) for errors:

1147090: 1871929: chdir("dev/") = 0
1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT

The ENOENT error shows that rhdisk8 cannot be found

2. ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "%s" is missing

Note #452770.1- ASM disk not found/visible/discovered issues
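Long trace files are tedious to read by eye, so the ENOENT check described above can be scripted. This is a minimal Python sketch under stated assumptions: the function name and regular expression are hypothetical, and the pattern only relies on the two things visible in the sample output (a quoted first argument and the ENOENT token on the same line).

```python
import re

# A system call whose first argument is a quoted string, on a line that
# also reports ENOENT. Works for the truss sample above and for typical
# strace lines; this pattern is an illustrative assumption.
ENOENT_RE = re.compile(r'\w+\("([^"]*)".*ENOENT')


def missing_paths(trace_lines):
    """Return the path arguments of system calls that failed with ENOENT."""
    paths = []
    for line in trace_lines:
        match = ENOENT_RE.search(line)
        if match:
            # Strip stray separators such as the trailing ", " in the sample.
            paths.append(match.group(1).strip(" ,"))
    return paths


# Sample lines copied from the truss output above.
sample = [
    '1147090: 1871929: chdir("dev/") = 0',
    '1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT',
]
print(missing_paths(sample))
```

Each returned name is a candidate device to check with the storage or OS administrator.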

ASM Troubleshooting Scenarios
• ASM is Unable to Detect ASMLIB Disks/Devices

1) First, scan the disks (on all nodes if RAC):

dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]

2) Second, make sure the disks can be listed:

dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm listdisks
VOL1_10G
VOL2_10G

3) Query each disk:

dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm querydisk VOL1_10G
Disk "VOL1_10G" is a valid ASM disk on device [3, 18]
dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm querydisk VOL2_10G
Disk "VOL2_10G" is a valid ASM disk on device [3, 22]

4) Check if they exist at OS level:

dbaasm.us.oracle.com:+ASM:oracle:11g>ls -l /dev/oracleasm/disks/VOL1_10G
brw-rw---- 1 oracle dba 3, 18 Aug 13 09:54 /dev/oracleasm/disks/VOL1_10G
dbaasm.us.oracle.com:+ASM:oracle:11g>ls -l /dev/oracleasm/disks/VOL2_10G
brw-rw---- 1 oracle dba 3, 22 Aug 13 09:55 /dev/oracleasm/disks/VOL2_10G

5) Then, in the initialization parameter file, set the disk discovery string parameter as follows:

asm_diskstring = ORCL:*

Note: You can also set it through DBCA (during diskgroup creation) by pressing the [Change Disk Discovery Path] button.

ASM Troubleshooting Scenarios
• ASM is Unable to Detect ASMLIB Disks/Devices (LINUX Specific)
6) If the problem persists, set the disk discovery string as follows:

asm_diskstring = /dev/oracleasm/disks/*

7) This workaround (asm_diskstring = /dev/oracleasm/disks/*) is possible in Oracle 10g Release 2 and onwards because ASM can access block devices directly. Oracle opens them with the O_DIRECT flag, which bypasses the OS cache.

8) If the problem still persists, open a new service request with Oracle Support and provide the following information (from all nodes if RAC):

Upload the following files:

================================
=)> /var/log/messages
=)> New /etc/sysconfig/oracleasm
=)> alert+ASM#.log for each instance.
================================

And the output of the following commands:

================================
$> cat /etc/*release
$> uname -a
$> rpm -qa |grep oracleasm
$> df -ha
$> ls -l /dev/oracleasm/disks
$> powermt display dev=emcpower# (On all the partitions if using PowerPath from EMC)
$> /etc/init.d/oracleasm status
$> /usr/sbin/oracleasm-discover
$> /usr/sbin/oracleasm-discover 'ORCL:*'
================================

SQL> show parameter asm

Note #457369.1- ASM is Unable to Detect ASMLIB Disks/Devices
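The OS-level check in step 4 (every entry under /dev/oracleasm/disks should be a block device, shown by the leading "b" in the ls -l output) can also be done programmatically. The following Python sketch is only an illustration: the function name is hypothetical, and it checks the device type only, not ownership or permissions.

```python
import os
import stat


def non_block_devices(directory="/dev/oracleasm/disks"):
    """Return the names under `directory` that are not block devices."""
    problems = []
    for name in sorted(os.listdir(directory)):
        mode = os.stat(os.path.join(directory, name)).st_mode
        if not stat.S_ISBLK(mode):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Only run the check where the ASMLIB directory actually exists.
    if os.path.isdir("/dev/oracleasm/disks"):
        print(non_block_devices())
```

An empty list means every entry is a block device; any name returned deserves a closer look with ls -l on that node.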

ASM Instance Events
• Applicable Event Levels (15xxx)
• Level 7 - DEBUG - Trace information for ASM/OSM debugging purposes only
• Level 6 - NLOOPS - Trace deeply nested loops within a function
• Level 5 - LOOPS - Trace loops within a function
• Level 4 - CALLS - Trace function call entry
• Level 3 - NORMAL - Trace normal paths within a function
• Level 2 - WARN - Trace warning paths within a function
• Level 1 - ERROR - Trace error paths within a function

• Kx 0x0000010 /* Array portion flags */
Kxx 0x0000020 /* Alias-Directory operations */
Kxx 0x0000040 /* Block validation interface */
Kxx 0x0000080 /* metadata cache */
Kxx 0x0000100 /* disk operations */
Kxx 0x0000200 /* file operations */
Kxx 0x0000400 /* disk group operations */
Kxx 0x0000800 /* I/O layer (to ASMLIB or KSFD) */
Kxx 0x0001000 /* node monitor (ie CSS interface) */
Kxx 0x0002000 /* network layer (ie RDBMS-ASM connections) */
Kxx 0x0004000 /* PLSQL package */
Kxx 0x0008000 /* recovery */
Kxx 0x0010000 /* templates */
Kxx 0x0020000 /* SQL execution (processing ASM SQL commands) */
Kxxx 0x0040000 /* ASM DBWR */
Kxxx 0x0080000 /* ASM LGWR */
Kxxx 0x0100000 /* I/O handles mirroring, striping, etc. */
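The hex values above look like single-bit flags, so a combined value can be decoded back into the layers it covers. This Python sketch assumes the flags are OR-ed together, as is usual for bit values of this shape; the slide does not show how a mask is combined with a trace level in an actual event setting, so that part is deliberately left out. The layer names are masked on the slide ("Kxx"), so only the descriptions are used, and the function name is hypothetical.

```python
# Bit values and descriptions copied from the slide above.
TRACE_FLAGS = {
    0x0000010: "Array portion flags",
    0x0000020: "Alias-Directory operations",
    0x0000040: "Block validation interface",
    0x0000080: "metadata cache",
    0x0000100: "disk operations",
    0x0000200: "file operations",
    0x0000400: "disk group operations",
    0x0000800: "I/O layer (to ASMLIB or KSFD)",
    0x0001000: "node monitor (ie CSS interface)",
    0x0002000: "network layer (ie RDBMS-ASM connections)",
    0x0004000: "PLSQL package",
    0x0008000: "recovery",
    0x0010000: "templates",
    0x0020000: "SQL execution (processing ASM SQL commands)",
    0x0040000: "ASM DBWR",
    0x0080000: "ASM LGWR",
    0x0100000: "I/O handles mirroring, striping, etc.",
}


def decode_trace_mask(mask):
    """List the layer descriptions whose bit is set in `mask`, lowest bit first."""
    return [desc for bit, desc in sorted(TRACE_FLAGS.items()) if mask & bit]


print(decode_trace_mask(0x0000100 | 0x0000400))
```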

ASM Instance Tracing
• Trace RBAL process

• [oracle@rac1 ~]$ ps -ef | grep rbal
• oracle 7745 1 0 09:24 ? 00:00:02 asm_rbal_+ASM1
• oracle 9255 1 0 09:27 ? 00:00:00 ora_rbal_whsed1
• oracle 9971 5367 0 11:31 pts/1 00:00:00 grep rbal
• [oracle@rac1 ~]$ strace -f -o /tmp/rbal.trc -p 7745
• Process 7745 attached - interrupt to quit
• Process 7745 detached

• more /tmp/rbal.trc

• 7745 semtimedop(163842, 0xbfb973f4, 1, {2, 350000000}) = -1 EAGAIN (Resource temporarily unavailable)
• 7745 gettimeofday({1251917133, 714243}, NULL) = 0
• 7745 gettimeofday({1251917133, 714337}, NULL) = 0
• 7745 gettimeofday({1251917133, 714395}, NULL) = 0
• 7745 getrusage(RUSAGE_SELF, {ru_utime={2, 79683}, ru_stime={1, 81835}, ...}) =
• 7745 sendmsg(13, {msg_name(16)={sa_family=AF_INET, sin_port=htons(32963), sin_addr=inet_addr("192.168.1.162")}, msg_iov(2)=[{"\4\3\2\1\327\263\200\0\0\0\0\0MRON\0\1\0\0\220\0\0\0\1"..., 68}, {"KSXP\2\0\0\0\1\0\2\0\20\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0r"..., 144}], msg_controllen=0, msg_flags=0}, 0) = 212

"buffer busy" or "rdbms ipc reply" events

ASM Rebalancing
• Rebalancing is the activity of spreading data evenly across the disks in an ASM disk group
• Happens in the background but can also be started manually
• Internally, the rebalance proceeds on a file-by-file basis
• Only one RBAL process runs per node
• Rebalance requests on the same diskgroup are processed serially
• ASM decides how best to balance load across available disks
• Uses one of three allocation schemes for selecting disks
1. Placement by file/extent number
2. Random-seeded ordering of all disks in the ASM disk directory
3. Balanced placement over all disks

ASM Rebalancing
• Parallel execution based on rebalance POWER
• POWER settings are 0-11 (default 1)
• Used to throttle overhead during normal operations
• Rebalance moves 1 MB chunks at a time
• Setting POWER to 0 defers rebalancing to another time

ASM Rebalancing
Displaying & changing rebalance POWER setting

• SQL> show parameter limit

NAME TYPE VALUE
------------------------------------ ----------- ------
asm_power_limit integer 1

• Changing setting

• SQL> alter diskgroup dg1 rebalance power 8;

• Verifying Change

• SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
1 REBAL RUN 8 8 0 407 0
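The SOFAR, EST_WORK and EST_RATE columns can be turned into a rough time-to-completion figure. The sketch below is a hedged illustration, assuming EST_RATE is expressed in allocation units per minute as the Oracle reference describes; the function name is hypothetical. V$ASM_OPERATION also has an EST_MINUTES column (not selected in the output above) that reports a similar estimate directly.

```python
def estimated_minutes_left(sofar, est_work, est_rate):
    """Remaining minutes for a rebalance, or None when no rate is known yet.

    sofar    -- allocation units moved so far (SOFAR)
    est_work -- estimated allocation units to move in total (EST_WORK)
    est_rate -- estimated allocation units moved per minute (EST_RATE)
    """
    if est_rate <= 0:
        # Right after the operation starts the rate is still 0, as in the
        # row shown above, so no estimate is possible.
        return None
    remaining = max(est_work - sofar, 0)
    return remaining / est_rate


# Values from the row above: SOFAR=0, EST_WORK=407, EST_RATE=0.
print(estimated_minutes_left(0, 407, 0))
```

Polling the view a few times and feeding in fresh values gives a more stable figure than a single reading taken just after the rebalance starts.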

ASM AU/Extent Management

• Allocation Units (AU) at the disk level and Extents at the file level
• Default AU size is 1 MB
• Default extent size is 1 MB
• Extents are allocated in 1, 4, 16, & 64 MB chunks (11g)
• Extent placement is circular when disks are the same size
• AU size cannot be changed without recreating the diskgroup
• Templates can be created and added to diskgroups

ASM Performance Considerations

• Only metadata is cached in the ASM instance
• ASM Diskgroup Configuration
• External Redundancy
• Normal Redundancy (default)
• High Redundancy
• ASM Instance Configuration (large_pool_size)
• Resolving ORA-4031
• ASM Allocation Unit Size (1 MB default)
• ASM Fine Grained Stripe Size (8 x 128k Stripes)
• MAX I/O Size
• Oracle Block Size

ASM Default Template

• Archivelog Files - Coarse
• Autobackup - Coarse
• Controlfile - Fine Grained
• Datafile - Coarse
• Flashback data - Fine Grained
• Online REDO - Fine Grained
• SPFILE - Coarse
• Tempfile - Coarse

Coarse – 1 MB stripe size
Fine Grained – 8 x 128k stripes
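To make the "8 x 128k" figure concrete, the sketch below maps a byte offset in a fine-grained file to a position inside a set of eight allocation units. It is a simplified model for illustration only: it assumes a 1 MB AU, 128 KB stripes written round-robin across 8 AUs, and hypothetical names throughout. It does not claim to reproduce ASM's internal extent map.

```python
AU_SIZE = 1024 * 1024      # 1 MB allocation unit (assumed default)
STRIPE_SIZE = 128 * 1024   # fine-grained stripe size
STRIPE_WIDTH = 8           # number of AUs one stripe set spans


def locate_fine_grained(offset):
    """Map a file byte offset to (stripe_set, au_in_set, offset_in_au)."""
    set_bytes = AU_SIZE * STRIPE_WIDTH
    stripe_set, within_set = divmod(offset, set_bytes)
    stripe_no, within_stripe = divmod(within_set, STRIPE_SIZE)
    # Consecutive 128 KB stripes rotate across the 8 AUs of the set.
    au_in_set = stripe_no % STRIPE_WIDTH
    # Each full rotation advances 128 KB further into every AU.
    offset_in_au = (stripe_no // STRIPE_WIDTH) * STRIPE_SIZE + within_stripe
    return stripe_set, au_in_set, offset_in_au


print(locate_fine_grained(128 * 1024))
```

Under this model, the first 1 MB of a fine-grained file touches all eight AUs, whereas a coarse file would place the same 1 MB in a single AU.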

ASM Templates

• Striping Attributes – Fine, Coarse
• Redundancy Attributes
• Mirror – 2 way
• High – 3 way
• Unprotected – Not mirrored

ASM Templates

• Viewing Template
• select * from V$ASM_TEMPLATE;

• Altering Template
• Alter diskgroup DG modify template NAME attributes (coarse/fine);

• Adding Template
• Alter diskgroup DG add template NAME attributes (attributes);

• Dropping Templates
• Alter diskgroup DG drop template NAME;

ASM Background Processes
• ora_asmb_whsed1 - ASMB process of database instance whsed1; it connects to the ASM instance, where a foreground process services this client's commands
• asm_pmon_+ASM1 - Process monitor, same as database
• asm_vktm_+ASM1 - Process to maintain a fast timer, same as database
• asm_diag_+ASM1 - Diag process, same as database
• asm_ping_+ASM1 - Process to measure network latency, same as database
• asm_psp0_+ASM1 - Process that Starts other Processes, used to startup other backgrounds
• asm_dia0_+ASM1 - Diag slave process, same as database
• asm_lmon_+ASM1 - Global Enqueue Service Monitor (lock monitor), same as database
• asm_lmd0_+ASM1 - Global Enqueue Service Daemon (lock manager daemon), same as database
• asm_lms0_+ASM1 - Global Cache Service process, same as database
• asm_mman_+ASM1 - Memory manager (SGA autotuning), same as database
• asm_dbw0_+ASM1 - DB writer, same as database DB writer, but deals with the ASM cache
• asm_lgwr_+ASM1 - Log writer, similar to database, but deals with diskgroups
• asm_ckpt_+ASM1 - Checkpoint process, Similar to database CKPT
• asm_smon_+ASM1 - Recovery process, Same as database SMON, but deals with diskgroup recovery
• asm_rbal_+ASM1 - Background process that is used for diskgroup management
• asm_gmon_+ASM1 - Group monitor, used for partner and status table, and node membership
• asm_lck0_+ASM1 - Lock monitor slave, Same as database

ASM Views (10g & 11G)

View              Contents
V$ASM_ALIAS       Alias for each disk group mounted by the ASM instance
V$ASM_CLIENT      Identifies databases using disk groups managed by the ASM instance
V$ASM_DISK        Disks discovered by the ASM instance
V$ASM_DISKGROUP   Disk groups known by the ASM instance
V$ASM_FILE        File list for each disk group mounted by the ASM instance
V$ASM_OPERATION   Long running operations executing in the ASM instance
V$ASM_TEMPLATE    Templates present in each ASM mounted disk group

ASMCMD Command Reference

• cd - Changes the current directory to the specified directory.
• du - Displays the total disk space occupied by ASM files in the specified ASM directory.
• exit - Exits ASMCMD.
• find - Lists the paths of the specified name (with wildcards) under the specified directory.
• help - Displays the syntax and description of ASMCMD commands.
• ls - Lists the contents of an ASM directory, the attributes of the specified file, or the names and attributes of all disk groups.
• lsct - Lists information about current ASM clients.
• lsdg - Lists all disk groups and their attributes.
• mkalias - Creates an alias for a system-generated filename.
• mkdir - Creates ASM directories.
• pwd - Displays the path of the current ASM directory.
• rm - Deletes the specified ASM files or directories.
• rmalias - Deletes the specified alias, retaining the file that the alias points to.

New 11g ASM Commands

cp - Enables you to copy files between ASM disk groups on local instances and
remote instances.

lsdsk - Lists ASM disk information with or without a running ASM instance. Also
useful for system or storage administrators who need a list of the disks that
an ASM instance uses.

md_backup and md_restore - These commands enable you to re-create a pre-existing ASM
disk group with the same disk paths, disk names, failure groups, attributes, templates and alias
directory structure. You can use md_backup to back up the disk group environment and use
md_restore to re-create the disk group before loading from a database backup.

remap - You can remap and recover bad blocks on an ASM disk in normal or high redundancy
that have been reported by storage management tools such as disk scrubbers. ASM reads from
the good copy of an ASM mirror and rewrites these blocks to an alternate location on disk.

MySupport ASM References

Note: 340417.1 - Data Gathering for Troubleshooting ASM Issues
Note: 267982.1 - Automatic Storage Management (ASM) Knowledge Browser Product Page
Note: 824354.1 - How To Trace ASMCMD on Unix
Note: 351866.1 - How To Reclaim ASM Disk Space
Note: 345180.1 - How to duplicate a controlfile when ASM is involved
Note: 553319.1 - ORA-15036 When Starting An ASM Instance
