

ASM Troubleshooting

Yahoo!
August 2009
Kevin Moore
Technical Lead, Advanced Customer Services

ASM L & L Topics


1. ASM init.ora Parameters
2. ASM Alert Log Messages
3. Yahoo! Alert Log & Messages
4. ASM Data Gathering
5. Troubleshooting Scenarios
6. Instance Events
7. Instance Tracing
8. ASM Rebalancing Operations
9. ASM Extent Management
10. Performance Considerations
11. ASM Templates
12. Background Processes
13. ASM Views
14. ASMCMD Commands
15. New 11g Commands
16. ASM MySupport Documents

Advanced Customer Services

ASM initSID.ora
##############################################################################
# Copyright (c) 1991, 2001, 2002 by Oracle Corporation
##############################################################################
###########################################
# Cluster Database
###########################################
cluster_database=true
###########################################
# Miscellaneous
###########################################
diagnostic_dest=/home/oracle
instance_type=asm
###########################################
# Pools
###########################################
large_pool_size=12M
asm_diskgroups='DATA'
+ASM2.instance_number=2
+ASM1.instance_number=1


ASM Alert Log


Mon Aug 24 15:14:10 2009
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth0 192.168.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth1 67.0.0.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/oracle/product/11.1.0/rdbms/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.6.0.
Using parameter settings in server-side pfile /home/oracle/oracle/product/11.1.0/rdbms/dbs/init+ASM1.ora
System parameters with non-default values:
  large_pool_size = 12M
  instance_type = "asm"
  cluster_database = TRUE
  instance_number = 1
  asm_diskgroups = "DATA"
  diagnostic_dest = "/home/oracle"
Cluster communication is configured to use the following interface(s) for this instance
  192.168.1.161
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2


Yahoo! ASM Alert Log


Sun May 4 00:19:05 2008
kjbdomatt send to node 0    * One line for each node *
kjbdomatt send to node 1
kjbdomatt send to node 2
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DISK116 label:DISK116    * One line for each node *
NOTE: cache opening disk 2 of grp 2: DISK117 label:DISK117
NOTE: attached to recovery domain 2
Sun May 4 00:19:14 2008
NOTE: recovering COD for group 1/0x8ccb7277 (DATA)    * Metadata for tracking long running trx *
SUCCESS: completed COD recovery for group 1/0x8ccb7277 (DATA)
Sun May 4 00:19:14 2008
NOTE: opening chunk 14 at fcn 0.0 ABA
NOTE: seq=2 blk=0
Sun May 4 00:19:14 2008
NOTE: cache mounting group 2/0x8CDB7278 (TEMP) succeeded
SUCCESS: diskgroup TEMP was mounted
Sun May 4 00:19:17 2008
NOTE: recovering COD for group 2/0x8cdb7278 (TEMP)
SUCCESS: completed COD recovery for group 2/0x8cdb7278 (TEMP)
NOTE: enlarging ACD for group 1/0x8ccb7277 (DATA)
Sun May 4 00:21:10 2008
SUCCESS: ACD enlarged for group 1/0x8ccb7277 (DATA)    * Metadata REDO *
NOTE: enlarging ACD for group 2/0x8cdb7278 (TEMP)
SUCCESS: ACD enlarged for group 2/0x8cdb7278 (TEMP)
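When reviewing a long alert log, it can help to pull out just the diskgroup mount confirmations with their timestamps. The following is a minimal Python sketch, assuming the log follows the layout of the excerpt above (a timestamp line followed by NOTE/SUCCESS messages). The function name `mounted_diskgroups` and the regular expressions are illustrative assumptions, not part of any Oracle tooling.

```python
import re

# Timestamp lines in the excerpt look like "Sun May 4 00:19:14 2008".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# Mount confirmation as shown in the excerpt: "SUCCESS: diskgroup TEMP was mounted".
MOUNTED = re.compile(r"^SUCCESS: diskgroup (\S+) was mounted")


def mounted_diskgroups(lines):
    """Return (timestamp, diskgroup) pairs for each mount confirmation.

    The timestamp is the most recent timestamp line seen before the message,
    or None if the message appears before any timestamp.
    """
    current_ts = None
    found = []
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            current_ts = line
            continue
        match = MOUNTED.match(line)
        if match:
            found.append((current_ts, match.group(1)))
    return found


sample = """Sun May 4 00:19:14 2008
NOTE: cache mounting group 2/0x8CDB7278 (TEMP) succeeded
SUCCESS: diskgroup TEMP was mounted
Sun May 4 00:19:17 2008
NOTE: recovering COD for group 2/0x8cdb7278 (TEMP)
""".splitlines()

print(mounted_diskgroups(sample))
```

The same loop could be pointed at an open alert log file, since it only needs an iterable of lines.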


ASM Data gathering


Please gather all files from the ASM bdump and udump directories covering the time frame of the problem. Be sure to include the alert logs for ALL ASM instances.
For hang/performance issues, please gather system state dumps from the ASM instances.
Please use the script below to query the ASM views, and provide the spooled output from each instance.

set newpage none
set feedback off
set heading off
set termout off
column grp format 99
column disk format 99999
column lxn format 999
column flg format 999
column chk format 999
spool asm
select group_number as grp, name, state, type, total_mb, free_mb from v$asm_diskgroup;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
select group_kfdat, number_kfdat, aunum_kfdat, v_kfdat, fnum_kfdat, i_kfdat, xnum_kfdat, raw_kfdat from x$kfdat;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
select grp, disk, NUMBER_KFDPARTNER, PARITY_KFDPARTNER, ACTIVE_KFDPARTNER from x$kfdpartner;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
select group_kffxp as grp, number_kffxp as num, incarn_kffxp as incarn, PXN_KFFXP, XNUM_KFFXP, LXN_KFFXP as lxn, DISK_KFFXP as disk, AU_KFFXP, FLAGS_KFFXP as flg, CHK_KFFXP as chk from x$kffxp;
select rpad('>', 10, '>'), to_char(sysdate, 'MON DD HH24:MI:SS') from dual;
set linesize 1500
select GROUP_NUMBER, DISK_NUMBER, INCARNATION, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE, LIBRARY, TOTAL_MB, FREE_MB, NAME, FAILGROUP, LABEL, PATH, CREATE_DATE, MOUNT_DATE, READS, WRITES, READ_ERRS, WRITE_ERRS, READ_TIME, WRITE_TIME, BYTES_READ, BYTES_WRITTEN from v$asm_disk;
spool off
exit


ASM Troubleshooting Scenarios


ASM space issues
1. ASM level errors
   ORA-15041
   ORA-15047
2. RDBMS level errors when storage is on ASM
3. Inconsistencies between what is perceived as the available space
4. Inconsistencies between V$ASM_DISKGROUP and X$ views

Note #351117.1 - Information to gather when diagnosing ASM space issues (contains scripts for collecting specific ASM information)


ASM Troubleshooting Scenarios

ASM Disk Missing

1. Use OS utilities to determine which disk cannot be found

Running truss or strace against the RBAL process while selecting * from v$asm_disk often shows errors on the path being opened.

SESSION #1
strace -f -o /tmp/rbal.trc -p <OS pid of RBAL process>
<OR>
truss -ef -o /tmp/rbal.out -p <OS pid of RBAL process>

SESSION #2
select * from v$asm_disk;

SESSION #3
tail -f /tmp/rbal.trc

Examine the trace output (rbal.trc or rbal.out) for errors:
1147090: 1871929: chdir("dev/") = 0
1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT
This says that rhdisk8 cannot be found.

2. Related errors
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "%s"
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "%s" is missing

Note #452770.1 - ASM disk not found/visible/discovered issues
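Scanning a large truss or strace file by eye is tedious, so a short filter for ENOENT lines can narrow things down. This is a minimal Python sketch, assuming each failing call shows the path as its first quoted argument, as in the statx line above. The helper name `missing_paths` is hypothetical.

```python
import re

# First double-quoted argument on a trace line, e.g. statx("rhdisk8, ", ...).
QUOTED_ARG = re.compile(r'"([^"]*)"')


def missing_paths(trace_lines):
    """Return the distinct quoted paths from lines that report ENOENT.

    Works on truss-style ("Err#2 ENOENT") and strace-style
    ("= -1 ENOENT (No such file or directory)") lines alike, because it
    only checks for the ENOENT token and the first quoted argument.
    """
    missing = []
    for line in trace_lines:
        if "ENOENT" not in line:
            continue
        match = QUOTED_ARG.search(line)
        if not match:
            continue
        # Trim padding or stray commas that some trace tools leave in the argument.
        path = match.group(1).strip(" ,")
        if path and path not in missing:
            missing.append(path)
    return missing


sample = [
    '1147090: 1871929: chdir("dev/") = 0',
    '1147090: 1871929: statx("rhdisk8, ", 0x0FFFFFFFFFFFAA80, 176, 010) Err#2 ENOENT',
]

print(missing_paths(sample))
```

Not every ENOENT in an RBAL trace points at a missing ASM disk, so the result is a list of candidates to check against the expected device names.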


ASM Troubleshooting Scenarios

ASM is Unable to Detect ASMLIB Disks/Devices

1) First, scan the disks (on all nodes if RAC):

dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]

2) Second, make sure the disks can be listed:

dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm listdisks
VOL1_10G
VOL2_10G

3) Query each disk:

dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm querydisk VOL1_10G
Disk "VOL1_10G" is a valid ASM disk on device [3, 18]
dbaasm.us.oracle.com:+ASM:oracle:11g>/etc/init.d/oracleasm querydisk VOL2_10G
Disk "VOL2_10G" is a valid ASM disk on device [3, 22]

4) Check that they exist at the OS level:

dbaasm.us.oracle.com:+ASM:oracle:11g>ls -l /dev/oracleasm/disks/VOL1_10G
brw-rw---- 1 oracle dba 3, 18 Aug 13 09:54 /dev/oracleasm/disks/VOL1_10G
dbaasm.us.oracle.com:+ASM:oracle:11g>ls -l /dev/oracleasm/disks/VOL2_10G
brw-rw---- 1 oracle dba 3, 22 Aug 13 09:55 /dev/oracleasm/disks/VOL2_10G

5) Then, in the initialization parameter file, set the disk discovery string parameter as follows:

asm_diskstring = ORCL:*

Note: You can also set it through DBCA (during diskgroup creation) by pressing the [Change Disk Discovery Path] button.


ASM Troubleshooting Scenarios

ASM is Unable to Detect ASMLIB Disks/Devices (LINUX Specific)

6) If the problem persists, you can set the disk discovery string as follows:

asm_diskstring = /dev/oracleasm/disks/*

7) This workaround is possible for Oracle 10g Release 2 and onwards, since it can access block devices. Oracle uses the O_DIRECT flag, which can be used when opening block devices to bypass the OS cache.

8) If the problem persists, please open a new service request with Oracle Support and provide the following information (from all nodes if RAC):

Upload the following files:
=======================================
=)> /var/log/messages
=)> New /etc/sysconfig/oracleasm
=)> alert+ASM#.log for each instance.
================================
And the output of the following commands:
================================
$> cat /etc/*release
$> uname -a
$> rpm -qa |grep oracleasm
$> df -ha
$> ls -l /dev/oracleasm/disks
$> powermt display dev=emcpower# (On all the partitions if using PowerPath from EMC)
================================
$> /etc/init.d/oracleasm status
$> /usr/sbin/oracleasm-discover
$> /usr/sbin/oracleasm-discover 'ORCL:*'
SQL> show parameter asm

Note #457369.1 - ASM is Unable to Detect ASMLIB Disks/Devices

ASM Instance Events


Applicable Event Levels (15xxx)

Level 7 - DEBUG  - Trace information for ASM/OSM debugging purposes only
Level 6 - NLOOPS - Trace deeply nested loops within a function
Level 5 - LOOPS  - Trace loops within a function
Level 4 - CALLS  - Trace function call entry
Level 3 - NORMAL - Trace normal paths within a function
Level 2 - WARN   - Trace warning paths within a function
Level 1 - ERROR  - Trace error paths within a function

Kx   0x0000010 /* Array portion flags */
Kxx  0x0000020 /* Alias-Directory operations */
Kxx  0x0000040 /* Block validation interface */
Kxx  0x0000080 /* metadata cache */
Kxx  0x0000100 /* disk operations */
Kxx  0x0000200 /* file operations */
Kxx  0x0000400 /* disk group operations */
Kxx  0x0000800 /* I/O layer (to ASMLIB or KSFD) */
Kxx  0x0001000 /* node monitor (ie CSS interface) */
Kxx  0x0002000 /* network layer (ie RDBMS-ASM connections) */
Kxx  0x0004000 /* PLSQL package */
Kxx  0x0008000 /* recovery */
Kxx  0x0010000 /* templates */
Kxx  0x0020000 /* SQL execution (processing ASM SQL commands) */
Kxxx 0x0040000 /* ASM DBWR */
Kxxx 0x0080000 /* ASM LGWR */
Kxxx 0x0100000 /* I/O handles mirroring, striping, etc. */
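The hexadecimal values above are single-bit flags, so several subsystems can be combined into one mask with a bitwise OR. The Python sketch below only decodes such a mask back into the descriptions listed on the slide. How the mask and the 1-7 level are combined into an actual event setting is not shown in the source, so that part is deliberately left out. The names `SUBSYSTEM_FLAGS` and `decode_mask` are hypothetical.

```python
# Flag values and descriptions copied from the slide; the component names
# are abbreviated there (Kx, Kxx, Kxxx), so descriptions are used as labels.
SUBSYSTEM_FLAGS = {
    0x0000010: "Array portion flags",
    0x0000020: "Alias-Directory operations",
    0x0000040: "Block validation interface",
    0x0000080: "metadata cache",
    0x0000100: "disk operations",
    0x0000200: "file operations",
    0x0000400: "disk group operations",
    0x0000800: "I/O layer (to ASMLIB or KSFD)",
    0x0001000: "node monitor (ie CSS interface)",
    0x0002000: "network layer (ie RDBMS-ASM connections)",
    0x0004000: "PLSQL package",
    0x0008000: "recovery",
    0x0010000: "templates",
    0x0020000: "SQL execution (processing ASM SQL commands)",
    0x0040000: "ASM DBWR",
    0x0080000: "ASM LGWR",
    0x0100000: "I/O handles mirroring, striping, etc.",
}


def decode_mask(mask):
    """List the subsystem descriptions whose bit is set in mask, lowest bit first."""
    return [name for bit, name in sorted(SUBSYSTEM_FLAGS.items()) if mask & bit]


# Example: combine "disk operations" and "I/O layer" into one mask.
combined = 0x0000100 | 0x0000800
print(hex(combined), decode_mask(combined))
```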


ASM Instance Tracing


Trace RBAL process

[oracle@rac1 ~]$ ps -ef | grep rbal
oracle 7745 1 0 09:24 ? 00:00:02 asm_rbal_+ASM1
oracle 9255 1 0 09:27 ? 00:00:00 ora_rbal_whsed1
oracle 9971 5367 0 11:31 pts/1 00:00:00 grep rbal
[oracle@rac1 ~]$ strace -f -o /tmp/rbal.trc -p 7745
Process 7745 attached - interrupt to quit
Process 7745 detached

more /tmp/rbal.trc
7745 semtimedop(163842, 0xbfb973f4, 1, {2, 350000000}) = -1 EAGAIN (Resource temporarily unavailable)
7745 gettimeofday({1251917133, 714243}, NULL) = 0
7745 gettimeofday({1251917133, 714337}, NULL) = 0
7745 gettimeofday({1251917133, 714395}, NULL) = 0
7745 getrusage(RUSAGE_SELF, {ru_utime={2, 79683}, ru_stime={1, 81835}, ...}) =
7745 sendmsg(13, {msg_name(16)={sa_family=AF_INET, sin_port=htons(32963), sin_addr=inet_addr("192.168.1.162")}, msg_iov(2)=[{"\4\3\2\1\327\263\200\0\0\0\0\0MRON\0\1\0\0\220\0\0\0\1"..., 68}, {"KSXP\2\0\0\0\1\0\2\0\20\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0r"..., 144}], msg_controllen=0, msg_flags=0}, 0) = 212

"buffer busy" or "rdbms ipc reply" events

ASM Rebalancing
Rebalancing is the activity of spreading data evenly across the disks in an ASM disk group
Happens in the background but can also be started manually
Internally, the rebalance happens on a file-by-file basis
Only one RBAL process runs per node
Rebalance requests on the same diskgroup are processed serially
ASM decides how best to balance load across available disks
Uses one of three allocation schemes for selecting disks
1. Placement by file/extent number
2. Random-seeded ordering of all disks in the ASM disk directory
3. Balanced placement over all disks


ASM Rebalancing

Parallel execution based on rebalance POWER


POWER settings are 1-11 (default 1)
Used to throttle overhead during normal operations
Rebalance moves 1 MB chunks at a time
Setting POWER to 0 defers rebalancing to another time


ASM Rebalancing
Displaying & changing rebalance POWER setting
SQL> show parameter limit

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
asm_power_limit                      integer     1

Changing setting
SQL> alter diskgroup dg1 rebalance power 8;

Verifying Change
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE
------------ ----- ---- ---------- ---------- ---------- ---------- ----------
           1 REBAL RUN           8          8          0        407          0
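The SOFAR, EST_WORK, and EST_RATE columns can be read together to judge how far a rebalance has progressed. The Python sketch below shows the arithmetic only, assuming SOFAR and EST_WORK are counts of allocation units and EST_RATE is allocation units moved per minute. The function name `estimate_minutes_remaining` is hypothetical, and the result is an approximation, since ASM revises its estimates while the operation runs.

```python
def estimate_minutes_remaining(sofar, est_work, est_rate):
    """Approximate minutes left in a rebalance from V$ASM_OPERATION columns.

    Returns None when the rate is zero or negative, which is what the view
    shows right after the operation starts (as in the sample row above).
    """
    if est_rate <= 0:
        return None
    remaining_units = max(est_work - sofar, 0)
    return remaining_units / est_rate


# Row from the slide: SOFAR=0, EST_WORK=407, EST_RATE=0 (rate not yet known).
print(estimate_minutes_remaining(0, 407, 0))

# Illustrative values only: 107 of 407 units moved at 60 units per minute.
print(estimate_minutes_remaining(107, 407, 60))
```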


ASM AU/Extent Management


Allocation Units (AU) at the disk level and Extents at the file level
Default AU size is 1 MB
Default extent size is 1 MB
Extents are allocated in 1, 4, 16, & 64 MB chunks (11g)
Extent placement is circular when disks are the same size
AU size cannot be changed without recreating the diskgroup
Templates can be created and added to diskgroups


ASM Performance Considerations


Metadata ONLY is cached in the ASM instance
ASM Diskgroup Configuration
  External Redundancy
  Normal Redundancy (default)
  High Redundancy
ASM Instance Configuration (large_pool_size)
  Resolving ORA-4031
ASM Allocation Unit Size (1 MB default)
ASM Fine Grained Stripe Size (8 x 128k stripes)
MAX I/O Size
Oracle Block Size


ASM Default Template


File Type          Striping
Archivelog Files   Coarse
Autobackup         Coarse
Controlfile        Fine Grained
Datafile           Coarse
Flashback data     Fine Grained
Online REDO        Fine Grained
SPFILE             Coarse
Tempfile           Coarse

Coarse: 1 MB stripe size
Fine Grained: 8 x 128k stripes


ASM Templates
Striping Attributes
  Fine, Coarse
Redundancy Attributes
  Mirror       2 way
  High         3 way
  Unprotected  Not mirrored
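Because each redundancy attribute stores a different number of copies, the raw free space reported for a diskgroup overstates how much new file data will fit. The Python sketch below illustrates the arithmetic under a simplifying assumption: it divides by the copy count from the list above and ignores any space that should stay free to restore redundancy after a disk failure. The names `MIRROR_COPIES` and `usable_mb` are hypothetical.

```python
# Number of copies kept for each redundancy attribute, per the list above.
MIRROR_COPIES = {
    "UNPROTECTED": 1,  # not mirrored
    "MIRROR": 2,       # 2 way
    "HIGH": 3,         # 3 way
}


def usable_mb(free_mb, redundancy):
    """Rough amount of file data (in MB) that fits in free_mb of raw space.

    Simplification: integer division by the copy count, with no allowance
    for rebuild headroom or metadata.
    """
    copies = MIRROR_COPIES[redundancy.upper()]
    return free_mb // copies


for attr in ("unprotected", "mirror", "high"):
    print(attr, usable_mb(3000, attr))
```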


ASM Templates
Viewing Template
select * from V$ASM_TEMPLATE;

Altering Template
Alter diskgroup DG modify template NAME attributes (coarse/fine);

Adding Template
Alter diskgroup DG add template NAME attributes (attributes);

Dropping Templates
Alter diskgroup DG drop template NAME;


ASM Background Processes


ora_asmb_whsed1 - Foregrounds servicing clients commands from client <procname> of database
asm_pmon_+ASM1 - Process monitor, same as database
asm_vktm_+ASM1 - Process to maintain a fast timer, same as database
asm_diag_+ASM1 - Diag process, same as database
asm_ping_+ASM1 - Process to measure network latency, same as database
asm_psp0_+ASM1 - Process that starts other processes, used to start up other backgrounds
asm_dia0_+ASM1 - Diag slave process, same as database
asm_lmon_+ASM1 - Lock monitor, same as database
asm_lmd0_+ASM1 - Lock monitor diag, same as database
asm_lms0_+ASM1 - Lock monitor slaves, same as database
asm_mman_+ASM1 - Autotune SGA process, same as database
asm_dbw0_+ASM1 - DB writer, same as database DB writer, but deals with ASM cache
asm_lgwr_+ASM1 - Log writer, similar to database, but deals with diskgroups
asm_ckpt_+ASM1 - Checkpoint process, similar to database CKPT
asm_smon_+ASM1 - Recovery process, same as database SMON, but deals with diskgroup recovery
asm_rbal_+ASM1 - Background process that is used for diskgroup management
asm_gmon_+ASM1 - Group monitor, used for partner and status table, and node membership
asm_lck0_+ASM1 - Lock monitor slave, same as database


ASM Views (10g & 11G)


View              Contents
V$ASM_ALIAS       Alias for each disk group mounted by the ASM instance
V$ASM_CLIENT      Identifies databases using disk groups managed by the ASM instance
V$ASM_DISK        Disks discovered by the ASM instance
V$ASM_DISKGROUP   Disk groups known by the ASM instance
V$ASM_FILE        File list for each disk group mounted by the ASM instance
V$ASM_OPERATION   Long running operations executing in the ASM instance
V$ASM_TEMPLATE    Templates present in each ASM mounted disk group


ASMCMD Command Reference


cd - Changes the current directory to the specified directory.
du - Displays the total disk space occupied by ASM files in the specified ASM directory.
exit - Exits ASMCMD.
find - Lists the paths of the specified name (with wildcards) under the specified directory.
help - Displays the syntax and description of ASMCMD commands.
ls - Lists the contents of an ASM directory, the attributes of the specified file, or the names and attributes of all disk groups.
lsct - Lists information about current ASM clients.
lsdg - Lists all disk groups and their attributes.
mkalias - Creates an alias for a system-generated filename.
mkdir - Creates ASM directory.
pwd - Displays the path of the current ASM directory.
rm - Deletes the specified ASM files or directories.
rmalias - Deletes the specified alias, retaining the file that the alias points to.


New 11g ASM Commands


cp - Enables you to copy files between ASM disk groups on local instances and remote instances.

lsdsk - ASM can list disk information with or without a running ASM instance. Also useful for system or storage administrators to obtain lists of disks that an ASM instance uses.

md_backup and md_restore - These commands enable you to re-create a pre-existing ASM disk group with the same disk path, disk name, failure groups, attributes, templates and alias directory structure. You can use md_backup to back up the disk group environment and use md_restore to re-create the disk group before loading from a database backup.

remap - You can remap and recover bad blocks on an ASM disk in normal or high redundancy that have been reported by storage management tools such as disk scrubbers. ASM reads from the good copy of an ASM mirror and rewrites these blocks to an alternate location on disk.


MySupport ASM References



Note: 340417.1 - Data Gathering for Troubleshooting ASM Issues
Note: 267982.1 - Automatic Storage Management (ASM) Knowledge Browser Product Page
Note: 824354.1 - How To Trace ASMCMD on Unix
Note: 351866.1 - How To Reclaim ASM Disk Space
Note: 345180.1 - How to duplicate a controlfile when ASM is involved
Note: 553319.1 - ORA-15036 When Starting An ASM Instance
