
SDS/SVM troubleshooting TOI

TOI OBJECTIVES

1. General Guidelines for Troubleshooting DiskSuite
2. How to Disable Disksuite
3. Troubleshooting Metadb replicas
4. Troubleshooting Metadevice Errors
5. Troubleshooting Metatool issues
6. Troubleshooting Metadevice Creation Problems
7. Troubleshooting metahs and metaset problems
8. Troubleshooting soft partitions
9. Miscellaneous Disksuite Infodocs/SRDBs

General Guidelines for Troubleshooting DiskSuite


Basic information to gather when troubleshooting a DiskSuite problem:

- metastat output
- metadb output
- copy of /etc/vfstab and /etc/system
- copy of /etc/release
- version of DiskSuite
- output of format
- output of df -k
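A quick way to capture all of the above in one place (a minimal sketch; the SUNWmd package name applies to SDS 4.x, while SVM uses the SUNWmdr/SUNWmdu packages):

# DIR=/var/tmp/sds-info.$$ ; mkdir -p $DIR
# metastat > $DIR/metastat.out
# metadb -i > $DIR/metadb.out
# cp /etc/vfstab /etc/system /etc/release $DIR
# pkginfo -l SUNWmd > $DIR/disksuite-version.out     (SVM: pkginfo -l SUNWmdr SUNWmdu)
# echo | format > $DIR/format.out 2>&1
# df -k > $DIR/df.out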

How to Disable Disksuite


Problem: Unable to boot from a DiskSuite-controlled system disk
Cause: various reasons
Resolution:
1. Boot from CD.
2. Mount the root partition onto /a. It may be necessary to run fsck on this partition before it can be mounted.
   # mount /dev/dsk/c0t0d0s0 /a
3. Edit the /etc/system file and remove the "rootdev" line shown below. Do not comment it out - REMOVE it!
   # vi /a/etc/system
   rootdev:/pseudo/md@0:0,0,blk
4. In the /etc/vfstab file, replace the lines for the system file system metadevices with their underlying partitions. For example, change:
   /dev/md/dsk/d0  /dev/md/rdsk/d0  /  ufs  1  no
   to:
   /dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /  ufs  1  no
   ONLY change the lines for root (/) and the other system file systems whose metadevices will be cleared in step 7. All other metadevices may stay as-is in this file.
5. Unmount /a and bring the system down to the ok prompt.
6. Boot to single-user mode. THIS IS VERY IMPORTANT. THE SYSTEM MUST BE BOOTED TO SINGLE-USER MODE TO AVOID FILE SYSTEM CORRUPTION.
   ok boot -s
   If the system does not boot to single-user mode, it is possible that a mistake was made in the steps above.
7. Enter the root password when prompted. Once in single-user mode, clear the metamirrors and all the submirrors of all the system file systems. For example, to clear the root (/), /usr, and /var metamirrors, identified as d0, d1, and d2 respectively, run:
   # metaclear -f -r d0 d1 d2
   This clears not only the metamirrors but also the submirrors which are part of these mirrors.
7a. List the existing state databases on the system:
   # metadb -i
   Example output:
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t1d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t1d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t1d0s7
     a    p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s7
7b. Remove the state databases (the -f switch is only required when removing the last one). This must be done for all state databases:
   # metadb [-f] -d <device>
8. Once the metamirrors are cleared, continue the boot to multi-user mode by pressing CTRL-D or entering:
   # exit
9. Everything should now be as it was, except that the system partitions are on the underlying partitions and are not mirrored. Simply re-create the replicas and metadevices for the root mirror, as had been done originally (see the sketch below).
Additional Info: To rebuild the /etc/path_to_inst file, follow the same procedure except for the metadb deletion and creation.
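A sketch of re-creating a typical root mirror afterwards, assuming the boot disk is c0t0d0 mirrored to c0t1d0, replicas on slice 7 of each disk, and the d0/d10/d20 naming used elsewhere in this document:

# metadb -a -f -c 3 c0t0d0s7 c0t1d0s7
# metainit -f d10 1 1 c0t0d0s0       (submirror on the running root slice; -f because it is mounted)
# metainit d20 1 1 c0t1d0s0          (submirror on the second disk)
# metainit d0 -m d10                 (one-way mirror)
# metaroot d0                        (updates /etc/vfstab and /etc/system)
# lockfs -fa
# init 6

After the reboot, attach the second submirror and let it sync:

# metattach d0 d20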

Troubleshooting Metadb replicas


Problem: State database corrupted or unavailable - recovering from stale state database replicas
Cause: Disk failure or disk I/O error.
Symptom: Error messages at boot time when 50% or fewer of the state database replicas are available; the system drops to single-user mode:

ok boot
Hostname: host1
metainit: Host1: stale databases

Insufficient metadevice database replicas located.
Use metadb to delete databases which are broken.
Ignore any "Read-only file system" error messages.
Reboot the system when finished to reload the metadevice database.
After reboot, repair any broken database replicas which were deleted.

Type Ctrl-d to proceed with normal startup,
(or give root password for system maintenance): <root-password>
Entering System Maintenance Mode.

1. Use the metadb command to look at the metadevice state database and see which state database replicas are not available. They are marked "unknown" and flagged with M:
   # /usr/opt/SUNWmd/metadb -i
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
     a    p  l          1050            1034            /dev/dsk/c0t3d0s3
       M  p             unknown         unknown         /dev/dsk/c1t2d0s3
       M  p             unknown         unknown         /dev/dsk/c1t2d0s3
2. Delete the state database replicas on the bad disk using the -d option of metadb(1M). At this point the root (/) file system is read-only; you can ignore the mddb.cf error messages:
   # /usr/opt/SUNWmd/metadb -d -f c1t2d0s3
   metadb: demo: /etc/opt/SUNWmd/mddb.cf.new: Read-only file system
   Verify the deletion:
   # /usr/opt/SUNWmd/metadb -i
        flags           first blk       block count
     a m  p  lu         16              1034            /dev/dsk/c0t3d0s3
     a    p  l          1050            1034            /dev/dsk/c0t3d0s3
3. Reboot.

4. Use the metadb command to add back the state database replicas and verify that they are correct:
   # /usr/opt/SUNWmd/metadb -a -c 2 c1t2d0s3
   # /usr/opt/SUNWmd/metadb
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t3d0s3
     a    p  luo        1050            1034            /dev/dsk/c0t3d0s3
     a       u          16              1034            /dev/dsk/c1t2d0s3
     a       u          1050            1034            /dev/dsk/c1t2d0s3
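A quick quorum check after the repair (a rough sketch; the system needs more than half of its replicas healthy in order to boot unattended):

# metadb -i | grep -c /dev/dsk                    (total number of replicas)
# metadb -i | grep /dev/dsk | grep -c '^ *a'      (replicas that are flagged active)

If the second number is not more than half of the first, add replicas on a healthy disk with metadb -a before the next reboot.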

Problem: metadb -i reports a block count of "unknown" after a reboot
Cause: An SVM state database replica was put on a disk of a different type than the one the OS and SDS packages were loaded onto.
Symptom:
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16400           8192            /dev/dsk/c0t0d0s7
       M  p             16              unknown         /dev/dsk/c1t1d0s7
       M  p             8208            unknown         /dev/dsk/c1t1d0s7
       M  p             16400           unknown         /dev/dsk/c1t1d0s7
Resolution: Make sure that the /etc/system file contains a forceload entry for every disk driver needed for the existing disks on the system. In this particular case, the system had the OS on IDE disks (drv/dad), and the user attempted to place a replica on an attached SCSI disk (drv/sd) without putting "forceload: drv/sd" in /etc/system.
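If it is not obvious which driver a given replica disk uses, the device link points at it (an illustrative example only; the physical path will differ per system). The leaf name of the /devices path (sd@..., dad@..., ssd@...) is the driver that needs a forceload entry:

# ls -l /dev/dsk/c1t1d0s7
lrwxrwxrwx   1 root  root  ... /dev/dsk/c1t1d0s7 -> ../../devices/pci@1f,0/pci@1/scsi@8/sd@1,0:h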

Problem: Fixing bad database replicas
Cause: various reasons
Symptom:

Case I - state database replicas are found on more than one disk:
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t0d0s7
       W  p  l          1050            1034            /dev/dsk/c0t0d0s7
     aW   p  luo        2084            1034            /dev/dsk/c0t0d0s7
     a    p  luo        16              1034            /dev/dsk/c0t1d0s7
     a    p  luo        1050            1034            /dev/dsk/c0t1d0s7
     aW   p  luo        2084            1034            /dev/dsk/c0t1d0s7
 o - replica active prior to last mddb configuration change
 u - replica is up to date
 l - locator for this replica was read successfully
 c - replica's location was in /etc/opt/SUNWmd/mddb.cf
 p - replica's location was patched in kernel
 m - replica is master, this is replica selected as input
 W - replica has device write errors
 a - replica is active, commits are occurring to this replica
 M - replica had problem with master blocks
 D - replica had problem with data blocks
 F - replica had format problems
 S - replica is too small to hold current data base
 R - replica had device read errors

Case II - state database replicas are found on just one disk:
# metadb -i
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t0d0s7
       W  p  l          1050            1034            /dev/dsk/c0t0d0s7
     aW   p  luo        2084            1034            /dev/dsk/c0t0d0s7
 (flag legend as above)

Resolution:

Case I - state database replicas are found on more than one disk. Fix the replicas one disk at a time; there must be surviving state database replicas to work from.
# metadb -d c0t0d0s7
# metadb -a -c3 c0t0d0s7
# metadb -i            (check that the replicas are fixed; if so, proceed with the next set)
# metadb -d c0t1d0s7
# metadb -a -c3 c0t1d0s7

Case II - state database replicas are found on just one disk. In this case we cannot simply delete the replicas, since they are all on that one disk. Create replicas on another disk before fixing the bad ones. In this example, assume c0t1d0s7 is a slice on which new state database replicas can be created (see the check below):
# metadb -a -c3 c0t1d0s7
# metadb -i            (check that the replicas were created)
# metadb -d c0t0d0s7
# metadb -a -c3 c0t0d0s7
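Before putting replicas on a slice, it is worth confirming that the slice is not already in use. A rough check, using the slice name from the example above (metastat -p, where available, lists slices used by existing metadevices):

# grep c0t1d0s7 /etc/vfstab        (should print nothing)
# metastat -p | grep c0t1d0s7      (should print nothing)
# swap -l | grep c0t1d0s7          (should print nothing)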

Troubleshooting Metadevice Errors


Problem: Submirrors out of sync, in "Needs maintenance" state
Cause: Disk problem/failure, improper shutdown, or communication problems between the two mirrored disks.
Symptom: "Needs maintenance" errors in metastat output:
# /usr/opt/SUNWmd/metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
    ....

d10: Submirror of d0
    State: Needs maintenance
    Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
    Size: 47628 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t3d0s0   0            No     Maintenance

d20: Submirror of d0
    State: Okay
    Size: 47628 blocks
    Stripe 0:
        Device              Start Block  Dbase  State        Hot Spare
        /dev/dsk/c0t2d0s0   0            No     Okay

Resolution:
1. If the disk is all right, enable the failed metadevice with the metareplace command. If the disk has failed, replace the disk, create the same partitioning as on the failed disk, and enable the new device with metareplace (see the sketch below):
   # metareplace -e d0 c0t3d0s0
   Device /dev/dsk/c0t3d0s0 is enabled
2. If the disk has failed and you want to move the failed devices to a new disk with a different id (cXtXdX): add the new disk, format it to create the same partition scheme as on the failed disk, and use metareplace:
   # metareplace d0 c0t3d0s0 <new device name>
The metareplace command can also be used for concat or stripe replacement in a volume, but that would involve restoring from backup if the volume is not mirrored.
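A sketch of the disk-replacement case, assuming the surviving side of the mirror is on c0t2d0 and the replacement disk took over the old c0t3d0 name; the VTOC is copied from the good disk so the slices match before re-enabling:

# prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2
# metadb -i                       (if replicas lived on the failed disk, delete and re-add them here)
# metareplace -e d0 c0t3d0s0      (repeat for each submirror slice on the replaced disk)

metastat will then show the submirror resyncing and, when finished, Okay.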

Problem: 'metastat' shows all metadevices as "Needs maintenance" after a reboot
Cause: This can be caused by the "metasync -r" command not being executed when the system boots, for example if the system is booted only to single-user mode.
Symptom: On a system running DiskSuite, all mirrors show a state of "Needs maintenance" after rebooting, including the root disk if it is under DiskSuite control. The mirrors can be returned to an "Okay" state using the metaclear and metainit commands, but after the next reboot they return to "Needs maintenance". An example of this condition as seen with metastat:
# metastat
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 6298425 blocks

d10: Submirror of d0
    State: Needs maintenance
    Size: 6298425 blocks
    Stripe 0:
        Device      Start Block  Dbase  State  Hot Spare
        c0t0d0s0    0            No     Okay

d20: Submirror of d0
    State: Needs maintenance
    Size: 6298614 blocks
    Stripe 0:
        Device      Start Block  Dbase  State  Hot Spare
        c1t1d0s0    0            No     Okay

Resolution: For Solstice DiskSuite versions 3.x through 4.2 inclusive, the metasync command is run from /etc/rc2.d/S95SUNWmd.sync. For DiskSuite 4.2.1 and above, it is run from /etc/rc2.d/S95lvm.sync. In all cases, because this script is not run until the system transitions into multi-user mode, it is expected that submirrors show "Needs maintenance" until the script has run. I/O to these metadevices works fine while in this state, so there is no cause for concern.
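If the system is being kept in single-user mode intentionally and the mirrors need to be brought to Okay by hand, the same resync that the rc script performs can be started manually (a small sketch):

# metasync -r                   (starts the boot-time resync of all mirrors that need it)
# metastat | grep -i state      (watch the states change to Resyncing and then Okay)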

Problem: DiskSuite and Solaris Volume Manager metadevices on Ultra3-attached disks are put in "Maintenance" state at boot time
Cause: The PCI Dual Ultra3 SCSI Host Adapter uses a new driver, known as the qus driver (see Sunsolve doc 49060). For DiskSuite or Solaris Volume Manager to access devices using this driver at boot time, the drivers for this card must be forceloaded in the /etc/system file, and those forceload lines must be placed before the forceloads for DiskSuite or Solaris Volume Manager.
Systems that can use the PCI Dual Ultra3 SCSI Host Adapter card: Netra t[TM] 1400/1405, Netra[TM] 20, Sun Blade[TM] 1000 and 2000 workstations, Sun Fire[TM] 280R, Sun Enterprise[TM] 220R, 250, 420R and 450 servers, Sun Fire[TM] V120, V480, V880/V880z, V1280, E2900, E4900, 4800/4810, 6800/6900 and 12K/15K, and Ultra[TM] 60 and 80 workstations.
Storage devices that can use the PCI Dual Ultra3 SCSI Host Adapter card: Sun StorEdge[TM] 3120, 3310, D2, D240 and S1.
Resolution: Add the following lines to /etc/system:
   forceload: drv/sd
   forceload: drv/qus
They must be added before the line which reads:
   * Begin MDD root info (do not edit)
Until the system file can be edited, the metadevices can simply be resynced after booting by using metareplace.
Additional Information: Similar problems will occur if the DiskSuite or Solaris Volume Manager state database replicas are placed on the external storage; they will be marked stale at boot time until the forceloads are put into the system file.
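A sketch of how the top of /etc/system should look after the edit; the contents of the MDD block itself vary from system to system and are maintained by metaroot, so only the two lines above the block are added by hand:

forceload: drv/sd
forceload: drv/qus
* Begin MDD root info (do not edit)
...existing forceload and rootdev lines, left untouched...
* End MDD root info (do not edit)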

Problem: One submirror shows "Resyncing", the other shows "Last Erred"
Cause: This condition resulted from forcing metasync to sync the good submirror up to the "Last Erred" submirror, and it persisted through multiple metasync commands and system shutdowns.
Symptom: The metastat command prints:
d2: Mirror
    Submirror 0: d0
      State: Needs maintenance
    Submirror 1: d1
      State: Needs maintenance

d0: Submirror of d2
    State: Needs maintenance
    Invoke: metasync d2
        Device      Start Block  Dbase  State
    Stripe 0:
        c0t2d0s6    0            No     Okay
    Stripe 1:
        c2t0d0s6    0            No     Okay
    Stripe 2:
        c2t2d0s6    0            No     Resyncing

d1: Submirror of d2
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d2 c2t3d0s6 <new device>
        Device      Start Block  Dbase  State
    Stripe 0:
        c0t3d0s6    0            No     Okay
    Stripe 1:
        c2t1d0s6    0            No     Okay
    Stripe 2:
        c2t3d0s6    0            No     Last Erred

Resolution: This is a circular dependency between the submirrors: "d0" cannot sync up to the bad "d1" submirror, and "d1" cannot be fixed until "d0" is out of "Resyncing" and the mirror, "d2", is back in an "Okay" state. To break the dependency, one of the submirror metadevices must be removed.
1. In the example above, the "d1" submirror is known to contain a bad disk, so it is removed:
   # metadetach d2 d1
   # metasync d2
   This brings the "d2" mirror into an "Okay" state where it can be checked for damage and put back into production.
2. The errored submirror then needs to be fixed. If it is necessary to check the data on the submirror, a new mirror can be created with "d1" as its only submirror (this may or may not clear the "Last Erred" state):
   # metainit d20 -m d1

3. At this point the physical disk can be replaced and then brought back into the submirror. If the new disk has the same physical cXtXdX number:
   # metareplace -e d1 c2t3d0s6
   Otherwise:
   # metareplace d1 c2t3d0s6 c4t2d0s6      (where c4t2d0s6 is the new disk)
4. If a mirror was created in step 2, it needs to be destroyed while still retaining the submirror:
   # metaclear -f d20
5. The fixed submirror can now be brought back into the original mirror:
   # metattach d2 d1
   A sync is started automatically and will bring the mirror into the "Okay" state when it has finished.

Problem: How to put submirrors in an OK state when both show "Needs maintenance" and I/O is still going to the metamirror
Cause: various reasons
Symptom:
d4: metamirror
    Submirror 0: d3
      State: Needs maintenance
    Submirror 1: d5
      State: Needs maintenance
    Regions which are dirty: 1%
    Pass = 1
    Read option = round-robin (default)
    Write option = parallel (default)
    Size: 52560 blocks

d3: Submirror of d4
    State: Okay
    Size: 52560 blocks
    Stripe 0:
        Device              Start Block  Dbase  State  Hot Spare
        /dev/dsk/c0t3d0s0   0            No     Okay

d5: Submirror of d4
    State: Okay
    Size: 156240 blocks
    Stripe 0:
        Device              Start Block  Dbase  State  Hot Spare
        /dev/dsk/c1t3d0s0   0            No     Okay

Resolution: To find the good submirror, try detaching a submirror:
   # metadetach d4 d3
If this fails with a message like "operation ... results in no readable submirrors", then you know this is the good submirror. You then have two choices:
1. # metadetach d4 d5      (detach the bad submirror)
   # metattach d4 d5       (reattach so the status now says Okay; wait for the resync to finish)
   # metadetach d4 d3
   # metattach d4 d3       (these two operations put the status back to Okay)
2. # metareplace -e d4 /dev/dsk/c1t3d0s0      (wait for the resync to finish)
   # metareplace -e d4 /dev/dsk/c0t3d0s0

Problem: "Needs Maintenance" and "Last Erred" states
Cause: When a slice in a mirror or RAID5 metadevice experiences errors, DiskSuite puts the slice in the "Maintenance" state. No further reads or writes are performed to a slice in the "Maintenance" state.

Subsequent errors on other slices in the same metadevice are handled differently, depending on the type of metadevice. A mirror may be able to tolerate many slices in the "Maintenance" state and still be read from and written to; a RAID5 metadevice, by definition, can only tolerate a single slice in the "Maintenance" state.

When either a mirror or a RAID5 metadevice has a slice in the "Last Erred" state, I/O is still attempted to that slice, because a "Last Erred" slice contains the last good copy of the data from DiskSuite's point of view. With a slice in the "Last Erred" state, the metadevice behaves like a normal device (disk) and returns I/O errors to the application. Usually, at this point, some data has been lost.

Symptom: metastat gives:
d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12585752 blocks

d10: Submirror of d0
    State: Needs maintenance
    Invoke: metareplace d0 c0t0d0s0 <new device>
    Size: 12585752 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t0d0s0    0            No     Maintenance

d20: Submirror of d0
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components: metareplace d0 c0t1d0s0 <new device>
    Size: 12585752 blocks
    Stripe 0:
        Device      Start Block  Dbase  State        Hot Spare
        c0t1d0s0    0            No     Last Erred

Resolution: First replace the "Maintenance" slice:
# metareplace -e d0 c0t0d0s0
- Wait for "Okay" status on the corresponding submirror,
- then replace the "Last Erred" slice:
# metareplace -e d0 c0t1d0s0

1. Mirrors
If slices are in the "Maintenance" state, no data has been lost. You can safely replace or enable the slices in any order. If a slice is in the "Last Erred" state, you cannot replace it until you first replace all the other mirrored slices in the "Maintenance" state. Replacing or enabling a slice in the "Last Erred" state usually means that some data has been lost. Be sure to validate the data on the mirror after repairing it.

2. RAID5 metadevices
A RAID5 metadevice can tolerate a single slice failure. You can safely replace a single slice in the "Maintenance" state without losing data. If an error occurs on another slice, it is put into the "Last Erred" state; at that point the RAID5 metadevice is a read-only device, and you need to perform some type of error recovery so that the state of the RAID5 metadevice is non-errored and the possibility of data loss is reduced. If a RAID5 metadevice reaches a "Last Erred" state, there is a good chance it has lost data. Be sure to validate the data on the RAID5 metadevice after repairing it (see the sketch below).
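A sketch of the RAID5 recovery sequence, assuming a hypothetical RAID5 metadevice d30 with c1t2d0s0 in "Maintenance", c1t3d0s0 in "Last Erred", and a UFS file system on the metadevice:

# metareplace -e d30 c1t2d0s0       (repair the "Maintenance" slice first)
# metastat d30                      (wait for the resync to finish)
# metareplace -e d30 c1t3d0s0       (only then repair the "Last Erred" slice)
# fsck -n /dev/md/rdsk/d30          (validate the file system / data afterwards)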

Troubleshooting Metatool issues


Problem: metatool start-up error "RPC: Program not registered"
Cause: The rpc.metad and rpc.metamhd lines have been removed from, or commented out in, the /etc/inetd.conf file, usually for security reasons.
Resolution: The required entries are:
For DiskSuite 4.0 through 4.2:
   # rpc.metad
   100229/1 tli rpc/circuit_n wait root /usr/opt/SUNWmd/sbin/rpc.metad rpc.metad
   # rpc.metamhd
   100230/1 tli rpc/circuit_n wait root /usr/opt/SUNWmd/sbin/rpc.metamhd rpc.metamhd
For DiskSuite 4.2.1 and above:
   # rpc.metad
   100229/1 tli rpc/tcp wait root /usr/sbin/rpc.metad rpc.metad
   # rpc.metamhd
   100230/1 tli rpc/tcp wait root /usr/sbin/rpc.metamhd rpc.metamhd
These lines normally sit at the bottom of /etc/inetd.conf. If they are not there, insert the appropriate lines into the file and then run the following commands. First find the process ID of the inetd process:
   # ps -ef | grep inetd
Then send a -1 (HUP) signal to the daemon, which will cause it to re-read the configuration file you just edited:
   # kill -1 <pid>
Now you can attempt to run metatool again.
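To confirm that inetd picked up the change and registered the services with the portmapper (a quick check; 100229 and 100230 are the RPC program numbers from the entries above):

# rpcinfo -p | egrep '100229|100230'

Each program number should appear in the output; if not, re-check the inetd.conf entries and the kill -1 step.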

Troubleshooting Metadevice Creation Problems


Problem: metainit and metastat return an incorrect device id
DiskSuite is used to create metadevices from the command line. In this particular case, the root disk file systems (/, /usr, /var, /opt and swap) are being mirrored, and metadevice creation for the initial boot disk slices gave unexpected results. The following two problems were observed.
Cause: An incorrect entry in the /etc/vfstab file.
Symptom:
1. Force creation of the metadevice for /usr:
   # metainit -f d0 1 1 c0t0d0s6
   Error: c0t1d0s6 is busy
2. Force creation of the metadevice for /var:
   # metainit -f d3 1 1 c0t0d0s3
   d3: Concat/Stripe is setup
   # metastat d3
   d29: Concat/Stripe
       Size: 5121360 blocks
       Stripe 0:
           Device      Start Block  Dbase
           c0t0d0s4    0            No
It is the device names which identify the problem. In the first example, we created a metadevice from c0t0d0s6, yet DiskSuite reported that c0t1d0s6 was busy - the "t" (target) value reported back was wrong. The second example is similar: we successfully created a metadevice from c0t0d0s3, but a subsequent metastat stated that the device was created from c0t0d0s4 - the "s" (slice) value reported back was wrong.
Resolution: This problem is caused by an incorrect entry in the /etc/vfstab file. The entries for /usr and /var had the WRONG raw device identified. The file looked like this:
   /dev/dsk/c0t0d0s6   /dev/rdsk/c0t1d0s6   /usr   ufs   1   no
   /dev/dsk/c0t0d0s3   /dev/rdsk/c0t0d0s4   /var   ufs   1   no
/usr contains the wrong target value for the raw device: it is t1 and it must be t0. /var contains the wrong slice value: it is s4 and it must be s3. Correcting these typos in /etc/vfstab solves the problem (corrected entries shown below).
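For comparison, the corrected vfstab entries (same devices as above; the trailing "-" is the empty mount-options field of a standard vfstab line):

/dev/dsk/c0t0d0s6   /dev/rdsk/c0t0d0s6   /usr   ufs   1   no   -
/dev/dsk/c0t0d0s3   /dev/rdsk/c0t0d0s3   /var   ufs   1   no   -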

Problem: metaroot generates errors in relation to the /etc/system file: "metaroot: testhost: /etc/system: error in system file"
Cause: An error in the /etc/system file.
Symptom: Running the metaroot command generates the error:
   metaroot: testhost: /etc/system: error in system file
Resolution: The /etc/system file looks as follows:
   * Begin MDD root info (do not edit)
   forceload: misc/md_trans
   forceload: misc/md_raid
   forceload: misc/md_hotspares
   forceload: misc/md_sp
   forceload: misc/md_stripe
   forceload: misc/md_mirror
   forceload: drv/pcisch
   forceload: drv/glm
   forceload: drv/sd
   * End
Even though the beginning and end lines of the Solstice DiskSuite[TM] entries start with * as a comment, they are still read and must read exactly:
   * Begin MDD root info (do not edit)
and
   * End MDD root info (do not edit)
In this case, you would modify the last line to:
   * End MDD root info (do not edit)
Following the correction, metaroot runs successfully. You may also find that one of these lines is missing entirely; inserting the line will correct the issue.

Problem: metainit fails with "No such file or directory"
Cause: Not enough metadevices are available.
Symptom:
   # metainit d154 1 1 c0t0d0s0
   Error: metainit: amadv44: d154: No such file or directory
Resolution: The number of available metadevices must be increased. To do this, edit /kernel/drv/md.conf and change nmd=128 to nmd=256. This makes an additional 128 metadevices available, for a total of 256 (a sketch of the edit follows at the end of this section).

Problem: meta(command) waiting on /etc/opt/SUNWmd/lock error
Cause: The message means that either two people are typing meta commands from the command line at the same time, or the GUI (metatool) is up with "uncommitted" actions - for example a stripe/concat or a mirror being built - while commands are being typed in from the command line.
Symptom: While typing Online: DiskSuite[TM] commands (metainit, metareplace, metadb, etc.) from the command line, you get the message "meta(command) waiting on /etc/opt/SUNWmd/lock".
Resolution: To clear the error, type "commit" from within the GUI to commit all actions, or close metatool, or verify that no other user is typing meta commands to set up DiskSuite.
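A sketch of the nmd change described above. The stock md.conf line typically looks like name="md" parent="pseudo" nmd=128 md_nsets=4; and, after the edit, a reconfiguration reboot is generally needed before the new metadevice names exist:

# grep nmd= /kernel/drv/md.conf
# vi /kernel/drv/md.conf           (change nmd=128 to nmd=256)
# reboot -- -r                     (reconfiguration reboot so the extra md devices are created)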

Troubleshooting metahs and metaset problems


Problem: How to repair a hot spare in the "Broken" state
Cause: Hot spares are placed in the "Broken" state after an I/O error occurs.
Symptom:
   # metastat
   hsp000: 1 hot spare
        c0t5d0s2        Available       8380800 blocks
   hsp001: 1 hot spare
        c2t5d0s2        Broken          8380800 blocks
Resolution: Once the affected hardware has been replaced, the component can be brought back to the "Available" state with the metahs command:
   # metahs -e /dev/dsk/c2t5d0s2
   # metastat
   hsp001: 1 hot spare
        c2t5d0s2        Available       8380800 blocks
This can also be done from the Solaris[TM] Management Console (SMC) GUI: in the Enhanced Storage tool, open the Hot Spare Pools node and select a hot spare pool. Choose Action -> Properties, go to the Hot Spares panel, select the hot spare that is Broken, and click Enable Hot Spare.

Problem: metaset command fails with the message "metaset: <hostname>: rpc.metad: Permission denied"
Cause: root is not part of the sysadmin group.
Symptom: Running the metaset command to create a metaset fails with:
   metaset: Lone-Ranger: rpc.metad: Permission denied
For example, the following command line is run:
   Lone-Ranger# metaset -s High-o-silver -a -h Tanto Lone-Ranger
where the syntax is:
   metaset -s <metaset_name> -a -h <hostname.1> <hostname.2>
and it fails with:
   metaset: <hostname>: rpc.metad: Permission denied
Resolution: Add root to the sysadmin group (see the sketch below).
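One way to apply that fix (a sketch; on a stock Solaris system the sysadmin group already exists in /etc/group with GID 14, but verify locally):

# grep sysadmin /etc/group
sysadmin::14:
# vi /etc/group                    (append root to the member list: sysadmin::14:root)
# groups root                      (verify that sysadmin now appears in the list)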

Troubleshooting soft partitions


Problem: Can't mount soft partitions after a reboot
Cause: The forceload line for the md_sp driver in the /etc/system file is commented out.
Symptom: An SVM (Solaris Volume Manager) soft partition can be created, but after the box is rebooted the soft partitions cannot be mounted. The following error messages are received:
   mount: No such device
   mount: Cannot open /dev/md/dsk/dXX
Resolution: Uncomment the forceload line for the md_sp driver in /etc/system:
   * Begin MDD root info (do not edit)
   *forceload: misc/md_trans
   *forceload: misc/md_raid
   *forceload: misc/md_hotspares
   *forceload: misc/md_sp         <<------- Uncomment this line !!!
   forceload: misc/md_stripe
   forceload: misc/md_mirror
   forceload: drv/pcisch
   forceload: drv/qlc
   forceload: drv/fp
   forceload: drv/ssd
   rootdev:/pseudo/md@0:0,0,blk
   * End MDD root info (do not edit)
Additional Information: No problem occurs during soft partition creation, because the SVM commands modload the md_sp driver as they run; the problem is only noticed when the system is booted with the md_sp forceload commented out in /etc/system.

Problem: Resolving non-mirrored soft partitions in an errored state
Symptom: In this example, a single Sun StorEdge[TM] array is connected to a server and configured with one logical unit number (LUN), c2t0d0s0, sliced into several soft partitions. This SCSI warning appears:
   scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/scsi@1/sd@0,0 (sd30):
        SCSI transport failed: reason 'unexpected_bus_free': retrying command
   scsi: [ID 107833 kern.warning] WARNING: /pci@8,600000/scsi@1/sd@0,0 (sd30):
        SCSI transport failed: reason 'unexpected_bus_free': giving up
   rdriver: [ID 486355 kern.notice] ID[RAIDarray.rdriver.4003] The Array driver is
        returning an Errored I/O, with errno 5, on cl201_003, Lun 0, sector 55301828
These errors indicate that a soft partition is in a wrong state:
   md_sp: [ID 944615 kern.warning] WARNING: md: d84: open failed, soft partition status is not OK (status = 4).
You can verify the problem with the metastat output:
   d80: Concat/Stripe
       Size: 141164544 blocks
       Stripe 0:
           Device      Start Block  Dbase  State  Hot Spare
           c2t0d0s0    0            No     Okay

   d84: Soft Partition
       Component: d80
       State: Errored
       Size: 12288000 blocks
           Extent      Start Block     Block count
                0      55300100        12288000
When a non-mirrored soft partition is in an errored state, it is impossible to access the file system: fsck, ls and mount return "I/O error" or "no such device".
Resolution: To fix the problem described above, use the metarecover command with the -p and -m options, and answer 'yes' to the question:
   # metarecover d80 -p -m
This metarecover command reads the information from the replica for each soft partition on that slice and lays down an extent header for each soft partition at the correct location on the slice. The man page describes the options used:
   -p   Regenerates soft partitions based on the metadevice state database or extent
        headers on the underlying device. If neither -d nor -m is specified, this option
        compares the soft partition information in the metadevice state database to the
        extent headers.
   -m   Regenerates the extent headers and re-applies them to the underlying device based
        on the soft partitions listed in the metadevice state database. Options -d and -m
        are mutually exclusive.
After the metarecover command is run, I/O can once again take place to the soft partition.
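Before running the destructive repair, the -p option on its own can be used as a non-destructive check, since it only compares the replica with the on-disk extent headers. A short sketch using the d80/d84 devices from the example, assuming a UFS file system on d84:

# metarecover d80 -p                (compare only; reports any inconsistencies)
# metarecover d80 -p -m             (regenerate the extent headers from the replica)
# metastat d84                      (the soft partition should now report Okay)
# fsck /dev/md/rdsk/d84             (check the file system before mounting it again)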

Miscellaneous Disksuite Infodocs/SRDBs

============================================================
ID11617  Solstice DiskSuite[TM] 4.X - System panics on reboot after mirroring root after upgrade
Document on the failure to boot after running the metaroot command for a 4.x install and configuration. It shows how the following panic can be caused by remnants of an older SDS installation, and how to clear this up. Here is the panic seen:
cannot assemble drivers for /pseudo/md@0:0,blk
cannot mount root on /pseudo/md@0:0,blk
panic vfs_mountroot: cannot mount root

============================================================
ID70710  Solstice Backup[TM]: How to recover Operating System filesystems on a client with a Solstice DiskSuite[TM] mirrored rootdisk
If backups of an SDS-mirrored rootdisk are made with Solstice Backup, use this document to restore the backed-up OS.

ID16330  VxVM/SDS - mount error with a 'boot -a' with mirrored root disk
Simple document describing which path to use during a 'boot -a' repair boot when it asks for the physical name of the root device. Do not use the default; use the path from the rootdev entry in /etc/system.
============================================================
SRDB 75210  Solaris[TM] Volume Manager Software and Solstice DiskSuite[TM] Software: Mounting Metadevices
Document which shows how to boot from the Solaris 9 CD and mount metadevices (both S9 and pre-S9 metadevices). It also mentions the Diagnostic Boot CD, which can be used to mount both SDS and VxVM devices. Info on this CD at: http://saturn.east/dbcd/2.8/index.html

ID79697  Mount SDS mirrors when booted from cdrom in Solaris[TM] 9
A similar document, with this example:
# mount -o ro /dev/dsk/c0t0d0s0 /a
# cd /tmp
# cp /a/kernel/drv/md.conf /tmp/root/kernel/drv
# umount /a

The problem now is that the md kernel module is already loaded and won't re-read md.conf, so find the module's id, and unload it:
# modinfo | grep md
# modunload -i <md_mod_id>
# metastat

You should have access to the volumes at this point.

ID75540  How to tell which disk was used to boot when the boot disk is mirrored
Document for determining which device the system booted from (the primary or the mirror, and its physical path). Example:
# prtconf -pv | grep bootpath:
    bootpath:  '/sbus@3,0/SUNW,fas@3,8800000/sd@1,0:a'
# eeprom | grep alias      ===> compare to bootpath
============================================================
