Slide 1: Oracle

Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Proprietary and Confidential
Slide 2
Sun x86 Servers Troubleshooting Tools and FRU/CRU Replacement Course
Welcome to the x86 Servers Troubleshooting Tools and FRU/CRU Replacement course.
Slide 3
Course Objectives
By the end of this course, you should be able to:
Interpret system indicators
Locate data gathering tools used to troubleshoot x86 system issues
Locate and describe diagnostic troubleshooting tools supported on x86 systems
Describe FRU/CRU replacement procedures
By the end of this course, you should be able to interpret system indicators, locate and describe
tools used to gather troubleshooting data, locate and describe diagnostic tools, and describe
FRU/CRU replacement procedures for x86 systems.
Slide 4
Before You Begin

You have the option to Test Out of this course at any time.
So, let's get started.
As a reminder, you have the option to Test Out of this course at any time. So, let's get started.
Slide 5
Troubleshooting Tools
System Indicators (LEDs)

Data Gathering Troubleshooting Tools
Diagnostic Troubleshooting Tools
Storage Diagnostic Tools
There are many ways to gather data on x86 servers for troubleshooting purposes. In this course we will
discuss a few of these methods, including system indicators, data gathering tools, and system and
storage diagnostic tools. We'll start by looking at the x86 system indicators.
Slide 6
Interpreting System Indicators

Power LED (green):
OFF = Server not powered
BLINK = Standby Power Mode
Fast BLINK = Service Processor initializing
Slow BLINK = Server Host is initializing
ON = OS Booted
Service Action Required LED (amber):
ON = Hardware failure
OK-to-Remove LED (blue):
ON = Component ready for removal
System indicators, or LEDs, are a good place to start when troubleshooting a hardware issue. The
x86 servers typically have three system LEDs to indicate the state of the platform. The Power LED
is green and has three basic states: off, blinking, and steady on. When the LED is off, the
server is not connected to AC power. When the LED is blinking, the server is in Standby Power
mode. This means that the server is connected to AC power, but the host is not powered.
Depending on the server, you may have two blinking states, the fast blink and the slow blink.
The fast blink lasts for two minutes after the AC power is applied to the host, during the
time that the SP is initializing. The slow blink occurs when power is applied to the server host,
during the time that the server host is booting. To determine which blink states are supported by
a specific server, refer to that server's service manual. When the Power LED is steady on, the
server is connected to AC power and its host is powered. The Service Action Required LED is
amber. This LED turns steady on when a hardware failure occurs in the server or in any of its
components. A corresponding component Service Action Required LED may light up on the specific
server component that has failed. The OK-to-Remove LED is blue. This LED turns steady on to
indicate when it is safe to remove a failed component for replacement.
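As a quick reference, the indicator states listed on this slide can be captured in a small lookup table. The sketch below is illustrative only: the dictionary keys and the function name interpret_led are assumptions made for this example, not part of any Oracle tool, and blink states vary by server model.

```python
# Lookup table built from the slide text. Key names are illustrative assumptions.
SYSTEM_LED_STATES = {
    ("power", "off"): "Server not powered",
    ("power", "blink"): "Standby Power mode",
    ("power", "fast blink"): "Service Processor initializing",
    ("power", "slow blink"): "Server host is initializing",
    ("power", "on"): "OS booted",
    ("service action required", "on"): "Hardware failure",
    ("ok-to-remove", "on"): "Component ready for removal",
}

FALLBACK = "State not listed; refer to the server's service manual"


def interpret_led(led, state):
    """Return the meaning of an LED state, ignoring case and extra spaces."""
    key = (led.strip().lower(), state.strip().lower())
    return SYSTEM_LED_STATES.get(key, FALLBACK)


if __name__ == "__main__":
    print(interpret_led("Power", "Fast Blink"))
```

Anything not in the table falls through to a pointer to the service manual, which remains the authority for a specific server.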
Slide 7
Interpreting Locator, CPU, and DIMM Indicators

Locator LED (white):
Programmed to flash to help onsite technicians find the server
CPU and DIMM LEDs (amber):
Push the Fault Remind button to light the component fault LED
Fault Remind Button
White Locator LEDs may be present on some server models. These can be programmed to flash to
assist onsite technicians in finding a specific server among a number of servers.
Amber LEDs are found on certain components internal to the server. Each CPU and each DIMM
has its own amber LED. To provide capacitor power to the CPU and DIMM fault LEDs, press
the Fault Remind button. The LED of the faulty component will then light.
Because the amber component LEDs are powered by a capacitor, they have only enough power
to light up a couple of times, and only for a short time. Be prepared to locate the faulty
component quickly, as you may only get results from pressing the Fault Remind button a couple
of times. This action needs to be completed within a set time period, which is specific to
each server.
Slide 8
Interpreting Disk and Ethernet Indicators

Disk LEDs:
OK Power (green)
Service Action Required (amber)
Ready-to-Remove (blue)
Ethernet Port LEDs:
Link/Activity (green)
Speed (green = 1 Gbit/sec, amber = 100 Mbit/sec, off = 10 Mbit/sec)
Hard disk drives have three LEDs. The green OK Power LED lights when the disk is
powered and flashes according to the disk's activity. The amber Service Action Required LED,
again, is the hardware fault indicator. The blue Ready-to-Remove LED is steady on when the
disk is ready to be removed.
Ethernet ports have two LEDs. The Link/Activity LED is located on the left of the port and
flashes according to port activity. The Speed LED is located on the right of the port and its
color indicates the speed the port is configured for. Green indicates 1 gigabit per second,
amber indicates 100 megabits per second, and off indicates 10 megabits per second.
Slide 9
Interpreting Power Supply and Fan Indicators

Power Supply LEDs:
AC (green)
DC (green)
Ready-to-Remove (blue) [optional]
Fan LED:
OK Power (green)
x86 Platform power supplies support two or three LEDs. These LEDs may include: a green AC
LED, a green DC LED, an amber Service Action Required LED and/or a blue Ready-to-Remove
LED. A lighted AC LED indicates that AC power is present at the power supply, while the DC
LED indicates that the power supply is generating DC power when it is on.
The fan modules support one or two LEDs. A lighted green OK power LED indicates that the fan
module is powered. The amber Service Action Required LED will light when the fan module
encounters a hardware failure.
Now that we have a good understanding of the platform and component level LEDs, let's look at
some other data gathering tools that can be used to troubleshoot x86 issues.
Slide 10
Data Gathering Troubleshooting Tools

Tools | Resident on | Description
System Event Logs | ILOM, BIOS, IPMI, Solaris, Linux and Windows | Logs of system events categorized along with a timestamp, severity and error message.
faultmgmt and ipmitool | ILOM and IPMI | Displays sensor and indicator information
prtdiag, prtconf, sysconfig | Solaris | Displays system configuration
ifconfig, netstat | Solaris and Linux | Displays and configures network configuration
ipconfig, netstat | Windows | Displays and configures the network configuration
snapshot | ILOM | This utility enables you to produce a snapshot of the server SP at any instant in time.
Explorer | Solaris | A Solaris specific data gathering tool for collection of the more advanced feature information and its current status.
MPS Report | Windows | Microsoft Product Support Reports (Data Collection Tool) http://www.microsoft.com/download/en/details.aspx?id=24745
sosreport | RedHat, Oracle Enterprise Linux (OEL) | Data Collection Tool (built into OS)
supportconfig | SuSE | Data Collection Tool (built into OS)
In this next section of the course we will discuss data gathering tools to use for troubleshooting.
The x86 Platform data gathering troubleshooting tools allow the user to view system status and
configuration. The tools to view system status are listed in the table on the slide, along with
where they reside and their function. We will look at each of these tools in more detail on the
next few slides.
Slide 11
OS System Event Logs

A system error may be displayed to the system console and recorded in the OS system error log
OS System Logs
Solaris and Linux System Logs
# vi /var/adm/messages (Solaris)
# view /var/log/messages (Linux)
Windows System Logs (using Event Viewer)
Start -> Control Panel -> Administrative Tools -> Computer Management
The customer's first indication of an error may be displayed on the system console and will be
recorded in the OS system error log. To access these logs on Solaris and Linux, use the vi or
view command to display the contents of the messages files. The messages files are generated and
updated by the syslog facility of Solaris and Linux. For Windows, navigate to the Computer
Management screen, which gives you access to the Event Viewer that displays the system log
among other screens.
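When a messages file is long, it can help to filter it before reading it in vi or view. The following is a minimal sketch, not a supported tool: the keyword list and the function names are assumptions chosen for illustration, and the default path is the Linux location named on the slide (use /var/adm/messages on Solaris).

```python
import re

# Keywords are an assumption for illustration; adjust them for the platform in question.
PATTERN = re.compile(r"\b(error|warning|fail(ed|ure)?|panic)\b", re.IGNORECASE)


def suspicious_lines(lines):
    """Return (line_number, text) pairs for lines that match PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


def scan_messages(path="/var/log/messages"):
    """Scan a syslog messages file; the path differs between Solaris and Linux."""
    with open(path, errors="replace") as handle:
        return suspicious_lines(handle)
```

Calling scan_messages() on the server (with read permission on the file) returns the matching lines with their line numbers, so you can jump to them in the full log.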
Slide 12
ILOM, BIOS and IPMI System Event Logs

ILOM SEL (System Event Log)
CLI: -> show /SP/logs/event/list
BUI: System Monitoring -> Event Log
BIOS DMI (Desktop Management Interface) / SEL
BUI: Advanced -> Event Logging
IPMI SEL
CLI: # ipmitool -U root -H <HOST> sel elist
Log Fields
Timestamp
Severity
Description
Device
System event logs are also supported within ILOM, BIOS and IPMI. ILOM's system event logs
are available through its CLI and BUI interfaces. The CLI command is displayed here along
with the path to the BUI screen.
BIOS has a system event log that can be accessed by using the navigation path displayed here to
open the Event Logging screen. IPMI, which is the closest management tool to the hardware, has
a system event log that is accessible through IPMItool. This log is a subset of the events
posted within the BIOS system event log.
Logs can give you an ordered list of the events that led up to the hardware problem when you use
the timestamp field to order the entries. The severity field indicates the severity of the reported
event. The description field can pinpoint the source of the problem, or it can give enough
information about the problem to start the FRU isolation process. The device field of a log is
useful to determine the log entries that are related to the same device.
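The idea of ordering entries by timestamp and grouping them by device can be sketched in a few lines. This example assumes the pipe-separated layout that ipmitool sel elist typically prints (record ID, date, time, sensor, event text); treat that layout, the sample records, and the function names as assumptions to verify against output from your own server.

```python
from datetime import datetime


def parse_sel_line(line):
    """Split one 'sel elist' style line into a dict, or return None if it does not fit."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) < 5:
        return None
    try:
        stamp = datetime.strptime(parts[1] + " " + parts[2], "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return {
        "id": parts[0],
        "timestamp": stamp,
        "device": parts[3],
        "description": " | ".join(parts[4:]),
    }


def events_for_device(lines, device_substring):
    """Return the parsed events for one device, oldest first."""
    events = [event for event in map(parse_sel_line, lines) if event is not None]
    wanted = device_substring.lower()
    matches = [event for event in events if wanted in event["device"].lower()]
    return sorted(matches, key=lambda event: event["timestamp"])
```

Sorting on the parsed timestamp rather than on the record ID keeps the order correct even if the log has wrapped or entries were merged from more than one source.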
Slide 13
ILOM and IPMItool LED Information

ILOM CLI: -> show /SP/faultmgmt
ILOM BUI: System Monitoring -> Fault Management
IPMItool shell/command prompt

# ipmitool -U root -H <HOST> sdr list (sensors)
On some platforms the command is:
# ipmitool -U root -H <HOST> sunoem led get all (indicators)
But this command generates errors on some platforms (e.g. X4440)
and you need to use the following command instead:
# ipmitool -U root -H <HOST> sunoem sbled get all (indicators)
IPMItool is available from the server's Tools and Driver CD/DVD or within MOS
MOS/ISP -> Patches & Updates -> Patch Search -> Product or Family (Advanced) -> Select Platform
Diagnostic Guide
If the customer does not have easy access to the x86 server and therefore cannot visually report
on the status of the LED indicators, there are ways to view this information remotely. The
x86 server indicator and sensor data can be accessed using the ILOM CLI under
/SP/faultmgmt, or within the ILOM BUI under the Fault Management tab. This data can
also be displayed using the IPMItool CLI command that is supported by IPMI. In the IPMItool
examples on the slide, notice that sdr within the first command line corresponds to the sensor
data repository while led within the second command refers to the indicators. Note that Fault
Management is only available on current systems.
The IPMItool software is available from the server's Tools and Driver CD, or from the MOS
location displayed. Manuals on the use of the IPMItool commands are available within the
Diagnostics Guide for a specific server. Click on the link for a diagnostic guide sample. The
presentation will now stop to allow you to access the path and link.
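Once sensor output has been captured remotely, it is often useful to list only the sensors that are not healthy. The sketch below assumes the three-column form that ipmitool sdr list normally prints (sensor name, reading, status) and treats the statuses "ok" and "ns" as quiet; both the layout and that choice are assumptions for illustration, and the function names are hypothetical.

```python
def parse_sdr_list(text):
    """Parse 'name | reading | status' lines into dicts, skipping anything else."""
    readings = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3:
            readings.append(
                {"sensor": parts[0], "reading": parts[1], "status": parts[2]}
            )
    return readings


def sensors_needing_attention(text, quiet_statuses=("ok", "ns")):
    """Return sensors whose status is not in quiet_statuses (an assumed default)."""
    return [
        reading
        for reading in parse_sdr_list(text)
        if reading["status"].lower() not in quiet_statuses
    ]
```

The text argument is simply the saved output of the sdr list command, so the same function works on a file the customer sends in.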
Slide 14
Operating System Utilities

Solaris and Linux Utilities
prtdiag (HW configuration and state for Solaris)
prtconf (Logical configuration for Solaris)
ifconfig -a (Network configuration for Solaris and Linux)
sysconfig (HW configuration for Linux)
netstat -a (Network config for Solaris, Linux & Windows)
Windows Utilities
ipconfig (Network configuration and state for Windows)
View hardware configuration information with msinfo32
Start -> Run and type msinfo32 to view the configuration
File -> Save to save the configuration.
Solaris, Linux, and Windows operating systems also provide utilities to view the current
hardware and network configurations. The utilities are listed along with the operating systems
that support them. Notice that the ifconfig command also gives you the ability to modify the
network configuration. There are no exact Windows equivalents to most of the utilities shown;
however, there is a command line utility called ipconfig. Also, hardware configuration information
can be displayed with msinfo32: navigate to Start -> Run and type msinfo32, then select
File -> Save to save the configuration.
Slide 15
ILOM Snapshot
-> set /SP/diag/snapshot dataset=data
-> set /SP/diag/snapshot dump_uri=URL
data:
normal = Collects ILOM, OS and HW information
full = Collects all information and may reset the server
normal-logonly = Collects only log files in normal mode
full-logonly = Collects only log files in full mode
URL:
Any valid target directory location
protocol://username:password@host/directory
Protocols supported: tftp, ftp, sftp, scp, http, or https
ILOM Addendum
Until now we have been gathering individual portions of system data. It would be more efficient
to gather larger portions of data to analyze. The ILOM snapshot utility collects log files, runs
various commands and collects their output from the service processor, then sends the collected
data as a file to a user-defined location.
To perform an ILOM snapshot, define the data to be collected using the first set command shown
on the slide. The data field can be normal, full, normal-logonly or full-logonly. The value
you select depends on how much data you want to collect, as indicated by the definitions.
The second command sets the location where the data will be sent, using the format displayed.
The protocols supported are tftp, ftp, sftp, scp, http, or https. These are the same protocols
that ILOM supports for backup and restore of the ILOM configuration.
For more information on ILOM snapshot, click on the ILOM Addendum link.
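The two set commands can be assembled and sanity-checked before they are typed at the ILOM prompt. This is a sketch under stated assumptions: the function name and the example host, account and directory are hypothetical, and the command form (target, space, property=value) follows the style of the set /SP/diag state=mode command shown later in the course, so confirm it against the ILOM documentation for your firmware.

```python
DATASETS = ("normal", "full", "normal-logonly", "full-logonly")
PROTOCOLS = ("tftp", "ftp", "sftp", "scp", "http", "https")


def snapshot_commands(dataset, protocol, username, password, host, directory):
    """Build the two ILOM CLI lines for a snapshot, rejecting unsupported values."""
    if dataset not in DATASETS:
        raise ValueError("unsupported dataset: " + dataset)
    if protocol not in PROTOCOLS:
        raise ValueError("unsupported protocol: " + protocol)
    # URI layout from the slide: protocol://username:password@host/directory
    uri = f"{protocol}://{username}:{password}@{host}/{directory.strip('/')}"
    return [
        f"set /SP/diag/snapshot dataset={dataset}",
        f"set /SP/diag/snapshot dump_uri={uri}",
    ]


if __name__ == "__main__":
    # Hypothetical target; replace with a real collection host and account.
    for line in snapshot_commands("normal", "sftp", "user", "secret", "192.0.2.10", "/dumps"):
        print("->", line)
```

Validating the dataset and protocol up front avoids starting a collection that ILOM would reject, and it makes the choice of full, which may reset the server, an explicit one.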
Slide 16
Oracle Explorer Data Collector

Oracle Explorer Data Collector is a diagnostic data collection tool made
up of scripts and executables as part of the Services Tools Bundle (STB).
Oracle Explorer Data Collector is a collection of shell scripts that gathers
information and creates a detailed snapshot of a system's configuration.
Type of information collected includes
Information related to drivers and patches
Recent system event history and log file entries
View Oracle Explorer Data Collector
Install Oracle Explorer Data Collector using the Services Tools Bundle (STB) Installer
STB User's Guide
STB Installer
Oracle Explorer Data Collector is a diagnostic data collection tool that is made up of shell scripts
and a few binary executables. Oracle Explorer Data Collector is designed to run on Solaris x86
platforms and is distributed as part of the Services Tools Bundle or STB.
Oracle Explorer Data Collector is a collection of shell scripts that gather information and create a
detailed snapshot of a system's configuration. Information related to drivers, patches, recent
system event history and log file entries is obtained from the Explorer output. For additional
information click on the Oracle Explorer link.
You can install Oracle Explorer Data Collector using the STB installer. The STB software and
documentation can be accessed from the STB link. Refer to the STB documentation for the
installation instructions.
Now that we've discussed the use of Data Gathering Troubleshooting Tools to view system
information, let's Check Your Knowledge.
Slide 17
Slide 18
Diagnostic Troubleshooting Tools

x86 Platform troubleshooting tools include diagnostics to generate more information about a problem
Tools | Resident in | Description
Oracle VTS (SunVTS) | Bootable CD or Solaris | Oracle VTS (SunVTS) is an exerciser that can either be booted from a CD/ISO image or installed directly on a host running the Solaris OS.
POST | BIOS | Power On Self Test that executes when the server is reset or powers on.
U-Boot | SP | SP diagnostics that execute when the SP resets or powers on.
spdiags and hostdiags | ILOM | SP diagnostics that run during SP initialization
Pc-Check | ILOM | Pc-Check diagnostics are integrated into ILOM and also come on the server's Tools and Driver CD/DVD
For tools for specific x86 servers, view the Sun x86 Servers Diagnostic Guide found in the Sun System Handbook's Related Documentation link.
In the next section of the course, we will review some diagnostic troubleshooting tools that are
compatible with most x86 Platforms. Along with the tool, the table displays where the diagnostic
tool resides and a description of its functionality. For the tools that are compatible with a specific
x86 Server, view the Oracle x86 Servers Diagnostic Guide that can be found in the Sun System
Handbook's Related Documentation link.
Slide 19
Oracle Validation Test Suite (Oracle VTS)

Oracle Validation Test Suite (previously known as SunVTS) is an exerciser that tests and
validates Sun hardware and is used for repair verification
Oracle Validation Test Suite runs from:
Server running Solaris OS
Bootable CD/DVD
USB boot image
Oracle Validation Test Suite downloads from:
Oracle VTS Download

Oracle Validation Test Suite, previously known as SunVTS, is an exerciser that tests and
validates Sun hardware. Oracle Validation Test Suite or Oracle VTS is used to ensure the proper
operation of the overall system under test and its underlying hardware. It stimulates, detects, and
identifies hardware faults and is used for both hardware validation and repair verification.
The Oracle VTS diagnostics can be run from Solaris, from a USB boot image, or from a bootable CD.
The bootable CD boots a CD-resident Solaris OS, which starts a CD-resident Oracle VTS that then
tests the server and generates a report.
The minimum Oracle VTS version supported is the one that comes shipped with the server. The
current Oracle VTS version can be found in the server's product notes and can be downloaded
from the link provided on the slide. The SunVTS download link also provides SunVTS versions
for Linux.
Slide 20
Oracle Validation Test Suite (Oracle VTS) (cont.)

Oracle Validation Test Suite:
CD DVD Test (cddvdtest)
CPU Test (cputest)
Cryptographics Test (cryptotest)
Disk and Diskette Drives Test (disktest)
Data Translation Look-aside Buffer Test (dtlbtest)
Emulex HBA Test (emlxtest)
Floating Point Unit Test (fputest)
InfiniBand Host Channel Adapter Test (ibhcatest)
Level 1 Data Cache Test (l1dcachetest)
Level 2 SRAM Test (l2sramtest)
Physical Memory Test (pmemtest)

QLogic Host Bus Adapter Test (qlctest)
RAM Test (ramtest)
Serial Port Test (serialtest)
System Test (systest)
Universal Serial Board Test (usbtest)
Virtual Memory Test (vmemtest)
Tape Drive Test (tapetest)
Network Hardware Test (nettest)
Ethernet Loopback Test (netlbtest)
For descriptions of these tests and instructions on how to run Oracle VTS refer to:
http://www.oracle.com/technetwork/documentation/sys-mgmt-networking-190072.html
The Oracle VTS tests are listed. As you can see, these tests cover all server internal components
as well as I/O components. For descriptions of these tests and instructions on how to run Oracle
VTS, click on the link.
Slide 21
POST Diagnostics
Power On Self Test is a series of diagnostics that execute before
the server OS is booted to verify that the hardware is healthy and
the configuration is valid
Fatal HW Error:
OS boot will stop
The error is reported to:
ILOM Fault Management
ILOM System Event Logs
POST Events table is in the Service Manuals
POST Events Table
POST Error Codes table is found in the appendix of the Diagnostic Guides
POST Error Codes
Power On Self Test is a series of diagnostics that execute before the server OS is booted to verify
that the hardware is healthy and the configuration is valid. If a fatal hardware error is
encountered, the OS boot will stop and the error is reported to ILOM Fault Management and
the ILOM System Event Logs. A list of POST events, showing which ones stop the OS boot and which
allow it to continue, is found within the server's service manual. A list of POST error codes is
found in the appendix of the Diagnostic Guides. These tables can be useful in trying to determine
the cause of a hardware failure caught by POST.
Slide 22
U-Boot Diagnostics
At system start-up, U-Boot diagnostic software initializes the
server and tests the server SP prior to booting the ILOM firmware
U-Boot Test | Description
Memory Data Bus Test | Checks for opens/shorts on SP memory's data bus.
Memory Address Bus Test | Checks for opens/shorts on SP memory's address bus.
Memory Data Integrity Test | Checks for data integrity on the SP memory.
Flash Test | Checks access to Flash.
Watch Dog Test | Checks the Watch Dog functionality on the SP.
I2C Probe Tests | Checks the connectivity to I2C devices on standby power.
Ethernet Test | Verifies ability to read from specified Ethernet port.
U-Boot execution modes:
Normal (default)
Quick (optional)
Extended (optional)
At system start-up, when AC power is connected to the x86 Platform, the U-Boot diagnostic
software initializes the server and tests aspects of the server service processor prior to booting
the ILOM firmware. The U-Boot diagnostic tests are designed to test the hardware required to
enable the server SP to boot successfully.
There are three execution modes that U-Boot supports. These include normal mode, which is the
default mode, quick and extended modes, which are optional. The modes determine which tests
are run and for how long.
The U-Boot tests are listed in this table according to the mode they run in with a description of
the test. The presentation will now stop to allow you to view this table.
Slide 23
U-Boot Diagnostics (cont.)

U-Boot Test | Description
Ethernet Link Test | Verifies link on specified PHY.
Ethernet Internal Loopback Test | Verifies Ethernet functionality by sending and receiving packets.
Real Time Clock Test | Checks functionality of the real-time clock on the SP.
USB 1.1 Test | Checks USB 1.1 functionality.
USB 1.1 BIST | Runs internal USB 1.1 built-in self-test (BIST).
USB 2.0 Test | Checks USB 2.0 functionality.
BIOS Flash ID Test | Verifies ability to read from the BIOS flash.
Serial Presence Detect (SPD) Access Test | Verifies DIMM SPD access along with checksum and prints SPD information.
Power CPLD | Test verifies the correct power revision of the complex programmable logic device (CPLD).
This is the continuation of the table on the U-Boot tests.
Slide 24
U-Boot Diagnostics (cont.)

Reset or Reboot the server host
.
.
.
Enter Diagnostics Mode ['q'uick/'n'ormal (default)/e'x'tended]...

.
.
.
Any U-Boot failures are reported to the ILOM System Event Log and to Fault Management.
For more test information on U-Boot refer to the Oracle x86 Servers Diagnostics Guide
which can be found on this link:
X86 Server Documentation
To configure and run U-Boot diagnostics, power cycle or reset the server, then wait for the
U-Boot message that will display over the serial port. When it appears, select q, n, or x for
the U-Boot mode. The U-Boot tests will display on the console.
Note that any U-Boot failures are reported to the ILOM System Event Log and to Fault
Management. For more information on U-Boot, refer to the Oracle x86 Servers Diagnostics
Guide.
Slide 25
Pc-Check Diagnostics
Pc-Check is a DOS-based diagnostics utility
Available from:
Within ILOM through its CLI and BUI
Server's Tools and Driver CD/DVD
Pc-Check is a DOS-based diagnostics utility that can be used to test the x86 Platforms. Pc-Check
is available in newer service processors. For servers that do not have a service processor,
Pc-Check can be executed from the server's Tools and Driver CD/DVD.
Slide 26
Pc-Check Diagnostics (cont.)

Pc-Check Operating modes:
Enabled: Runs a select list of tests upon start up of the host
Extended: Runs a comprehensive test suite upon start up of the host
Manual: Select individual tests or test suites
Disabled: Disables testing
Pc-Check supports four operating modes.
The enabled mode runs a select list of tests upon start up of the host, which takes
approximately 3 minutes to execute. Once the tests complete, the server continues to boot the next
device based on the BIOS Boot Priority List. This is a quick test that is recommended upon
first-time field installation.
The extended mode runs a comprehensive test suite upon start up of the host, which takes
approximately 30 minutes or longer to execute. This is a longer test that is recommended any
time you physically change the system configuration, to verify the new configuration.
The manual mode allows you to select individual tests from the Pc-Check menus, or to select
predefined test suites available through the Immediate Burn-in test menu. This is recommended
when you want to test individual server components for fault isolation testing.
The disabled mode, which is the default, disables Pc-Check diagnostic tests upon start-up
of the host.
Slide 27
Pc-Check Diagnostics Through ILOM

Accessing Pc-Check through
ILOM CLI
-> set /SP/diag state=mode
-> stop /SYS
-> start /SYS
ILOM BUI
Remote Control -> Diagnostics

Manual Mode
Advanced Diagnostics Testing Menu = individual test
Immediate Burn-in Testing Menu = test suites
For more test information on Pc-Check, refer to the Oracle x86 Servers
Diagnostics Guide which can be found on this link:
X86 Server Documentation
To access Pc-Check you can either use the ILOM CLI or BUI.
The CLI commands listed set the Pc-Check mode then reboot the server. This will include Pc-Check
testing during the server host boot. The BUI navigation path displayed gives you access to the Pc-Check
setup where you can select the mode.
If manual mode was selected via either CLI or BUI, the Pc-Check menus will be displayed with a choice to
select the Advanced Diagnostics Testing Menu which will display the individual tests, or to select the
Immediate Burn-in Testing Menu to display the test suites.
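The CLI sequence on the slide (set the mode, stop the host, start the host) can be generated from the four operating modes so that an invalid mode is caught before anything is sent to the service processor. This is only a sketch: the function name is hypothetical and the mode strings are taken from the operating-mode names in this course, so check the exact accepted values for your ILOM release.

```python
# Mode names and summaries as described in the course; accepted values may vary by ILOM release.
PC_CHECK_MODES = {
    "enabled": "select list of tests, about 3 minutes",
    "extended": "comprehensive test suite, 30 minutes or longer",
    "manual": "choose individual tests or test suites from the menus",
    "disabled": "no Pc-Check tests at host start-up (default)",
}


def pc_check_commands(mode):
    """Return the ILOM CLI lines that set the Pc-Check mode and restart the host."""
    mode = mode.strip().lower()
    if mode not in PC_CHECK_MODES:
        raise ValueError("unknown Pc-Check mode: " + mode)
    return [f"set /SP/diag state={mode}", "stop /SYS", "start /SYS"]


if __name__ == "__main__":
    for line in pc_check_commands("extended"):
        print("->", line)
```

Because the stop and start commands power cycle the host, the sequence should only be run when the customer has agreed to the outage.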
Slide 28
HW Level ILOM Diagnostics

spdiags
LED test to check proper functioning of LEDs
Temperature sensor testing of components
hostdiags
Command line examples:
-> hostdiags info
-> hostdiags fan_test
-> hostdiags memerr
-> hostdiags psu_test 0
An example of the use of Hostdiags is documented in this bug report under the Sun database.
Bug#6828998 under Sun
Bug#6828998 under Oracle
The spdiags command, within ILOM 2.0, opens a menu of tests that allow you to select the
component to test. Two examples are the LED test, which can turn LEDs on and off, and the
temperature test, which checks the temperature sensors of the CPU, DIMMs and other components.
Another set of diagnostics is hostdiags. This is a CLI command that has a series of options
that can be added to the command line. The most useful options are info for the host state,
fan_test to verify the fans, and memerr to display memory errors.
For more information on hostdiags, refer to the documented bug. It is also important to note that
this particular bug number was duplicated in the Oracle and Sun bug databases. The same bug
number was used, but they are two different issues. To avoid confusion, be sure to indicate in the
Product Source field whether this is a Sun or Oracle bug in the bug search.
Slide 29
Slide 30
Determining Storage Configuration

Sun Disk Management Overview document (820-6350)
Library
x86 Platform Disk Controllers:

Disk controller supported
RAID levels supported
Disk controller configuration mechanisms
OS driver support
Disk management upgrade tools
Firmware upgrade tools
Storage Management Solutions supported by the x86 Platforms are identified within the Sun
Disk Management Overview document listed. Click the link provided to open a library where
this document is located. Scroll down the document list and open the Sun Disk Management
Overview entry.
This document lists the x86 servers along with the disk controllers they support, the RAID levels
supported, what mechanism is used to configure the disk controller, what operating systems have
drivers to support the controller, along with the disk management and firmware upgrade tools.
Slide 31
Determining Disk Controller Type

# /usr/sbin/prtconf -D [Solaris]
# lsscsi -H [Linux]
Control Panel -> System -> Hardware -> Device Manager -> SCSI and RAID Host Bus Adapter [Windows 2003]
Control Panel -> System -> Device Manager -> Storage Controller [Windows 2008]
Disk controller manufacturers: Intel, LSI, Adaptec and Nvidia
Disk controller types: SAS, SATA, IDE and SCSI
Disk types: HDDs, SSDs, Compact Flash cards, and Flash Modules
From the OS, the type of disk controller your server supports can be displayed using the Solaris
command, Linux command or Windows navigation paths displayed.
The x86 Platforms may use disk controllers provided by Intel, LSI, Adaptec and Nvidia,
supporting SAS, SATA, IDE and SCSI. Storage disk types supported are HDDs, SSDs, Compact
Flash cards, and Flash Modules.
Slide 32
Storage Management Tools
Oracle Hardware Installation Assistant (OHIA)
OHIA Library
Sun LSI 106x RAID User's Guide (820-4933)
LSISAS1064/1064E
LSISAS1068/1068E
Sun Intel Adaptec BIOS RAID Utility User's Manual (820-4708)
Intel Matrix Storage (820-7143)
The Oracle Hardware Installation Assistant, or OHIA, is a storage management tool that has the
capability to update some of the Host Bus Adapters provided by Oracle. For documents on OHIA
refer to the library link provided.
If you are dealing with LSI disk controllers the document listed provides the instructions on how
to configure and manage the disks supported by the controllers that are listed. Be mindful that
this list will grow as more LSI disk controllers are released. The Oracle LSI part numbers are
listed on the slide.
For Adaptec disk controllers, the document listed provides instructions on how to configure and
manage supported Adaptec disks. Consider that this list will also expand as more Adaptec disk
controllers are released. The Oracle Intel Adaptec BIOS RAID Utility Manual part number is
820-4708. The Intel Matrix Storage document part number is 820-7143.
Slide 33
Slide 34
Difference Between a FRU and CRU
FRUs and CRUs are server components that were designated as replaceable at the customer site
FRUs can only be replaced by a qualified Oracle or Partner technician
CRUs are replaced by the Oracle customer
In earlier courses, we learned the definitions of a Field Replaceable Unit, or FRU, and a
Customer Replaceable Unit, or CRU. For the x86 Platform, FRUs and CRUs are server
components that were designated as replaceable at the customer site. A component designated as
a FRU can only be replaced by a qualified Oracle or Oracle Partner technician. A CRU is
replaced by the Oracle customer.
Slide 35
Locating the FRU and CRU List

Sun System Handbook
IPMI
# ipmitool -U root -H 10.8.151.171 fru list
X4140 FRUs and CRUs

The FRU and CRU list for a specific server can be found in the Sun System Handbook. Click the
link to review a sample server list of CRUs and FRUs. From a command line you can use the
IPMItool command to list the FRUs and CRUs. The example command on the slide will list
FRUs on the X4140 Server.
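Output from the fru list command can be turned into a simple inventory for comparison with the Sun System Handbook list. The sketch below assumes the usual ipmitool layout, in which each device starts with a "FRU Device Description" line followed by indented "Field : value" lines; that layout, the sample values, and the function name are assumptions for illustration.

```python
def parse_fru_list(text):
    """Group 'Field : value' lines under each 'FRU Device Description' entry."""
    devices = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "FRU Device Description":
            current = {"description": value, "fields": {}}
            devices.append(current)
        elif current is not None:
            current["fields"][key] = value
    return devices


if __name__ == "__main__":
    # Made-up sample in the assumed layout, not real server output.
    sample = (
        "FRU Device Description : Builtin FRU Device (ID 0)\n"
        " Product Name          : EXAMPLE SERVER\n"
        "\n"
        "FRU Device Description : example.fru (ID 2)\n"
        " Product Part Number   : 000-0000\n"
    )
    for device in parse_fru_list(sample):
        print(device["description"], device["fields"])
```

Splitting on the first colon only keeps values that themselves contain colons, such as timestamps, intact.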
Slide 36
Replacing FRUs and CRUs

Locate the installation and replacement procedures
Server Installation and Service Manuals (docs.oracle.com)
Server top cover label
EIS Checklists
X2200 M2

x86 installation and replacement procedures are located in the server installation and service
manuals, which can be found on the Oracle Technology Network. The procedures can also be
found on a label on some of the server top covers. As mentioned earlier, the EIS checklists are
highly recommended for server installation.
Slide 37
Hot Swap, Hot Pluggable or Cold Swap

Replacement Methods
Hot Swap
Component can be removed and installed in the server without
software intervention. An example is a cooling fan.
Hot Pluggable
Component requires software intervention prior to removal. An
example is running cfgadm to remove a disk drive.
Cold Swap
Component requires the server to be powered down prior to
removal. An example is a DIMM.
It's vital to understand component replacement procedures. Before starting a replacement of a
component, you must determine whether the component is a hot swap, a hot pluggable, or a cold
swap component. A hot swap component can be removed or installed in the server without
software intervention. An example of this would be a cooling fan. A hot pluggable component
can be installed in the server, but requires software intervention prior to removal of the part.
The device to be replaced must first be prepared so that the operating system knows
the component is no longer available for operation. An example is the use of cfgadm to
prepare a disk drive for removal. A cold swap component requires the server to be powered down
prior to removal or installation of the component. An example is a DIMM.
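The three replacement methods can be summarized as a small lookup, sketched here in Python. The enum, the function name, and the example table are assumptions made for illustration, using only the examples from the slide; the actual method for a given component must always be taken from the server's service manual.

```python
from enum import Enum


class ReplacementMethod(Enum):
    HOT_SWAP = "hot swap"
    HOT_PLUGGABLE = "hot pluggable"
    COLD_SWAP = "cold swap"


# The slide's own examples only; real classifications vary by server model.
SLIDE_EXAMPLES = {
    "cooling fan": ReplacementMethod.HOT_SWAP,
    "disk drive": ReplacementMethod.HOT_PLUGGABLE,
    "DIMM": ReplacementMethod.COLD_SWAP,
}


def preparation_steps(method):
    """Return what must happen before the component is removed."""
    if method is ReplacementMethod.HOT_SWAP:
        return []  # no software intervention needed
    if method is ReplacementMethod.HOT_PLUGGABLE:
        return ["prepare the component in software (for example, cfgadm for a disk drive)"]
    if method is ReplacementMethod.COLD_SWAP:
        return ["power down the server"]
    raise ValueError(f"unknown replacement method: {method!r}")


for component, method in SLIDE_EXAMPLES.items():
    print(component, "->", method.value, preparation_steps(method))
```

Keeping the preparation steps in one function makes the key distinction visible: only the hot swap case returns an empty list.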
Slide 38
Replacement of Memory
Population guideline differences
Between Intel-based and AMD-based servers
Between current Intel and earlier, legacy Intel processors
Server configuration of memory and their relative locations
within the server
DIMM manufacturer, type, density and speed
Location of the population guidelines
Server's Service Manual
Server's top cover label
EIS Checklist
Server memory varies, so you need to rely on server memory population guidelines. How you
proceed with populating DIMMs depends on the CPU, the server's memory configuration, and
the DIMM specifications.
There are population guideline differences between Intel-based and AMD-based servers and also
between current Intel processors and earlier legacy Intel processors. The server configuration
may not utilize the processor's full memory capacity, which affects the population guidelines.
The manufacturer and type of DIMM, as well as their density and speed, may also determine the
population guidelines.
Due to these differences, it is important to reference the population guidelines for each server. These
guidelines can be found in the server's service manual, the server's top cover label, or the EIS
checklist.
Slide 39
Sample AMD Opteron-Based Memory Configuration

X6240 Server Module
This is the first of three examples of x86 platform memory configurations. This slide shows the
X6240 server blade, which is an AMD Opteron-based server. Each CPU supports 8 DIMM slots
that are shared using a HyperTransport link between the CPUs. The DIMMs need to be installed
in pairs, as indicated in the tables. The DDR2 DIMMs must come from the same manufacturer
and have the same density and speed. The population order starts from the DIMM slot farthest
from the CPU.
Slide 40
Example Legacy Intel-Based Memory Configurations
X4150 Server
The X4150 server provides a second example of x86 platform memory configuration. This is a
legacy Intel-based server, where the two CPUs share 16 DIMM slots. The DIMMs need to be installed
in pairs, as indicated in the tables, with FB-DIMMs that come from the same manufacturer with
the same density and speed. Notice that the A and B channel DIMM slots are the first matched
DIMM slots to be populated, while the C and D channel DIMM slots are the second matched
DIMM slots to be populated. All DIMM slots are shared by both CPUs through a NorthBridge
chip, since the A to D channels are directly connected to this chip.
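Both the X6240 and X4150 examples require DIMMs to be installed in pairs that match on manufacturer, density, and speed. The Python sketch below checks that rule for a list of DIMMs given in population order, where consecutive entries are assumed to form a pair. The Dimm fields and function names are hypothetical, and the check does not model slot order or channel assignment, which must come from the tables in the server's service manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    manufacturer: str
    density_gb: int
    speed_mhz: int


def pair_mismatches(first, second):
    """Return the names of the attributes on which two DIMMs differ."""
    mismatches = []
    for attribute in ("manufacturer", "density_gb", "speed_mhz"):
        if getattr(first, attribute) != getattr(second, attribute):
            mismatches.append(attribute)
    return mismatches


def check_population(dimms):
    """Check a list of DIMMs in population order, two consecutive entries per pair."""
    problems = []
    if len(dimms) % 2 != 0:
        problems.append("DIMMs must be installed in pairs (odd count)")
    # Walk the complete pairs; a trailing unpaired DIMM is reported above.
    for index in range(0, len(dimms) - 1, 2):
        differences = pair_mismatches(dimms[index], dimms[index + 1])
        if differences:
            problems.append(f"pair {index // 2}: mismatched {', '.join(differences)}")
    return problems


# Placeholder values for illustration only.
matched = [Dimm("VendorA", 4, 667), Dimm("VendorA", 4, 667)]
mixed = [Dimm("VendorA", 4, 667), Dimm("VendorA", 4, 533)]
print(check_population(matched))
print(check_population(mixed))
```

An empty list means no pairing problem was found under these assumptions; it does not confirm that the configuration is supported on a particular server.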
Slide 41
Sample of a Current Intel Memory Configuration

Ranked DIMMs
X4170 Server
DDR3

A third example of a memory configuration is the X4170 server. This is a Xeon-based server,
with two CPUs that share 12 DIMM slots through a QuickPath link between the CPUs. Each of
the two Intel processors has eight associated DIMM sockets, D0 through D7, as shown in the
diagram. The DIMM types supported are DDR3 DIMMs that are quad, dual, or single ranked. Click the
link for a description of ranked DIMMs, starting with quad-ranked DIMMs.
Slide 42
Replacement of Disks
The disk population guidelines are dependent on the
platform type and whether the disk is directly or
indirectly connected to the server.
Replacement rules
Before a disk can be removed it needs to be isolated from its
operating environment, if not under RAID control
Replacement procedures are located in
Server's Service Manual
Server Top Cover Label
EIS Checklist
NOTE: Disk replacement is also dependent on the type of OS.

Disk population guidelines depend on several parameters. These include the platform
type and whether the disk is directly or indirectly connected to the server.
No matter what type of disk it is or how it is physically associated with the server, a disk that is
not under RAID control must be isolated from its operating environment before removal. The procedure for its
replacement can be located in the server's service manual, the server top cover label, or the EIS
checklist.
Note that disk replacement also depends on the type of OS.
Slide 43
Replacement of Other Components

Other external server components:
Power supplies
CD/DVD drives
Internal Components:
CPUs
Riser Cards
CPU Modules
Memory Board
Motherboard
Fan Modules
SP Board
Fan Boards
REM and FEM
System Battery
Disk Backplanes
I/O Adapters
Front Panel Indicator Board
Power Distribution Board
Sun System Handbook
X4600 M2 Service Manual
The slide lists some of the external and internal server components that can be replaced on the x86
platforms. The supported components for a specific server can be determined by displaying its
Full Components list within the Sun System Handbook. Click on the link provided to view the
Sun System Handbook.
As in the case of the memory and disks, the procedures for the external and internal component
replacements can be located in the server's service manual, the server's top cover label, or
the EIS checklist. Click on the link provided to display the X4600 M2 Service Manual so that
you can view examples of its component procedures.
Slide 45
x86 Platform Course Summary
In summary, you have studied how to interpret system indicators,
locate and describe tools used to gather troubleshooting data,
locate and describe diagnostic tools, and describe FRU/CRU
replacement procedures for x86 systems.
In summary, you have studied how to interpret system indicators, locate and describe tools used
to gather troubleshooting data, locate and describe diagnostic tools, and describe FRU/CRU
replacement procedures for x86 systems.
Slide 46
End of Sun x86 Servers Troubleshooting Tools and
FRU/CRU Replacement Course
WZD-SSx86-301
This completes the Sun x86 Servers Troubleshooting Tools and FRU/CRU Replacement course.
Remember, in order to get credit for this course, you must take the course assessment and pass
with a score of 80% or higher.
Slide 47
Thank You
Slide 48
Oracle
