3/4/2014

Document 1020990.1
BIOS Versions Prior to 3.0.2 May Cause System Hangs on Sun Fire X4150/X4250/X4450 Systems (Doc ID 1020990.1)
Modified: 23-Jan-2014

Type: ALERT

Migrated ID: 268668

Bug Id
BUG:15582230, BUG:15583785
Date of Resolved Release
02-Oct-2009
***Checked for relevance 23-Jan-2014***

1. Impact
Sun Fire X4150/X4250/X4450 systems with BIOS versions 3.0.1 or earlier may hang as a result of correctable ECC memory errors not being handled properly.

2. Contributing Factors
This issue can occur on the following platforms:
Sun Fire X4150/X4250/X4450 system with BIOS versions 3.0.1 or earlier
Note 1: Sun Fire X4150 and X4250 servers with BIOS versions 1ADQW061 or earlier, and X4450 servers with BIOS versions 3B61 or earlier, have an issue where the SMI (System Management Interrupt) handler never exits when it tries to handle a correctable ECC memory error detected by patrol scrub. When this happens, the system locks up with no ILOM SEL entry indicating the problem. This bug does not affect all operating systems, because operating systems differ in how they handle a correctable memory error detected by patrol scrub. VMware 3.5 and 4.0 and RHEL 5.3 are known to encounter this hang condition because they pass patrol scrub correctable errors on to the BIOS.
Note 2: Correctable errors can occur even in healthy systems. The likelihood of a system hang due to this bug depends on whether an error occurs, when it occurs, how it is detected, and which operating system is running.

3. Symptoms
If the described issue occurs, the system will lock up/hang with no ILOM SEL entry indicating a problem. Access to the ILOM is not affected.

4. Workaround
There is no workaround for this issue. Please see the Resolution section below.

5. Resolution
This issue is addressed on the following platforms:
Sun Fire X4150/X4250/X4450 systems with BIOS revision 3.0.2 or later
It is recommended to update affected systems with the latest BIOS versions located at:
For Sun Fire X4150:
http://www.oracle.com/technetwork/systems/patches/firmware/release-history-jsp-138416.html#X4150
For Sun Fire X4250:
http://www.oracle.com/technetwork/systems/patches/firmware/release-history-jsp-138416.html#X4250
For Sun Fire X4450:
http://www.oracle.com/technetwork/systems/patches/firmware/release-history-jsp-138416.html#X4450
Note: The above releases contain BIOS 1ADQW062 for the Sun Fire X4150/X4250 and BIOS 3B62 for the X4450.
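The BIOS build thresholds above can be checked mechanically. The following is only a minimal sketch: the function name `is_affected` and the table `LAST_AFFECTED` are hypothetical, and it assumes that builds are ordered by the numeric suffix of the BIOS string (so 1ADQW062 is newer than 1ADQW061). How the BIOS string is obtained from a system is outside the scope of this sketch.

```python
import re

# Last affected BIOS build per platform, taken from Note 1 of this alert.
# Assumption: builds are ordered by the numeric suffix of the version string.
LAST_AFFECTED = {
    "X4150": ("1ADQW", 61),
    "X4250": ("1ADQW", 61),
    "X4450": ("3B", 61),
}


def is_affected(model, bios_version):
    """Return True if the BIOS build is at or below the last affected build."""
    prefix, last_bad = LAST_AFFECTED[model.upper()]
    # Expect the platform prefix followed only by digits, e.g. 1ADQW061 or 3B62.
    match = re.fullmatch(re.escape(prefix) + r"(\d+)", bios_version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized BIOS version {bios_version!r} for {model}")
    return int(match.group(1)) <= last_bad
```

For example, `is_affected("X4150", "1ADQW061")` flags the system for update, while `is_affected("X4450", "3B62")` does not.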
Modification History:
22-Aug-2012: Maintenance check for relevance/currency, no change in content
23-Jan-2014: Checked for currency/relevance/formatting; no change in content
Product
Sun Fire X4150 Server
Sun Fire X4250 Server
Sun Fire X4450 Server
There are 2 other known issues that are being fixed in the next (3.1.0) software release:

Issue 1:
Incorrect error messaging
If a correctable ECC memory error is detected by the CPU, you will see this SEL entry as usual:
|67| IPMI | Log | minor | Fri Sep 4 17:04:57 2009 | ID = 1d : 09/04/2009 : 17:04:57 : Memory : BIOS : Correctable ECC; Channel: D, DIMM: 5 |
If the background scrubber detects the correctable ECC memory error, the SEL entry will look like this:
|118| IPMI | Log | *critical*| Tue Sep 8 18:00:47 2009 | ID = 3f : 09/08/2009 : 18:00:47 : Memory : BIOS : Memory Scrub Failed; Channel: D, DIMM: 5
This incorrectly reports the error as critical. A correctable ECC memory error found by the scrubber is not a critical error, despite the SEL entry. This will be fixed in the next software release, in which both types will be reported as a minor correctable ECC.
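Until that release is installed, log-review scripts may want to treat both entry types as minor. The following is a minimal sketch, assuming SEL entries are exported as pipe-delimited text in the layout shown above; the function name `parse_memory_sel_entry` and the returned field names are hypothetical and are not part of any ILOM tool.

```python
import re

# The two memory event texts shown in this alert, followed by channel and DIMM.
EVENT_RE = re.compile(
    r"(Correctable ECC|Memory Scrub Failed); Channel: (\w+), DIMM: (\d+)"
)


def parse_memory_sel_entry(line):
    """Parse one pipe-delimited SEL line; return None if it is not one of
    the two memory events described in this alert."""
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    if len(fields) < 6:
        return None
    match = EVENT_RE.search(fields[5])
    if match is None:
        return None
    event, channel, dimm = match.groups()
    return {
        "record": int(fields[0]),
        # Severity as logged; emphasis asterisks are stripped if present.
        "logged_severity": fields[3].strip("*").strip(),
        # Per this alert, both event types are really minor correctable errors.
        "effective_severity": "minor",
        "event": event,
        "channel": channel,
        "dimm": int(dimm),
    }
```

Applied to the second entry above, this keeps the logged severity as "critical" for traceability but reports the effective severity as "minor".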
Issue 2:
DIMMs being falsely mapped out during POST due to correctable ECC memory errors.
POST should not map out a DIMM due to detecting a correctable ECC memory error. If a DIMM is mapped out during POST, reboot the system to determine whether the map-out was caused by a correctable ECC memory error. One of three things can then happen:
1. The DIMM error goes away, indicating that the issue was due to a correctable ECC memory error. No further action is needed.
2. The same DIMM maps out again. The DIMM is likely bad, and the DIMM pair should be replaced.
3. A different DIMM maps out. Continue to reboot the system until either the error goes away or one DIMM maps out persistently, in which case that DIMM pair should be replaced.
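The reboot procedure above can be sketched as a small decision function. This is illustrative only: the function name `next_action`, the DIMM labels, and the returned strings are hypothetical, and "persistent" is interpreted here as the same DIMM mapping out on two consecutive boots, which is an assumption rather than a definition given in this alert.

```python
def next_action(history):
    """Suggest the next step from the DIMM mapped out at each POST.

    history lists one entry per boot, oldest first: a DIMM label such as
    "D5" if POST mapped a DIMM out, or None if no DIMM was mapped out.
    """
    if not history or history[-1] is None:
        # Outcome 1: the error went away, so it was a correctable ECC error.
        return "ok"
    if len(history) >= 2 and history[-1] == history[-2]:
        # Outcome 2, or the end of outcome 3: the same DIMM mapped out again.
        return f"replace DIMM pair containing {history[-1]}"
    # First map-out, or a different DIMM than last time: reboot and re-check.
    return "reboot"
```

For example, `next_action(["D5", "C3"])` suggests another reboot, while `next_action(["D5", "C3", "C3"])` suggests replacing the pair that contains C3.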
Questions regarding this document should be addressed to sunalerpublication_us_grp@oracle.com, with a copy to the responsible engineer/submitter listed below.
Internal Contributor/submitter
Jake.Bell@Sun.COM
Internal Eng Responsible Engineer
leigh.chen@sun.com
Internal Services Knowledge Engineer
jeff.folla@sun.com
Internal Eng Business Unit Group
SVS (SPARC Volume Systems, Horizontal Systems (includes T2000/Ontario), NWS (Network Storage),
Systems Group-x64 (X4100-X4600 (includes M2), V20z/V40z/V60z/V65z, Ultra20/40)

REFERENCES
BUG:15582230
BUG:15583785

Related
Products
Sun Microsystems > Servers > x64 Servers > Sun Fire X4150 Server
Sun Microsystems > Servers > x64 Servers > Sun Fire X4250 Server
Sun Microsystems > Servers > x64 Servers > Sun Fire X4450 Server

Keywords
BIOS; CPU; DIMM; ILOM; MEMORY ERRORS; STORAGE; VMWARE; VOLUME; X4250; X4450
Copyright (c) 2014, Oracle. All rights reserved.

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=307038722829413&id=1020990.1&displayIndex=5&_afrWindowMode=0&_adf.ctrl-state=upy