Debugging Machine Check Exceptions on Embedded IA Platforms

Ashley Montgomery, Platform Application Engineer, Intel Corporation
Tian Tian, Platform Application Engineer, Intel Corporation

July 2010
324077-001
Executive Summary
Embedded systems need to be able to detect, recover from, and report
errors. This is a critical feature not only during debugging but also for
quality control after product manufacturing has begun. Advanced error
handling matters even more for embedded systems because many are deployed
in large numbers, dispersed widely, and run mission-critical applications.
Embedded systems also present a unique challenge because of their
diverse form factors, widely varying feature sets, and special usage
models.
the CPU. When the CPU detects a critical machine check exception that cannot
be corrected, it resets the system to keep the error condition from getting
worse. The MCE registers capture some of the error information as seen by the
CPU at the point of failure, which can be important for finding the root cause
of the error.
Contents
Machine Check Architecture
Some MCEs are uncorrectable, and the system must reset to recover. In this
situation the CPU has concluded either that the system is no longer operating
in a safe or reliable state, or that the cost of recovering from the error
(in hardware or software) is prohibitive.
unit in a timely fashion in order to debug. The parts may be operating in all
kinds of environments, including extreme temperatures or high altitudes,
which adds to the complexity of troubleshooting and of eliminating
suspects.
mobile and desktop markets. Several embedded applications are also
required to operate non-stop for 7 to 10 years with extremely low error rates.
IA32_MCG_CTL MSR
Machine check features must be enabled for MCEs to be captured, so it is
important to confirm that they are. The IA32_MCG_CTL MSR controls the
reporting of machine check exceptions. It is present only if the capability
flag MCG_CTL_P (bit 8) is set in the IA32_MCG_CAP MSR. If present, writing
all 1s to this register enables MCE features and writing all 0s disables
them. Refer to Ref [1] for more information.
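As a minimal sketch of this check, assuming a Linux target with the `msr` driver loaded (it exposes each CPU's MSRs as `/dev/cpu/N/msr` and requires root), the capability test might look like the following. The MSR addresses (0x179 for IA32_MCG_CAP, 0x17B for IA32_MCG_CTL) and bit positions (bank count in bits [7:0], MCG_CTL_P in bit 8) follow Ref [1]; the helper names are hypothetical.

```python
import os
import struct

IA32_MCG_CAP = 0x179  # MSR address per Ref [1]
IA32_MCG_CTL = 0x17B  # exists only when MCG_CTL_P is set


def read_msr(cpu: int, msr: int) -> int:
    """Read one 64-bit MSR through the Linux msr driver (root, 'modprobe msr')."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The driver uses the file offset as the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def mcg_cap_fields(cap: int) -> dict:
    """Decode the IA32_MCG_CAP fields used to decide whether IA32_MCG_CTL exists."""
    return {
        "bank_count": cap & 0xFF,                 # bits [7:0]: number of reporting banks
        "mcg_ctl_present": bool((cap >> 8) & 1),  # bit 8: MCG_CTL_P
    }


def mcg_ctl_enable_value(cap: int):
    """Value that enables all MCE features in IA32_MCG_CTL, or None if it is absent."""
    if not mcg_cap_fields(cap)["mcg_ctl_present"]:
        return None
    return 0xFFFF_FFFF_FFFF_FFFF  # all 1s enables; all 0s would disable
```

On a target, `mcg_cap_fields(read_msr(0, IA32_MCG_CAP))` reports how many banks the core has and whether IA32_MCG_CTL can be written at all; the decode functions are kept separate from the MSR access so they can be reused on values captured by firmware or a crash log.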
Table 2 IA32_MCG_STATUS
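A minimal sketch of splitting an IA32_MCG_STATUS value (MSR 0x17A) into its flag bits, assuming the bit assignments documented in Ref [1]: RIPV in bit 0, EIPV in bit 1, and MCIP in bit 2. The function name is hypothetical.

```python
IA32_MCG_STATUS = 0x17A  # MSR address per Ref [1]


def decode_mcg_status(value: int) -> dict:
    """Split IA32_MCG_STATUS into its architectural flag bits."""
    return {
        # Restart IP valid: execution can resume at the IP saved on the stack.
        "RIPV": bool(value & (1 << 0)),
        # Error IP valid: the saved IP points at the instruction tied to the error.
        "EIPV": bool(value & (1 << 1)),
        # Machine check in progress: a second MCE while set causes a shutdown.
        "MCIP": bool(value & (1 << 2)),
    }
```

If RIPV is clear, the interrupted program cannot be restarted reliably, which is one reason an uncorrected MCE typically ends in a reset rather than a resume.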
A more detailed description of the MCE Status Registers can be found in the
Machine-Check MSRs section in Ref [1].
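To make the status register layout concrete, here is a sketch that decodes the architectural fields of an IA32_MCi_STATUS value. It assumes the bit positions from the Machine-Check MSRs section of Ref [1] (VAL 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, model-specific code [31:16], MCA error code [15:0]) and the legacy bank layout in which bank i occupies MSRs 0x400 + 4*i through 0x403 + 4*i. The function names are hypothetical.

```python
IA32_MC0_STATUS = 0x401  # bank i: CTL, STATUS, ADDR, MISC at 0x400 + 4*i + 0..3


def mci_status_msr(bank: int) -> int:
    """MSR address of IA32_MCi_STATUS for reporting bank i."""
    return IA32_MC0_STATUS + 4 * bank


def decode_mci_status(value: int) -> dict:
    """Decode the architectural fields of an IA32_MCi_STATUS value."""

    def bit(n: int) -> bool:
        return bool((value >> n) & 1)

    return {
        "VAL": bit(63),    # register holds a valid error
        "OVER": bit(62),   # a previous error was overwritten
        "UC": bit(61),     # error was not corrected
        "EN": bit(60),     # reporting was enabled in IA32_MCi_CTL
        "MISCV": bit(59),  # IA32_MCi_MISC contains extra information
        "ADDRV": bit(58),  # IA32_MCi_ADDR contains the error address
        "PCC": bit(57),    # processor context may be corrupt
        "mca_code": value & 0xFFFF,            # bits [15:0], architectural
        "model_code": (value >> 16) & 0xFFFF,  # bits [31:16], model-specific
    }
```

Only `mca_code` is architectural across processor families; `model_code` has to be interpreted with the model-specific tables in Ref [1].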
Table 5 shows the simple error codes. These codes indicate global error
information.
Notes:
1. BINIT# assertion will cause a machine check exception if the processor (or any processor on
the same external bus) has BINIT# observation enabled during power-on configuration
(hardware strapping) and if machine check exceptions are enabled (by setting CR4.MCE = 1).
2. At least one X must equal one. Internal unclassified errors have not been classified.
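A minimal sketch of matching bits [15:0] against the simple error encodings, assuming the code values listed in the simple error code table of Ref [1]. Newer revisions of Ref [1] add further simple codes that this sketch does not cover; the table and function names are hypothetical.

```python
SIMPLE_ERROR_CODES = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error (BINIT# from another processor)",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
}


def classify_simple(code: int):
    """Return the simple-error name for bits [15:0], or None if it is not a simple code."""
    code &= 0xFFFF
    if code in SIMPLE_ERROR_CODES:
        return SIMPLE_ERROR_CODES[code]
    # Internal unclassified: 0000 01xx xxxx xxxx with at least one x set (note 2),
    # so 0x0400 itself is excluded.
    if (code & 0xFC00) == 0x0400 and code != 0x0400:
        return "Internal unclassified"
    return None
```

A `None` result means the code should be checked against the compound forms instead.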
Table 6 shows the general form of the compound error codes related to the
TLBs, memory, caches, bus and interconnect logic, and internal timer. These
compound errors also consist of sub-fields that describe the type of access,
level in the cache, and type of request.
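Identifying the compound form amounts to comparing the fixed bits of each pattern while ignoring the variable sub-fields and the F bit (bit 12). The sketch below assumes the patterns as transcribed from the compound error code table of Ref [1]; check them against the revision of Ref [1] that matches your processor. The table layout and function name are hypothetical.

```python
# (mask, value, name, template) for each compound form.
# Masks clear bit 12 (F) and every variable sub-field bit.
COMPOUND_FORMS = [
    (0xFFFF, 0x0400, "Internal timer", "0000 0100 0000 0000"),
    (0xEFFC, 0x000C, "Generic cache hierarchy", "000F 0000 0000 11LL"),
    (0xEFF0, 0x0010, "TLB", "000F 0000 0001 TTLL"),
    (0xEF80, 0x0080, "Memory controller", "000F 0000 1MMM CCCC"),
    (0xEF00, 0x0100, "Cache hierarchy", "000F 0001 RRRR TTLL"),
    (0xE800, 0x0800, "Bus and interconnect", "000F 1PPT RRRR IILL"),
]


def compound_form(code: int):
    """Return (name, template) for the compound form matching bits [15:0], or None."""
    code &= 0xFFFF
    for mask, value, name, template in COMPOUND_FORMS:
        if (code & mask) == value:
            return name, template
    return None
```

The patterns do not overlap, so the order of the list only affects speed, not the result.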
Table 8 shows the 2-bit level (LL) sub-field, which indicates the level in the
memory hierarchy where the error occurred.
Table 9 shows the 4-bit request (RRRR) sub-field, which indicates the type of
action associated with the error.
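For cache hierarchy errors, the TT, LL, and RRRR sub-fields combine into a mnemonic of the form {TT}CACHE{LL}_{RRRR}_ERR. The sketch below assumes the encodings from the sub-field tables of Ref [1] (TT: 00 instruction, 01 data, 10 generic; LL: 00 to 11 for L0, L1, L2, and generic; RRRR: 0000 to 1000 for the request types listed); the names are hypothetical, and undefined encodings print as "?".

```python
TT_CODES = {0b00: "I", 0b01: "D", 0b10: "G"}                 # transaction type
LL_CODES = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}  # hierarchy level
RRRR_CODES = {                                               # request type
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}


def cache_error_mnemonic(code: int) -> str:
    """Build {TT}CACHE{LL}_{RRRR}_ERR for a code of the form 000F 0001 RRRR TTLL."""
    ll = code & 0b11
    tt = (code >> 2) & 0b11
    rrrr = (code >> 4) & 0b1111
    return f"{TT_CODES.get(tt, '?')}CACHE{LL_CODES[ll]}_{RRRR_CODES.get(rrrr, '?')}_ERR"
```

For example, a code of 0x0136 (RRRR=0011, TT=01, LL=10) yields DCACHEL2_DRD_ERR, an L2 data read error.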
Refer to Section 15.9 of Ref [1] for the other sub-field decoding tables and for
more information.
Multi-core implications
Most MCE registers are core-specific; that is, each core has its own set of
control, status, and address registers. However, newer processor families
such as Nehalem add banks that report package-level error information. For
example, in the Nehalem processor families, banks 0, 1, 6, and 7 are
per-package banks introduced to cover QPI, integrated memory, and graphics.
Banks 2, 3, 4, and 5 are more traditional MCE banks that report per-core
information for units such as the data cache, TLB, MLC, and LLC. See Ref [2]
for more information.
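One practical consequence is that an error-collection tool should read each per-core bank on every core but each per-package bank only once per package, or the same package-level error will be logged several times. A minimal sketch, assuming the Nehalem bank split described above and a hypothetical topology input mapping each package id to its core ids; other processor families use different bank layouts.

```python
# Nehalem bank scopes as described above; other families differ.
NEHALEM_PACKAGE_BANKS = {0, 1, 6, 7}  # shared by all cores in a package
NEHALEM_CORE_BANKS = {2, 3, 4, 5}     # one copy per core


def cores_to_read(bank: int, cores_by_package: dict) -> list:
    """Choose the cores from which to read a bank so each physical bank is read once.

    cores_by_package is a hypothetical topology map {package_id: [core_id, ...]}.
    """
    if bank in NEHALEM_PACKAGE_BANKS:
        # Any one core in each package can read the shared bank.
        return [cores[0] for cores in cores_by_package.values()]
    if bank in NEHALEM_CORE_BANKS:
        return [core for cores in cores_by_package.values() for core in cores]
    raise ValueError(f"bank {bank} is not covered by this Nehalem sketch")
```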
MC1_STATUS: 0xf200000000020151
To decode it, start with the MCA error code field, bits [15:0], which is 0151
in the register above. Convert this value to binary (0000 0001 0101 0001) and
refer to Table 6 to determine the compound error code form. In this case, the
form is 000F 0001 RRRR TTLL, a cache hierarchy error. Next, determine the
sub-fields: TT=00, LL=01, RRRR=0101. Using sub-field Tables 7, 8, and 9 and the
corresponding interpretation form from Table 6, {TT}CACHE{LL}_{RRRR}_ERR, the
MCE decodes as ICACHEL1_IRD_ERR, an L1 instruction fetch error. Bit 61 (UC) is
set in the MC1_STATUS register, so this is an uncorrected error.
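The decoding steps above can be reproduced in a few lines of Python. This is a sketch that hard-codes the example MC1_STATUS value and checks only the one compound form that applies to it; the expected printed output is shown in the comments.

```python
status = 0xF200000000020151  # MC1_STATUS value from the example above

# Step 1: MCA error code, bits [15:0]
code = status & 0xFFFF
print(f"{code:04X} = {code:016b}")  # 0151 = 0000000101010001

# Step 2: bits [11:8] = 0001 selects the form 000F 0001 RRRR TTLL
# (bit 12, the F flag, is ignored for this comparison)
is_cache_hierarchy = (code & 0xEF00) == 0x0100

# Step 3: extract the sub-fields
tt = (code >> 2) & 0b11      # 00 -> I (instruction)
ll = code & 0b11             # 01 -> L1
rrrr = (code >> 4) & 0b1111  # 0101 -> IRD (instruction fetch)
print(f"TT={tt:02b} LL={ll:02b} RRRR={rrrr:04b}")  # TT=00 LL=01 RRRR=0101

# Step 4: bit 61 (UC) tells whether the error was corrected
uncorrected = bool((status >> 61) & 1)
print("uncorrected" if uncorrected else "corrected")  # uncorrected
```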
The MCE error code can help identify what may be causing the error. This
section lists some common potential causes.
When an MCE can be reproduced on an Intel CRB, there are usually two
possibilities related to the cause. One is a potential sighting of a possible
Debug Checklist
Related documents
Summary
This application note gives an overview of machine check architecture and its
purpose in detecting and reporting system errors. This architecture makes it
possible to capture a range of error conditions visible to the CPU at the
point of failure. Newer additions of the MCA also make it possible to wire
This document provides a quick review of machine check architecture and its
key elements for debugging. It also provides recommendations for debugging
MCEs in embedded systems and a sample approach to help system developers
debug such issues. Because each failure event is unique, every error
situation must be approached differently. Nevertheless, this step-by-step
guide lists items that may be helpful in the debug process.
Authors
Ashley Montgomery is a Platform Application Engineer with
Intel’s Embedded and Communications Group.
Tian Tian is a Platform Application Engineer with Intel’s
Embedded and Communications Group.