
Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Confidential Oracle Internal

SPARC T5 Servers Deep Dive


Part 2


0.20

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.


ILOM


Software/Firmware Block Diagram

[Diagram. Host side: LDoms Manager; Solaris 11U1, Solaris S10U11, and System Domain guests, each with kernel, FMA components, and platform drivers, running on sun4v and OBP; Hypervisor; POST; memory, IO, and CPU. Host Data Flash: Host Config, Machine Description, Hypervisor, OBP, POST. Service Processor side (PILOT3 microprocessor): ILOM (Guest Mgr, FMA support, power on/off, FERG, environmentals, fault management, LED control, SP diags, DFRUIDS, platform HW service, FDD (diagnosis), IPMI, CLIs, logs, SNMP); U-Boot/Diags; Linux kernel; FPGA; platform HW. Stored data: OBP NVRAM/POST/SC config vars, ASRDB, SER log, LDoms config, TOD data, console log.]
Integrated T5 Software Stack

[Diagram. Host: a Control Domain and several Logical Domains, each running Solaris, on top of the Hypervisor and the physical hardware. Service Processor: ILOM components and UIs. Labeled components: PM, LDM, fmadm, HC, GM, DDB, fdd, POD, CAPI, faultDB, UIs.]


Legend
CAPI ILOM Common API
COD Capacity on Demand
DDB Deconfig DB
faultDB Fault DB
fdd Fault Diagnosis Daemon
GM Guest Manager
HC Hostconfig
LDM Logical Domain Manager
PM Power Manager
POD Platform Obfuscation Daemon
UI User Interface

T5 SP

Emulex PILOT3 based SP

Same SP card used on all T5 systems

Same SP card shared with M5 systems.

Monitors voltages used on the SP via ADC in PILOT3

Provides similar/same functionality as previous generations.


rKVMS (Java Remote Console+)

rKVMS Solution provided by Emulex, optimized for Pilot3

Expanded to support ILOM features

ILOM authentication

Serial Redirection (Host console redirection)

New features:

Auto detected Mouse mode

Virtual keyboard

Storage redirection: SSL and Non-SSL

Take / Relinquish full control

Local monitor on/off


rKVMS (Java Remote Console+), cont.

More key differences

Mouse and keyboard show up in the OBP device tree only while the remote console is active. Previously, the devices were always present.
Video cannot be used as a system boot console. The OBP rconsole alias has been removed.

Front and rear VGA ports are distinct devices

Configurable VGA_REAR_PORT policy under /SP/policy

Default disabled = front port is the port in use

If connecting to the rear VGA port, must change this setting.


Side-band Management
3 Remote Management Communication Channels
Out-of-band management = communicate with the SP over a dedicated medium (Ethernet/Serial)
In-band management = communicate with the SP through Oracle Solaris via agents
Side-band management = communicate with the SP over a shared medium (the host's data network interface)


T5-2 ILOM Protocol Interfaces

[Diagram. Channel labels: Out of Band, Side Band, In Band. Serial: CLI, Host Console Redirection. Ethernet: HTTPS, RKVMS, SSH (CLI), SNMP (traps), IPMI, (syslog, SMTP, misc. IP services), Serial over Ethernet. Hardware Management Pack: SNMP Agent, CLI Tools.]

T5-4/T5-8 ILOM Protocol Interfaces

[Diagram. Channel labels: Out of Band, Side Band, In Band. Serial: CLI, Host Console Redirection. Ethernet: HTTPS, RKVMS, SSH (CLI), SNMP (traps), IPMI, (syslog, SMTP, misc. IP services), Serial over Ethernet. Hardware Management Pack: SNMP Agent, CLI Tools.]

ILOM Interface Capabilities

[Diagram. ILOM interfaces: Serial, CLI, Browser, IPMI, SNMP, rKVMS, Console Redirection. Other labels: Sideband, Host OS.]

Oracle ILOM Key Functions


Management Interfaces
CLI, BUI, IPMI, SNMP

Firmware Updates
Remote Host Management
Inventory and Component Management
System Monitoring and Alert/Fault Management
User Account Management
Power Consumption Management


Guest Manager aka GM


Resides on the SP
Provides services to Guest domains via Logical Domain Channels (LDCs)
Communication bridge between Host and ILOM
Provides FERG capabilities
Manages LDOM configurations
Provides development facilities like Configvars, eFuse etc.
Can sequence the CPU in Serial boot mode for debug purposes.


Hostconfig
Initialization code that runs at power-on on SPARC
Platform specific code that drives initialization and configuration of CMP, memory and other motherboard components at power-on
Invokes Power-On Self Test (POST) twice (socket, smp) and applies platform policies to configure system around failed components
Generates Physical Resource Inventory (PRI) based on FRU information and system configuration


Hostconfig
Highly parallelized: uses multiple strands to speed configuration
Memory configured in parallel
Deconfigures components in the deconfig db (DDB)


Logs
Console output (including Hostconfig) is captured in the console logs located on the SP at /persist/host_logs/ with the mapping:

/HOST -> hostconsole.log

VBSC/GM console logs are captured at /coredump/sp_trace/logs/GM.log.# (rotated log files)


Logs
@(#)Hostconfig 1.3.x-nightly 2012/10/24 19:11 [t5-8:debug]
2012-10-25 18:44:31 2:0:0> WARNING: TPM hardware is disabled
2012-10-25 18:44:56 3:0:0> NOTICE: SPARC-T5 Revision 1.0
2012-10-25 18:44:56 0:0:0> NOTICE: SPARC-T5 Revision 1.0

YYYY-MM-DD HH:MM:SS - UTC time

Socket:Core:Strand - reporting entity

NOTICE: - general message

WARNING: - significant problem, doesn't inhibit boot

ERROR: - significant problem, may impact boot

FATAL: - system cannot proceed

DEBUG: - development use only, should never see these
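
The line layout described above (UTC timestamp, Socket:Core:Strand, severity, message) can be parsed mechanically. The following is a minimal sketch, assuming the pattern inferred from the sample lines is representative; the regular expression and the HostconfigEntry type are illustrative names, not part of any Oracle tool.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Severity labels as listed on the slide.
SEVERITIES = ("NOTICE", "WARNING", "ERROR", "FATAL", "DEBUG")

# Pattern inferred from the sample lines; not an official grammar.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<socket>\d+):(?P<core>\d+):(?P<strand>\d+)> "
    r"(?P<severity>[A-Z]+):\s*(?P<message>.*)$"
)


@dataclass
class HostconfigEntry:
    timestamp: str  # UTC, per the slide
    socket: int
    core: int
    strand: int
    severity: str
    message: str


def parse_line(line: str) -> Optional[HostconfigEntry]:
    """Parse one console-log line; return None for lines that do not match."""
    m = LINE_RE.match(line.strip())
    if m is None or m.group("severity") not in SEVERITIES:
        return None
    return HostconfigEntry(
        timestamp=m.group("ts"),
        socket=int(m.group("socket")),
        core=int(m.group("core")),
        strand=int(m.group("strand")),
        severity=m.group("severity"),
        message=m.group("message"),
    )


entry = parse_line("2012-10-25 18:44:31 2:0:0> WARNING: TPM hardware is disabled")
banner = parse_line("@(#)Hostconfig 1.3.x-nightly 2012/10/24 19:11 [t5-8:debug]")
```

Lines that do not follow the layout, such as the version banner, come back as None, so a caller can filter them out before grouping entries by socket or severity.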


PCIE fabric and failover


ioreconfigure controls behavior
redundant paths to all IO
only a single path is active at a time
failover results in less available bandwidth per device


Fault Management
Knowledge Articles in MOS
ILOM fdd Diagnosis
Faults and Alerts
No ALOM Compatibility
ILOM FMA Captive Shell
Sideband Service Processor Network Connection
New ILOM Fault Notification (SNMP Trap)
ASR Support
FMA on M5 ILOM also applies to T5 ILOM, except for M5 specific features


Fault Management on T5 systems

T5 CPU and Memory faults are now diagnosed by ILOM

FMA's Fault Proxy is used to keep ILOM's fault manager in sync with Solaris' fault manager.
Both will display the sum of all faults in the system.
Faults can be repaired from either side.
Fault Proxy communicates via the Ethernet Over USB connection.
IO faults are still diagnosed by Solaris.

Deconfig Database (DDB) owned by ILOM

For faults which diagnose resources as unusable, ILOM will add those resources to the DDB.
Resources are excluded on the next host reset.
When faults are repaired, ILOM automatically updates the DDB. Bringing components back online requires a host reset.
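
The DDB timing described above (entries change immediately, but the host only sees them at the next reset) can be illustrated with a toy model. This is a sketch under stated assumptions: the class, its method names, and the resource path are hypothetical and do not reflect ILOM internals.

```python
class DeconfigDB:
    """Toy model of the DDB behavior described on the slide."""

    def __init__(self):
        self.ddb = set()            # resources recorded as deconfigured
        self.host_excluded = set()  # what the running host actually excludes

    def diagnose_unusable(self, resource):
        # A fault that marks a resource unusable adds it to the DDB.
        self.ddb.add(resource)

    def repair(self, resource):
        # Repairing the fault updates the DDB automatically.
        self.ddb.discard(resource)

    def host_reset(self):
        # DDB changes only reach the host at the next host reset.
        self.host_excluded = set(self.ddb)


RESOURCE = "/SYS/PM0/CMP0/CORE3"  # hypothetical resource name

db = DeconfigDB()
db.diagnose_unusable(RESOURCE)
excluded_before_reset = RESOURCE in db.host_excluded
db.host_reset()
excluded_after_reset = RESOURCE in db.host_excluded
db.repair(RESOURCE)
excluded_after_repair = RESOURCE in db.host_excluded
db.host_reset()
excluded_after_second_reset = RESOURCE in db.host_excluded
```

The point of the model is the lag in both directions: a newly diagnosed resource stays in use until the reset, and a repaired resource stays out of use until the reset.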

Extended SP-POST (Power on Self Test)

Runs at SP boot. Tests devices on the SP FRU and its Ethernet port.
Status stored and converted to ereports after ILOM boots.


Fault proxy

[Diagram. Ereport and fault flow between the SP (hostd, FETD, ip-transport, ETM), the Control Domain (ETM, ip-transport), and the IO Domain (ETM, ip-transport), over LDC and TCP/IP links.]


IO ereports are forwarded from the SP to the control domain, and then on to any relevant IO domain.
Faults are proxied between the SP, the control domain and any IO domains to provide a single view of faults in the system.

Non-serviceable faults such as memory faults are not proxied.

The SP and the control domain can view and manage all faults in the system.
An IO domain can only view and manage faults local to the domain.


Ereport Generation
Three producers of ereports:
Guest Manager (GM)
Error Telemetry Collection Daemon (ETCD)
Platform Obfuscation Daemon (POD)

GM has direct communication with SW running on the host HW


Hostconfig (HC)
POST
Hypervisor (HV)


HV Reported Errors
Communicates error information in a raw, binary format called a Service Error Report (SER)
SERs are processed by a library called the FMA Ereport Generator (FERG)
The resulting ereports are published to the Event Manager framework


Platform Obfuscation Daemon (POD)


Runs any POST functionality that does not involve SPARC code
Ereports generated on HW problems
If POD encounters HW problems not accessible to HC, an ereport is generated


ASR Support
SPARC T5 servers will be supported by ASR (Automatic Service Request) at release
Continues use of sunHwTrapFaultDiagnosed SNMP notification
Telemetry for ILOM fdd diagnosis
Supports platform and FRU identity
Supports multi-suspect list


Service Processor Software (ILOM) on T5/M5 Systems

ILOM looks/behaves just like ILOM on other platforms
Simple (user-visible) set of extensions to support Physical Domains
Extensions to support Service Processor Proxies and redundant Service Processors
- Minimal impact on user experience


Boot mode
T-series platforms (except T5-1B) have two boot mode options
Sequenced Boot: SP boots, then user initiates host power-on via ILOM
Parallel Boot: SP and host power on in parallel to reduce overall boot time

Adjustable via ILOM '/SP/policy'


-> show /SP/policy
PARALLEL_BOOT = disabled


Boot Sequence
Service Processor Boot
Grub starts on power-on
Grub starts Linux
Linux starts various services, starts ILOM
ILOM starts Guest Manager
Services to the Guest OS domains
Communication (via FPGA) bridge between the host and the ILOM service
Provides Fault Error Report Generation (FERG)
Manages LDom configurations in persistent storage
POD performs power sequencing for Host processors


Boot Sequence (Continued)


Host Boot
GM takes processors out of reset
T5 starts executing Hostconfig code from flash
Hostconfig code does the following:
On each CMP, selects a master strand to coordinate the host configuration and runs the other per-CMP strands in parallel for initialization and configuration
After overall host config is complete, populates the PRI (Physical Resource Inventory) and generates MDs for Hypervisor and Guest
Master strand jumps to Hypervisor; other strands are parked


Boot Sequence (Continued)


Host Boot (continued)
Hypervisor then proceeds to:
Copy itself from ROM to RAM
Initialize itself based on the HV MD (mapping of physical resources to logical domains)
Start the guest (OpenBoot is the first guest)
OpenBoot probes I/O devices based on the Guest MD and sets up the device tree for Solaris. Starts Solaris boot.


Boot block (bootblk) is loaded from disk
bootblk reads UFS (or ZFS) file system to find ufsboot
Loads ufsboot (or zfsboot) into low memory and jumps to it
ufsboot locates a kernel (Solaris), loads and jumps to it


Boot Sequence (Continued)


Host Boot (continued)
Solaris kernel runs and does the following to boot CPU
Loads required drivers
Sets up the VM system
Takes over trap table (ufsboot disappears)
Starts other CPUs
Hand crafts first process init and lets it run on all other CPUs


SP Highlights
ILOM looks/behaves just like ILOM on other platforms
Simple (user-visible) set of extensions to support Physical Domains
Extensions to support Service Processor Proxies and redundant Service Processors on M5-32
Minimal impact on user experience


ILOM on T5/M5 Enterprise Features


ILOM extensions to support Enterprise Features
Some of these are conceptually leveraged from XSCF SW
SP Tracing Facility
Very useful for tracing inter-process activity
Reliable performance (elapsed time) measurements
Allows for tracing interactions between SPs and SPPs etc
Enterprise systems have lower volume, more complex configurations and high RAS expectations
We cannot expect customers to reproduce bugs
Need to collect as much debug info as possible on live system as it occurs on customer site
Coredump compression and snapshot collection
Unified snapshot from all SPs and SPPs


ILOM on T5/M5 Enterprise Features


Confstore distributed config database
Exploitable by CMM/Blades
System Identity is maintained across FRU / SP replacement
aka TLI (Top-Level-Identifier)
Flash Images are signed to avoid compromised images
hardened edits of config files
Transaction oriented: before or after, no intermediate results
Tunables framework for MAX_USERS etc
Sensor Broadcast (SSBCAST) enhanced to support


T5-4 / T5-8 Processor Modules

ILOM enforces valid PM / PFM configurations

T5-8 can be reduced to 6 or 4 processor configurations by replacing certain PMs with Processor Filler Modules (PFMs).
Similarly, T5-4 can be reduced to a 2 processor config using a PFM.
If the configuration is invalid, ILOM will create a configuration fault and refuse power-on. Must correct the configuration, repair the fault, and try again.

Processor Modules can be dynamically added to a running system Post-RR, that is...

Processor Module DR will be accomplished through ILOM start / stop commands.
Example: stop /SYS/PM1

New form of POST: iPost

Will test a PM that is powered-on with DR.
ILOM part of iPost tests all i2c devices on the PM.
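
The PM / PFM rule above can be sketched as a simple count check. This is an illustration only: it assumes two processors per PM and four (T5-8) or two (T5-4) module slots, which is consistent with the counts on the slide but not stated there, and it ignores which specific slots may hold a PFM. The function names are hypothetical.

```python
# Processor counts named on the slide: T5-8 full (8) or reduced (6, 4);
# T5-4 full (4) or reduced (2).
VALID_PROCESSOR_COUNTS = {"T5-8": {8, 6, 4}, "T5-4": {4, 2}}
PROCESSORS_PER_PM = 2            # assumption, not stated in the source
SLOTS = {"T5-8": 4, "T5-4": 2}   # assumption derived from the counts above


def check_config(platform, slots):
    """Check a list of "PM" / "PFM" entries, one per module slot.

    Returns None when the processor count is allowed, otherwise a
    description of the configuration fault.
    """
    if len(slots) != SLOTS[platform]:
        return f"expected {SLOTS[platform]} module slots, got {len(slots)}"
    processors = PROCESSORS_PER_PM * slots.count("PM")
    if processors not in VALID_PROCESSOR_COUNTS[platform]:
        return f"{processors} processors is not a valid {platform} configuration"
    return None


def power_on(platform, slots):
    """Refuse power-on while a configuration fault is present."""
    fault = check_config(platform, slots)
    if fault is not None:
        raise RuntimeError(f"configuration fault: {fault}")
    return "powered on"


six_way = check_config("T5-8", ["PM", "PM", "PM", "PFM"])
two_way_t58 = check_config("T5-8", ["PM", "PFM", "PFM", "PFM"])
two_way_t54 = check_config("T5-4", ["PM", "PFM"])
```

A real check would also have to know which PM positions may be replaced, since the slide says only "certain PMs" qualify.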


New T5 System Firmware features

System Firmware redundancy

Both firmware images are stored in pairs of flash banks.
Loading firmware is always done into the unused bank(s). When the load is complete, the system reboots and swaps banks.
Support for a firmware load with the host system powered on; BUT, the banks cannot be swapped until the host is powered off.
To load into the unused banks:
-> cd /SP/firmware/backupimage/
-> load <URI>
To swap the banks:
-> set /SP/firmware sysfw_bank_switch=true

MUST be followed by an SP reboot to take effect:


-> reset /SP
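
The live-load path above (load into the unused bank, request the switch only with the host off, then reset the SP) behaves like a small state machine. Below is a minimal sketch of that sequencing; the class, the bank names "A" and "B", and the version strings are hypothetical, and the empty-bank check is an added safeguard rather than documented ILOM behavior.

```python
class FirmwareBanks:
    """Toy model of the two-bank firmware scheme described on the slide."""

    def __init__(self, active_version):
        self.banks = {"A": active_version, "B": None}
        self.active = "A"
        self.switch_pending = False

    @property
    def backup(self):
        return "B" if self.active == "A" else "A"

    def load(self, version):
        # Loading always targets the unused bank; allowed with the host on.
        self.banks[self.backup] = version

    def request_switch(self, host_powered_on):
        # Banks cannot be swapped while the host is powered on.
        if host_powered_on:
            raise RuntimeError("power off the host before switching banks")
        if self.banks[self.backup] is None:
            raise RuntimeError("backup bank is empty")  # added safeguard
        self.switch_pending = True

    def reset_sp(self):
        # The switch takes effect only after an SP reset.
        if self.switch_pending:
            self.active = self.backup
            self.switch_pending = False
        return self.banks[self.active]


fw = FirmwareBanks("old-image")
fw.load("new-image")
version_without_switch = fw.reset_sp()
fw.request_switch(host_powered_on=False)
version_before_reset = fw.banks[fw.active]
version_after_reset = fw.reset_sp()
```

Keeping the request and the activation as separate steps mirrors the slide: setting the switch property alone changes nothing until the SP is reset.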


T5 Platform hardware management

Ability to update Power Supply Firmware (from service)

DVFS Power Management

Power throttling done in real-time by the FPGA based on power consumption, current draw (IWARN), and temperature readings.
ILOM is out of the loop, other than programming the power thresholds.
For PM power load balancing, ILOM sets thresholds each second.

IFC Fan Control

ILOM controls fan speeds using the IFC algorithm.
Based on temperature readings across the system (DIMMs, CPUs, etc).

Power-on failure fault diagnosis

POK signals are expected to assert when power-on is requested.
ILOM can now better diagnose POK assert failure to suspected FRU.

Power Glitch fault diagnosis

Power monitored for glitches in nanosecond resolution by the FPGA.
ILOM creates faults in response to FPGA notification of glitch events.


New to ILOM 3.1


Simplified Data Model (SDM)
Three-Level Model
Level One: System Summary Info
Level Two: Subsystem Summary
Level Three: Logical Topology

Subsystems: Cooling, Power, CPUs, Memory, Storage, and Networking
Also Blades, DCUs, CMUs, CPU Modules, I/O Modules on some platforms
Open Problems unifies fault management with SDM

ILOM 3.2: New Linux distro and compilers


M5 and T5 will release with ILOM 3.2
Linux version 2.6.27.43, SQUEEZE (was 2.6.16.4, SARGE)
gcc version 4.4.5 (was 3.3.6)

Why:
Old distro no longer supported
Security and bug fixes
Posix threads instead of Linux threads

CLI Changes from T3/T4 ILOM

Different method for disabling components

Before: component_state property represented both the current state (disabled by POST or hostconfig) and user-requested state.
After: split states to reduce confusion
current_config_state = actual state of the resource in the system
disable_reason = human readable reason of why it's disabled
requested_config_state = user requested state
As before, must start or reset the host for requested_config_state changes to take effect.
requested_config_state can only enable components disabled by requested_config_state.
All other disabled reasons are faults, which must be addressed (via fmadm acquit) or the FRU replaced.

A fault in one component may cause other components to be disabled. These will be noted with a disable_reason of "Configuration rules".
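
The split-state rules above can be sketched as a small model. The three property names come from the slide; the value strings "Enabled", "Disabled", "By user", and "Fault" are assumptions chosen for illustration, and the class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Component:
    current_config_state: str = "Enabled"
    requested_config_state: str = "Enabled"
    disable_reason: Optional[str] = None  # e.g. a fault, or "Configuration rules"

    def set_requested(self, state):
        # Recorded immediately, but has no effect until host start/reset.
        self.requested_config_state = state

    def host_reset(self):
        # A disable caused by a fault or by configuration rules cannot be
        # cleared through requested_config_state.
        if self.disable_reason not in (None, "By user"):
            return
        if self.requested_config_state == "Disabled":
            self.current_config_state = "Disabled"
            self.disable_reason = "By user"
        else:
            self.current_config_state = "Enabled"
            self.disable_reason = None


core = Component()
core.set_requested("Disabled")
state_before_reset = core.current_config_state
core.host_reset()
state_after_reset = core.current_config_state

faulted = Component(current_config_state="Disabled", disable_reason="Fault")
faulted.set_requested("Enabled")
faulted.host_reset()
```

In this model the faulted component stays disabled no matter what is requested, which matches the slide's point that fault-related disables must be handled through fault repair or FRU replacement.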


Other Changes from T3/T4 ILOM

BBR (Black Box Recorder) data moved

BBR now records ILOM data

ILOM RAM and filesystems available space


For all critical processes: memory usage, #threads, #files open, cpu usage

EP (Electronic Prognostics)

now in /large, a 64MB filesystem

Solaris fetches EP configuration files from ILOM via SNMP

Ability to disable Host console logging

-> set /HOST/console logging=disabled


Disabling the logging also deletes all stored logs.


SDM Architecture

[Diagram. Components: WEB, CLI, LUMAIN, SDM BACKEND, Platform xml, LIBHDL, Hw service, CAPI, SSM API.]

SDM CLI
CLI is reorganized.
/System target (tree) is introduced
Different components of the system are grouped and organized into sub targets of /System
At every level of the tree, the critical properties are shown along with any sub targets
The applicable CLI commands are supported at different levels of the /System tree.
All targets and properties under /System are case insensitive

SDM CLI (cont.)


Health and health details are two of the common properties shown at every level to indicate the overall health of that sub tree.
Open_Problems target shows the detailed descriptions of the faults in the system
/SYS and /Storage targets are made legacy
Continue to exist but hidden by default
The legacy targets can be made visible by enabling the /SP/cli/legacy_targets property.


SDM CLI (Summary level targets)


-> show /System -d targets
/System
Targets:
Open_Problems (0)
Processors
Memory
Power
Cooling
Storage
Networking
PCI_Devices
Firmware
BIOS

SDM CLI (Summary level properties)


-> show /System -d properties
health = OK
health_details =
open_problems_count = 0
type = Rack Mount
model = Exadata X2-3
part_number = 8124854
serial_number = 2229CNL124
component_model = SUN FIRE X4170 M3
component_part_number = 7013743
component_serial_number = 1118CNL013
system_identifier = sysidentifier
system_fw_version = 3.1.0.10
primary_operating_system = Not Available
host_primary_mac_address = 00:21:28:d5:c0:b2
ilom_address = 10.153.55.201
ilom_mac_address = 00:21:28:D5:C0:B6
locator_indicator = Off
power_state = Off
actual_power_consumption = 5 watts
action = (none)

SDM CLI (Processors Subsystem)


-> show /System/Processors/
Targets:
CPUs
Properties:
health = OK
health_details =
architecture = x86 64-bit
summary_description = Two Intel Xeon Processor E5 Series
installed_cpus = 2
max_cpus = 2

-> show /System/Processors/CPUs/
Targets:
CPU_0
CPU_1


SDM CLI (Processors Subsystem)


-> show /System/Processors/CPUs/CPU_1
Properties:
health = OK
health_details =
part_number = 060D
serial_number = Not Available
location = P1 (CPU 1)
model = Genuine Intel(R) CPU @ 1.60GHz
max_clock_speed = 1.600 GHZ
total_cores = 8
enabled_cores = 8


SDM Web
Difference from ILOM 3.0:
Old components, sensors and indicators pages have been removed and replaced with new subsystem pages.
New subsystem pages focus less on raw sensor data, more on status of the components.
Old RAID pages are now replaced by the storage subsystem page.
Session timeout page has been merged into the web server configuration page.
Open problems page brings together all system problems in a single spot, replaces old fault management page.
New comprehensive summary page.


ILOM 3.1 Web redesign


Navigation tree on left replaces old tabs on top.
Top levels of tree include new summary and subsystem pages.
Pages organized by purpose:
System Information
Remote Control
Power, Host, System and ILOM Administration


New Embedded BUI Mini-Help As Of ILOM 3.2.1


Web Summary Page


General Information Table: Basic info about Server and SP
Action Table: Quick access to common ILOM actions (Power on/off, OSA, JRC, Firmware update)


Subsystem status table: Quick summary of the various subsystems.


SDM Storage
New location for storage information as the previous Storage Viewer UI (/STORAGE/raid in CLI, Storage RAID tab in BUI) is now legacy
Differences from legacy Storage Viewer
Contains similar information, but simplified and now reflects health of all components
Contains non-RAID Controller and Expander information


SDM Storage Example


-> show /System/Storage/
/System/Storage
Targets:
Disks
Controllers
Volumes
Expanders
Properties:
health = Not Available
health_details = Comprehensive Storage monitoring is not available.
Ensure the host is running with the Hardware
Management Pack. For download details go to
http://www.oracle.com/technetwork/server-storage/servermgmt/downloads/index.html
installed_disks = 1
max_disks = 8
installed_disk_size = Not Available
logical_volumes = Not Available
disk_controllers = Not Available
Commands:
cd
show


SDM Storage Example


-> show /System/Storage/Disks/Disk_2
/System/Storage/Disks/Disk_2
Targets:
Properties:
health = Warning
health_details = The disk is offline per host request or other reason
(disk is not compatible for use in volume). Type 'show
/System/Open_Problems' for details.
part_number = ST914602SSUN146G
serial_number = 0998SX3L 3NM8SX3L
location = HDD2 (Disk 2)
type = HDD
manufacturer = SEAGATE
capacity = 136 GB
device_name = /dev/sdc
raid_disk = false
wwn = 0x5000c500130310a3
Commands:
cd
show


SDM Storage Example


-> show /System/Storage/Controllers/Controller_0/
/System/Storage/Controllers/Controller_0
Targets:
Properties:
health = OK
health_details =
serial_number = 500605b001090990
type = SAS
manufacturer = LSI Logic
model = SG-XPCIE8SAS-E-Z
Commands:
cd
show


SDM PCI Devices


Provides on-board and add-on PCI Device information
On-board device description is provided
Add-on information is new to ILOM UIs and includes device description, part number, and PCI IDs
This information is provided to ILOM per slot by BIOS, originally for Fan Control. The PCI IDs provided allow for the part number and device description to be known
Add-on components are based on PCIE slots on rackmounts and PEM, REM, and FEM slots on blades
Add-ons not supported by a platform are present but the description/PN are Not Recognized

SDM PCI Devices Example


-> show /System/PCI_Devices/Onboard/Device_0
/System/PCI_Devices/Onboard/Device_0
Targets:
Properties:
description = NET0 Intel X540 Gigabit Ethernet Controller
Commands:
cd
show


SDM PCI Devices Example


-> show /System/PCI_Devices/Add-on/Device_4/
/System/PCI_Devices/Add-on/Device_4
Targets:
Properties:
part_number = Not Recognized
description = Not Recognized
location = PCIE4 (PCIe Slot 4)
pci_vendor_id = 0x8186
pci_device_id = 0x105f
pci_subvendor_id = 0x108e
pci_subdevice_id = 0x115f
Commands:
cd
show

-> show /System/PCI_Devices/Add-on/Device_4/
/System/PCI_Devices/Add-on/Device_4
Targets:
Properties:
part_number = SGX-SAS6-R-INT-Z
description = Sun Storage 6 Gb SAS PCIe RAID HBA, Internal
location = PCIE4 (PCIe Slot 4)
pci_vendor_id = 0x1000
pci_device_id = 0x0079
pci_subvendor_id = 0x1000
pci_subdevice_id = 0x9263
Commands:
cd
show
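
The way PCI IDs resolve to a part number and description, with "Not Recognized" as the fallback, can be sketched as a table lookup. This is an illustration, not ILOM's actual mechanism: the table and function names are hypothetical, and the single entry uses the IDs of the recognized card in the example output.

```python
# Keyed by (vendor, device, subvendor, subdevice). The single entry is the
# recognized card from the example output; a real table would be larger.
KNOWN_ADDONS = {
    (0x1000, 0x0079, 0x1000, 0x9263): (
        "SGX-SAS6-R-INT-Z",
        "Sun Storage 6 Gb SAS PCIe RAID HBA, Internal",
    ),
}

NOT_RECOGNIZED = "Not Recognized"


def describe_addon(vendor, device, subvendor, subdevice):
    """Resolve PCI IDs to a part number and description, if known."""
    part_number, description = KNOWN_ADDONS.get(
        (vendor, device, subvendor, subdevice),
        (NOT_RECOGNIZED, NOT_RECOGNIZED),
    )
    return {"part_number": part_number, "description": description}


recognized = describe_addon(0x1000, 0x0079, 0x1000, 0x9263)
unrecognized = describe_addon(0x8186, 0x105F, 0x108E, 0x115F)
```

The card is still listed in both cases; only the two derived fields differ, which is the behavior the examples show for Device_4.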


SDM Open Problems


Provides access to fault information in all levels of SDM UI: system, subsystem, component
Open Problems UI contains the most in-depth information including time, subsystem, location, UUID, description, P/N, S/N, and Knowledge Article URL
Data retrieved from fault management (similar data to the /SP/faultmgmt shell fmadm faulty command) and, for storage specific faults, from HMP


SDM Open Problem Example


-> show /System/Open_Problems
Open Problems (1)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
Fri Jan 20 16:16:48 2012  Memory              P0/D8 (CPU 0 DIMM 8)
A memory uncorrectable ECC fault on a DIMM has occurred. (Probability: 100,
UUID: 7df1056b-e208-c9fd-c1bd-e9127d6c05d2, Part Number: 001-0003,
Serial Number: 00CE021038834C9C59, Reference Document: http://www.sun.com/msg/SPX86-8001-U5)
-> show /System/ health health_details
/System
Properties:
health = Service Required
health_details = P0/D8 (CPU 0 DIMM 8) is faulty.
Type 'show /System/Open_Problems' for details.


SDM Open Problem Example


-> show /System/Memory/ health health_details
/System/Memory
Properties:
health = Service Required
health_details = P0/D8 (CPU 0 DIMM 8) is faulty. Type 'show
/System/Open_Problems' for details.
-> show /System/Memory/DIMMs/DIMM_8/ health health_details
/System/Memory/DIMMs/DIMM_8
Properties:
health = Service Required
health_details = A memory uncorrectable ECC fault on a DIMM has
occurred. Type 'show /System/Open_Problems' for
details.
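
The examples above show one DIMM fault surfacing as "Service Required" at the DIMM, the Memory subsystem, and /System. One way to picture this is a worst-of roll-up over child health values. The sketch below assumes that rule and assumes a severity order for the health strings; neither is confirmed by the source, and "Not Available" is left out because its ranking is unclear.

```python
# Health strings that appear in the examples; the severity order is assumed.
SEVERITY = {"OK": 0, "Warning": 1, "Service Required": 2}


def rollup(healths):
    """Return the worst health among the children ("OK" if there are none)."""
    worst = "OK"
    for health in healths:
        if SEVERITY[health] > SEVERITY[worst]:
            worst = health
    return worst


# Hypothetical component states matching the DIMM_8 fault in the example.
dimms = {"DIMM_0": "OK", "DIMM_8": "Service Required"}
subsystems = {
    "Memory": rollup(dimms.values()),
    "Processors": "OK",
    "Cooling": "OK",
}
system_health = rollup(subsystems.values())
```

Under this assumption, clearing the DIMM fault is enough to return both the subsystem and the system summary to OK.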


SPARC Virtualization
Technologies for The T5


Oracle Solaris and SPARC Virtualization

Better Resource Utilization for a More Efficient Datacenter

[Diagram. Three virtualization technologies side by side: Dynamic Domains (M-Series), Oracle VM Server for SPARC (T-Series, M5), and Oracle Solaris Zones (including Oracle Solaris 8 and 9 Zones). Example workloads shown in Domains A, B, and C: App, Web, DW DB, OLTP DB.]

Virtualization on T5 Systems
High degree of virtualization

OVM for SPARC (i.e. Logical Domains) for hypervisor-based virtualization

Oracle Solaris Zones for OS virtualization

Oracle Enterprise Manager Ops Center provides an administrator-friendly integration of these different virtualization levels


21st Century Cloud Infrastructure

[Diagram. Layered stack, top to bottom: zones (Solaris 11 Zones, Solaris 10 Zones, Solaris Legacy Zones); Oracle Solaris 11 and Oracle Solaris 10; Oracle VM Server for SPARC; SPARC.]

Oracle Solaris Zones

Built-in Virtualization on Any Oracle Solaris System


Same virtualization technology for all SPARC, x86 systems
Simple; lowest overhead; highest performance
- Ideally suited to leverage multithreading hardware
Mission-critical deployments
- Largest Sun financial and Telco customers all run Oracle Solaris Zones
- In production on 25+% of installed Oracle Solaris systems
Ideal for a variety of scenarios
- Lightweight test environments
- Dynamic environments with resource sharing
- Rapid prototyping test beds on same hardware and OS
- Zones cloning/migration/instant restart

Built-in Virtualization
Oracle Solaris 11 Zones
Secure, light-weight virtualization
Scales to 100s of zones/node
Delegated administration
ZFS datasets, boot environments
Observability via zonestat
Solaris 10 Zones
NFS Server
Network stack isolation and resource management
Co-engineered with installation, security, ZFS, networking, IPS, SPARC and x86 hypervisors

Cloud-Scale Networking

Parallel networking stack. Built to scale.

Virtualize, consolidate network infrastructure

Hardware assisted Network Resource Management

Increase performance and reduce costs

Optimized for performance at every level

Secure Isolation

Ease of Use

Routing, Firewalling, Load Balancing, Bridging, High Availability

Automatic Networking mode

Fine grained observability

VLAN isolation, dynamic VLAN provisioning

Integrated functionality

Oracle VM Server for SPARC

The Virtualization Platform combining the best of Oracle Solaris and SPARC for Your Enterprise Server Workloads

Isolated OS and applications in each logical (or virtual) domain
Firmware-based hypervisor
Each logical domain runs in dedicated CPU thread(s)

[Figure: a T5 Server running the SPARC Hypervisor beneath eight logical domains. Labels: Oracle Solaris 10 (x4), GP Domain (x4), Oracle Solaris 11 (x4), Database Domain (x4). A vertical side label reads approximately "Oracle Solaris & SPARC Optimized".]

Alignment with SPARC designed for Threads

Traditional VM based on assumption CPUs are scarce, so we must over-commit and time-slice them
- Overhead for time-slicing different contexts
- Intercept for privileged operations
- Latency servicing every interrupt
T5/M5 systems are thread-rich, so we can dedicate CPU threads to each domain for native CPU performance
- Eliminates CPU latency and overhead
- Context switches in a single clock on cache miss or interval
Some VM systems also over-commit RAM
- Causes overhead and requires complex memory management
- Ok for lightweight, occasional workloads, very bad for enterprise apps

Hypervisor Support
Hypervisor software/firmware responsible for maintaining separation (e.g. visible hardware parts) between domains
- Using extensions built into sun4v CPU
- Resides in the firmware, not the ILOM
Provides Logical Domain Channels (LDCs) so domains can communicate with each other
- Mechanism by which domains can provide services to each other
- A protocol lets hypervisor and domains queue and dequeue service request messages
Service domains use these channels and own I/O resources for bridged access
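
The queue and dequeue behaviour of a channel can be pictured with a small model. This is an illustrative sketch only: the class, its method names and the fixed queue depth are assumptions made for the example and do not reflect the hypervisor's actual LDC interface.

```python
from collections import deque


class Channel:
    """Hypothetical point-to-point channel with one bounded queue per receiver."""

    def __init__(self, end_a, end_b, depth=4):
        self.depth = depth
        self.peer = {end_a: end_b, end_b: end_a}
        self.inbox = {end_a: deque(), end_b: deque()}

    def send(self, sender, message):
        """Queue a message for the other endpoint; False means the queue is full."""
        queue = self.inbox[self.peer[sender]]
        if len(queue) >= self.depth:
            return False
        queue.append(message)
        return True

    def receive(self, receiver):
        """Dequeue the oldest pending message, or None when nothing is waiting."""
        queue = self.inbox[receiver]
        return queue.popleft() if queue else None


channel = Channel("guest", "service", depth=2)
channel.send("guest", {"op": "read", "block": 7})
print(channel.receive("service"))
```

A guest's virtual disk request would travel in one direction and the service domain's reply in the other, each through its own queue.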

Roles of Domains
Control domain
- Creates and manages other logical domains and services
- Control domain usually also a service and I/O domain
I/O domains
- Own physical I/O bus or devices. May run apps using physical I/O for native performance
Service domains
- Provide virtual network and disk devices. Typically an I/O domain
Guest domains
- Run applications on virtual I/O devices provided by service domain

Domain Components
[Figure: a Control & Service Domain (vds, vswitch, PCIe) connected over LDC through the Hypervisor to a Guest (vdisk, vnet).]

Hypervisor Basics
What is hypervisor?
Primary roles:
- Implements software component of sun4v virtual machine, providing low overhead hardware abstraction
- Enforces hardware and software resource access restrictions for guest, including inter-LDom communication, to provide isolation and security
- Performs initial triage and correction of hardware errors
Secondary roles:
- Implements dynamic LDom reconfiguration
- Provides data for performance statistics
- Manages hardware elements of some power management features

Hypervisor Basics
What hypervisor isn't:
An Operating System
- HV does not time slice between guests running on strands and has a fixed memory footprint (no malloc)
- HV only executes in response to a specific subset of traps. Except where hardware access is involved, traps go directly to the guest for maximum performance. There are separate HV and guest trap tables for this reason.
A policy maker
- HV enforces boundaries, but does not define them
- HV will do as requested, even if it may harm the guest, as long as it does not violate resource access restrictions
The IO manager
- Drivers in the guest manage the PCIe fabric and devices
- HV enforces access restrictions to IO resources

Hypervisor and Logical Domains


Oracle VM for SPARC

A logical domain is a virtual machine comprised of a discrete logical grouping of resources
- Each Guest runs its own instance of Solaris
- Each Guest can be created, destroyed, reconfigured, and rebooted independently
The hypervisor enforces the partitioning of the server's resources, and the OS and applications running in those partitions (i.e. Guests)
The hypervisor allocates a subset of the overall CPU, memory, and I/O resources of a server to a given logical domain
Up to 128 guests per hypervisor

Typical Simple Configuration


One control, service, I/O domain: the primary domain
Services in the primary domain:
- A virtual switch (vsw) associated with the primary NIC
- A virtual disk service (vds) exporting vdisk for all guests
- A virtual console concentrator (vcc)
Other domains are guests with vnets and vdisks serviced by the primary domain
Guest consoles are available through the primary domain
Primary console is available through the SP
Life cycle of a domain: define it, bind resources to it, start it
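
The define, bind, start life cycle can be sketched as a small state machine. The state names (inactive, bound, active) and the ldm subcommand names are the ones Logical Domains Manager normally uses, but this table is a simplified illustration and the "undefined" starting state is an assumption added for the example.

```python
# Transition table: (current state, ldm subcommand) -> next state.
TRANSITIONS = {
    ("undefined", "add-domain"): "inactive",   # define it
    ("inactive", "bind-domain"): "bound",      # bind resources to it
    ("bound", "start-domain"): "active",       # start it
    ("active", "stop-domain"): "bound",
    ("bound", "unbind-domain"): "inactive",
}


def step(state, command):
    """Apply one subcommand, rejecting transitions the table does not allow."""
    key = (state, command)
    if key not in TRANSITIONS:
        raise ValueError(f"{command!r} is not valid while the domain is {state}")
    return TRANSITIONS[key]


def run(commands, state="undefined"):
    """Apply a sequence of subcommands and return the final state."""
    for command in commands:
        state = step(state, command)
    return state


print(run(["add-domain", "bind-domain", "start-domain"]))
```

The table makes the ordering constraint explicit: a domain that has been defined but not bound cannot be started.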


What's new in 3.0 / 3.1


SR-IOV and DIO for non-primary Root Domains (3.1)
Dynamic SR-IOV, DIO and PCIe busses (3.1)
Support for Oracle VM Manager
Live Migration in Elastic Mode
DRM in Elastic Mode
Board DR for T5-4 & T5-8 (3.1)
Preserve Whole Core Constraint across Live Migration
vNICs on vNet (Zones in LDoms) (3.1)


T5 Hardware features relevant to LDoms


T5-2 will allow 2 independent IO-Domains
- disks on pci_1 and pci_2
- network on pci_0 and pci_3
T5-4 and T5-8 will allow CPU Board-DR
- unclear if at release or later
Two PCIe root complexes per socket
- more granular Root Domains
Better power management

Example: IO Granularity on T5-2


NAME                      TYPE   BUS     DOMAIN    STATUS
----                      ----   ---     ------    ------
pci_1                     BUS    pci_1   primary
pci_0                     BUS    pci_0   primary
pci_3                     BUS    pci_3   primary
pci_2                     BUS    pci_2   primary
/SYS/MB/PCIE5             PCIE   pci_1   primary   EMP
/SYS/MB/PCIE7             PCIE   pci_1   primary   EMP
/SYS/MB/SASHBA1           PCIE   pci_1   primary   OCC
/SYS/MB/PCIE1             PCIE   pci_0   primary   EMP
/SYS/MB/PCIE3             PCIE   pci_0   primary   EMP
/SYS/MB/NET0              PCIE   pci_0   primary   OCC
/SYS/MB/PCIE6             PCIE   pci_3   primary   EMP
/SYS/MB/PCIE8             PCIE   pci_3   primary   EMP
/SYS/MB/NET2              PCIE   pci_3   primary   OCC
/SYS/MB/PCIE2             PCIE   pci_2   primary   EMP
/SYS/MB/PCIE4             PCIE   pci_2   primary   EMP
/SYS/MB/SASHBA0           PCIE   pci_2   primary   OCC
/SYS/MB/NET0/IOVNET.PF0   PF     pci_0   primary
/SYS/MB/NET0/IOVNET.PF1   PF     pci_0   primary
/SYS/MB/NET2/IOVNET.PF0   PF     pci_3   primary
/SYS/MB/NET2/IOVNET.PF1   PF     pci_3   primary
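
Grouping the slots by root complex shows why two independent I/O domains are possible on the T5-2. The sketch below is illustrative: the data is transcribed from the PCIE rows of the listing on this slide, and the helper names are hypothetical.

```python
from collections import defaultdict

# (slot name, root complex, status) for the PCIE rows of the listing.
SLOTS = [
    ("/SYS/MB/PCIE5", "pci_1", "EMP"),
    ("/SYS/MB/PCIE7", "pci_1", "EMP"),
    ("/SYS/MB/SASHBA1", "pci_1", "OCC"),
    ("/SYS/MB/PCIE1", "pci_0", "EMP"),
    ("/SYS/MB/PCIE3", "pci_0", "EMP"),
    ("/SYS/MB/NET0", "pci_0", "OCC"),
    ("/SYS/MB/PCIE6", "pci_3", "EMP"),
    ("/SYS/MB/PCIE8", "pci_3", "EMP"),
    ("/SYS/MB/NET2", "pci_3", "OCC"),
    ("/SYS/MB/PCIE2", "pci_2", "EMP"),
    ("/SYS/MB/PCIE4", "pci_2", "EMP"),
    ("/SYS/MB/SASHBA0", "pci_2", "OCC"),
]


def slots_by_bus(slots):
    """Group (name, status) pairs under the root complex that owns them."""
    grouped = defaultdict(list)
    for name, bus, status in slots:
        grouped[bus].append((name, status))
    return dict(grouped)


def occupied(slots):
    """Return only the occupied (OCC) slot names per root complex."""
    return {
        bus: [name for name, status in entries if status == "OCC"]
        for bus, entries in slots_by_bus(slots).items()
    }


for bus, names in sorted(occupied(SLOTS).items()):
    print(bus, names)
```

Each SAS HBA and each on-board network device sits on its own root complex, so one disk bus plus one network bus can be handed to a second domain while the primary keeps the other pair.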

Non-Primary Root domain example config


[Figure: Root Domain1 and Root Domain2 each run the Solaris I/O stack with a PF device driver and own a PCIe root complex and PCIe switch (labels pci@400 and pci@500). Two Guest Domains run apps on a kernel with multipathing across VFs, which reach the PFs through virtual PCIe switches presented by the Hypervisor.]

Resource Management Improvements


Dynamic resource management (DRM) between domains
- Dynamic CPU movement is based on the priority property of each domain's DRM policy.
- Ensures that domains running the most important workloads get priority for CPU access over domains with less critical workloads
- Gives last remaining CPUs to the higher priority domain
- Removes CPUs from a lower priority domain
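
How priority decides who gets the last free CPUs can be shown with a toy allocator. This is a sketch under stated assumptions, not the LDoms Manager algorithm: the function is hypothetical, and it assumes that a numerically lower priority value means a more important domain.

```python
def distribute(free_cpus, requests):
    """Hand out free CPUs in priority order.

    requests is a list of (domain, priority, wanted) tuples. A lower
    priority number is treated as more important (assumption).
    """
    grants = {}
    for domain, _priority, wanted in sorted(requests, key=lambda r: r[1]):
        granted = min(wanted, free_cpus)
        grants[domain] = granted
        free_cpus -= granted
    return grants


# Four free CPUs, two domains asking for more than is available in total.
print(distribute(4, [("batch", 50, 4), ("oltp", 10, 3)]))
```

The more important domain is satisfied first and the less critical one receives whatever remains.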


Logical Domain Channels (LDC)


Logical Domain Channels
- provide low level data services between components
- implemented in SRAM
- point-to-point
  - LDM to HV
  - LDM to SP
  - SP to Solaris
- allow passage of large chunks of data more efficiently than mailboxes

PCI-SIG Single-Root IOV


Standardized way of bypassing the VMM's involvement in data movement by providing independent memory space, interrupts and DMA streams for each VM
Benefits
- Native IO Performance
- Provides scalability
Drawbacks
- Hard to live migrate VM, somewhat similar to Direct assignment

[Figure: Guest OS0, Guest OS1 and Guest OS2, each running APPs with a Virtual NIC, on top of a VMM (Intel VT-x, Intel VT-d) and a Physical NIC.]

SR-IOV
IOV for PCI Express (PCIe) HW
- An IOV solution that allows direct access to PCI Express devices at Virtual Function (VF) granularity from a Guest Domain
- Standard for PCIe fabric with a single Root-Complex (SR-IOV). Standard for PCIe fabric with multiple Root-Complexes (MR-IOV)
Features
- Direct access to VF registers, interrupt, DMA
Usage model
- Individual NIC ports belong to different OSes
- Multiple Guests share SR-IOV devices

[Figure: Guest OS0 running APPs with a VM Device Config Space (Fn0), above a VMM (Intel VT-x, Intel VT-d) holding a Virtual NIC and the System Device Config Space (PFn0). The Physical NIC exposes VFn0, VFn1 and VFn2.]

IOV Benefits

Performance
- Fully utilize IO device resources such as 10G NIC bandwidth
- Low latency
Cost reduction
- Capital and Operational Expenditure savings from power savings, reduced adapter count, less cabling and fewer switch ports
But Migration is disabled once VFs are assigned to a domain.
- This may change in the future.

A High-Level View of VFs


[Figure: the Primary domain and I/O Domains 0 to 3, each running an operating system above the Hypervisor. Every domain sees pci_0; the I/O domains see it through a virtualized PCIe switch and the Primary through the physical PCIe switch. VF0, VF1, VF2 and VF3 of a single SR-IOV card are spread across the I/O domains.]

Secure Live Migration

Eliminates Application Downtime

Live migration available on SPARC systems
- SPARC M5
- SPARC T5
- SPARC T4
- SPARC T3
- UltraSPARC T2 Plus
- UltraSPARC T2
On-chip crypto accelerators deliver secure, wire speed encryption for live migration
- No additional hardware required
- Eliminates requirement for dedicated network
More secure, more flexible

[Figure: VMs moving between SPARC T-Series servers in an Oracle VM Server Pool over Secure Live Migration (SSL), backed by External Shared Storage.]

Cross CPU Migration - Architecture


Allows migration of domains across sun4v architecture platforms
- Supports migration among platforms
- Will be extended to support new platforms as they are introduced
- New platforms might not be migration-compatible with all previous platforms
Allows migration among same CPU architecture with different system clock frequencies
Dependent on guest domain having Solaris 11
- Solaris introduces a generic sun4v CPU module, simulated 1GHz system clock if HW not available, other changes.
- LDoms Manager introduces domain cpu-arch property

Cross CPU Migration - Solaris


Supported in guests running
- Solaris 11 U1
- Solaris 10 U11
Introduces new generic CPU module: sun4v-cpu
- Domain service extension to identify CPU module capabilities
- CPU Module has a major/minor version number, used by domain manager to determine capabilities of the guest
Simulates 1GHz system clock if needed
- Kernel routines for read tick/stick modified to emulate clock rate: emulate 1GHz in generic mode; emulate boot frequency after migration in native mode to system with different clock frequency
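
The clock emulation amounts to rescaling a raw counter value to the rate the guest expects. A minimal arithmetic sketch follows; the function name is hypothetical, the frequencies are example values rather than measured ones, and the real kernel routines are not shown in this deck.

```python
GENERIC_HZ = 1_000_000_000  # the simulated 1GHz clock used in generic mode


def emulated_ticks(raw_ticks, native_hz, target_hz=GENERIC_HZ):
    """Rescale a counter read at native_hz to the rate the guest expects.

    Integer arithmetic keeps the result exact for whole-number ratios.
    """
    return raw_ticks * target_hz // native_hz


# Generic mode: one second of a hypothetical 3.6 GHz counter looks like 1 GHz.
print(emulated_ticks(3_600_000_000, 3_600_000_000))

# Native mode after migration: a guest booted at 2.85 GHz now runs on a
# 3.6 GHz counter and keeps seeing its boot frequency.
print(emulated_ticks(3_600_000_000, 3_600_000_000, target_hz=2_850_000_000))
```

In both modes the guest's notion of elapsed time stays consistent even though the underlying hardware counter rate changed.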

Cross CPU Migration Generic Domains


LDom Manager/Firmware/Solaris must be of sufficient revision to support Cross CPU Migration
- Firmware must support LDom Live Migration on both source and target domains
- Guest domain must be Solaris 11 FCS or newer
Migration is for the most part unchanged
- At the start of the migration, domain capabilities and generic CPU module version are retrieved and sent to the target
- Check on target ensures that the target processor supports the generic CPU module version
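
The check performed on the target can be sketched as a version comparison. The compatibility rule used here (same major number, target minor at least as new as the guest's) is an assumption chosen for illustration; the deck does not state the actual rule, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class ModuleVersion(NamedTuple):
    major: int
    minor: int


def target_supports(guest, target_versions):
    """True if any version offered by the target can host the guest's module.

    Assumed rule: majors must match and the target minor must not be older.
    """
    return any(
        offered.major == guest.major and offered.minor >= guest.minor
        for offered in target_versions
    )


guest = ModuleVersion(1, 1)
print(target_supports(guest, [ModuleVersion(1, 2)]))
print(target_supports(guest, [ModuleVersion(2, 0)]))
```

If the check fails, the migration would be refused before any guest state is transferred.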



Live Migration Improvements


Previously, domains lost the whole-core constraint when migrated
- No more hard partitioning => violation of license capping rules
With 3.0, whole-core constraint is preserved
- Allows migration of hard partitioning domains
Memory-DR after migration is now enabled
- requires Solaris 11.1 or Solaris 10 update 11
Live Migration in Elastic Mode
Live Migration without password
- required for OVM Manager
- Needs to be enabled per system

Solaris Support


Solaris Support for T5 Platform


Solaris Plan of Record for T5 RR:
- S11.1 + SRU3 (pre-installed)
- S10U11
- LDoms 3.0
Getting Solaris release information (what version is installed):
- Use pkg list kernel & pkg list entire
For RR, will qualify and support S10U9 and S10U10 + patch bundle in a guest domain

Platform Solaris Overview of changes


T5 performance counter additions/modifications
- "busstat -l" displays the number of DRAMs on the system. "busstat -e dram" displays the list of events for DRAM counters
- There are 4 CPU performance counters to count the events listed in "cpustat -h" output.
3072 NCPU sun4v support
Enhanced sun4v kernel to support MPO for large config

Platform Solaris Overview of changes


Enhanced sun4v kernel to support suspending & resuming entire running OS instance (including I/O) to support DR
Made sun4v kernel resilient to allow/continue booting if it couldn't start a CPU and use remaining CPUs

Power Management


Power Management Features


Feature: Dynamic Voltage & Frequency Scaling (DVFS)
- T4: No
- Comments: New to SPARC, already exists on x86
Feature: Cycle Skipping
- Comments: T4 whole socket granularity; T5/M5 sub socket granularity
Feature: Coherency Link Scaling
- T4: No
- T5: T5-2 only
- M5: No
Feature: Power Supplies
- T4: Gold+
- T5: A261A
- M5: A254
- Comments: T5 PS (A261): Goal Platinum; M5/M6 (A254): 3 Phase goal similar to platinum
Feature: IFS (Intelligent Fan Control)
- Comments: (Technically not CPU feature)

Existing M Series (M3000-M9000) have no active power management features

Coherency Link Scaling


Exclusive to T5-2 ONLY
T5-2 has 4 Coherency Links connecting the sockets
- In Elastic Mode only, up to 2 of these can be turned off
- Savings up to 16W
Notes:
- Other T5 servers do not have sufficient links to turn off
- Cannot be used in Performance Mode

Power Management Interfaces

Rich Choice of Management Options

Solaris
- 11.1: poweradm
- 10 & 11: pmconfig & /etc/power.conf
ILOM
- ssh CLI
- HTTP

Software Component Interaction


[Figure: power management component interaction. Labels by layer: Guest (Solaris): Power Aware Dispatcher, PM policy, mempm, cstates, pstates, DVFS, cycle skip. Control Domain/LDoms Manager: Affinity Engine, PM policy, DVFS, cycle skip, CPU power cap. HV: PM with DVFS, cycle skip, coherency link scaling, coordinate PAD. SP: system domain policy, PM Policy, PM capper, Power Capper (system & HW domain cap), PRI, Platform MD, hostconfig. Hardware: CPU, MCU, BoB link, BoB channel, DIMM, PPFE, Coherency Link, FPGA, with Initialize arrows from hostconfig.]

ILOM 3.2 Power Policies


3 Policies:
- Disabled: all components run at full speed (old performance policy)
- Performance (default): unused components are power managed; power savings features with insignificant performance impact are enabled
- Elastic: unused or idle components are power managed: CPUs, cores, memory (coherency links on T5-2 only)
Prior versions of ILOM (< 3.2) had 2 Policies
- Performance (equivalent of new disabled policy), and Elastic
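
The policy definitions and the renaming from earlier ILOM versions can be captured in two small lookup tables. This is a sketch for illustration: the dictionary and function names are hypothetical, and the component states "unused" and "idle" are taken directly from the wording of this slide.

```python
# Old (pre-3.2) policy name -> equivalent ILOM 3.2 policy name.
LEGACY_TO_ILOM_3_2 = {"performance": "disabled", "elastic": "elastic"}

# Which component states each ILOM 3.2 policy power manages.
MANAGED_STATES = {
    "disabled": frozenset(),
    "performance": frozenset({"unused"}),
    "elastic": frozenset({"unused", "idle"}),
}


def translate_legacy(policy):
    """Map a pre-3.2 policy name to its ILOM 3.2 equivalent."""
    return LEGACY_TO_ILOM_3_2[policy]


def is_power_managed(policy, component_state):
    """True if a component in the given state is power managed under policy."""
    return component_state in MANAGED_STATES[policy]


print(translate_legacy("performance"))
print(is_power_managed("performance", "idle"), is_power_managed("elastic", "idle"))
```

The table highlights the one trap in the renaming: a system left on the old "performance" policy behaves like the new "disabled" policy, not the new "performance" default.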


System (PDom) Power Management Policies


The PM policy is managed in ILOM under the /SP/powermgmt target.
There are several ways to view or change the policy (browser, command line, Ops Center).
- Ops Center can only set to performance or elastic
ILOM Command Line (login as root)
-> show /SP/powermgmt policy
-> set /SP/powermgmt policy=elastic
-> set /SP/powermgmt policy=performance
-> set /SP/powermgmt policy=disabled

System (PDom) Power Capping


ILOM 3.2 CLI

Show the power cap settings:
-> show /SP/powermgmt/budget

Show the current power consumption:
-> show /SP/powermgmt actual_power

Configure pending power limit in watts (replacing 400 with a value that is appropriate for your environment):
-> set /SP/powermgmt/budget pendingpowerlimit=400

To apply the pending values, use:
-> set /SP/powermgmt/budget commitpending=true

To enable the configured power limit, use:
-> set /SP/powermgmt/budget activation_state=enabled
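
When the same cap has to be applied to several service processors, the three-step sequence (set pending limit, commit, enable) can be generated rather than typed. The helper below is a hypothetical convenience sketch; it only builds the command strings, written with the spacing the ILOM CLI expects, and leaves the transport (for example an ssh session to the SP) out of scope.

```python
def power_cap_commands(limit_watts):
    """Return the ILOM CLI commands that stage, commit and enable a power cap."""
    if not isinstance(limit_watts, int) or limit_watts <= 0:
        raise ValueError("limit_watts must be a positive integer")
    budget = "/SP/powermgmt/budget"
    return [
        f"set {budget} pendingpowerlimit={limit_watts}",
        f"set {budget} commitpending=true",
        f"set {budget} activation_state=enabled",
    ]


for command in power_cap_commands(400):
    print("->", command)
```

Keeping the order fixed in one place avoids enabling a budget before the pending limit has been committed.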

System (PDom) Power Capping in T5 and M5


Soft Cap:
- limits average power consumption
Hard Cap:
- T5: stays within blade system or PDU power constraints
- M5: physical domain boot and board add prevented if hard cap would be exceeded

Integrated System (PDom) & OS PM Policy


System policy applied to all guest OSs/LDoms (default)
Administrator can override on local S11u1 OS via Solaris poweradm command
- Impacts that LDom/Guest exclusively
- Impacts shared resources such as memory
Administrator can only set System Policy via ILOM

Solaris 11.1 Power Management


Power Aware Dispatcher (PAD) for SPARC
Already available in Solaris 10 for x86
More adaptive scheduling, better performance and efficiency
Applies DVFS to idle processors
Applies cycle skipping to idle cores
Enabled by default
Performance (default) or Elastic policies enable Power Aware Dispatcher.


Solaris 11.1 Power Management


poweradm command
- poweradm(1M) replaces pmconfig and /etc/power.conf
- Stores configuration data in SMF, instead of configuration files
- Security implemented using RBAC
New properties:
- time-to-full-capacity, time-to-minimum-responsiveness, suspend-enable
- administrative-authority
  - smf: Solaris instance has control
  - platform (default): the platform has control (e.g. LDoms)
  - none: power management is disabled

Power Management Observability


ILOM
Power consumption history and graphs
Breakdown by physical domain (M5), component type

LDoms
Per guest CPU power consumption based on CPU utilization
Per guest memory power consumption based on memory allocation

Solaris
Powertop pstate and cstate residency


Hypervisor Runtime Execution


Hypervisor arbitrates DVFS and cycle skip requests:
- Power Aware Dispatcher (PAD) guests may request different virtual pstates
- LDoms Manager monitors CPU utilization of non-PAD guests and requests DVFS and cycle skip adjustments
- HV resolves all requests to create the HW DVFS pstates and cycle skip ratios
Hypervisor monitors and adjusts per-resource power levels:
- Coherency link scaling based on coherency link traffic

Logical Domains Observability

Ops Center Power Management Features


Management
Set policy 1:1
Set policy on group of servers

Monitoring
Current consumption
History
Graphing
Average by group of servers


Ops Center Responsible Energy Policy


$KwH, WATT, CPU
In/Out Pull, Temperature
See Real Cost in Real Time
Relationship between utilization and energy consumption
Enforce Energy Policies Across Servers

Ops Center Rack Level Energy Reports


Reported from the PDU, custom grouping, and server levels
Top and Bottom Consumers
Turn off Groups of Servers

Power Management Advances


Hardware saves power below 100% utilization with:
- Chip wide DVFS
- Per core pair cycle skipping
- SerDes power scaling
- DIMM off-lining w/ Dynamic Reconfiguration
- DRAM PPSE and PPFE support
- PCI Express Power Management
- Clock Gating
When peak performance is demanded
- Power Management Controller achieves maximum frequency within customer imposed power and thermal limits

Power Management Controller: Elastic Savings


Power vs Frequency with DVFS
- f(x) = x^2.82
Hardware saves power below 100% utilization
- Chip wide DVFS
- Per core pair cycle skipping
Software monitors frequency needs of all cores
- Puts chip at DVFS point satisfying all cores' requirements
- Puts core pairs at lowest cycle skip ratio satisfying 2 cores in the pair
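
The two selection rules and the power curve can be tried out numerically. The sketch below makes several assumptions: the curve on the slide is read as relative power equal to relative frequency raised to the power 2.82, frequency needs are expressed as fractions of full speed, and the function names are hypothetical.

```python
def relative_power(freq_fraction, exponent=2.82):
    """Relative chip power at a fraction of full frequency (assumed curve)."""
    return freq_fraction ** exponent


def chip_frequency(core_needs):
    """Chip-wide DVFS point: must satisfy the most demanding core."""
    return max(core_needs)


def pair_frequencies(core_needs):
    """Per core pair setting: must satisfy the busier core of each pair."""
    return [max(core_needs[i:i + 2]) for i in range(0, len(core_needs), 2)]


needs = [0.5, 0.8, 0.3, 0.6]          # per-core needs as fractions of full speed
print(chip_frequency(needs))           # chip-wide DVFS point
print(pair_frequencies(needs))         # what cycle skipping can then trim per pair
print(round(relative_power(0.8), 3))   # relative power at that DVFS point
```

Because the exponent is well above 1, a modest drop in frequency yields a disproportionately large drop in power, which is the motivation for running at the lowest point that still satisfies every core.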


Coherency Link Power Savings


Link scaling (4,3,2,1 dynamically as needed)
Hardware monitors link utilization
Software sets entry exit policy (thresholds and dwell times)

[Figure: pairs of M5/T5 sockets connected by 4, 3, 2 or 1 coherency links as the link count scales with demand. Annotation: 25W savings.]

Peak Performance Thermal Management


4 thermal diodes per chip centered in core quads
If any T > high-water mark: drop Freq, V
If all T < low-water mark: raise Freq, V

[Figure: temperature sensor alarms reach the FPGA (labels T5M, IWARN), which signals throttle / resume to the PMC inside the T5 CPU. The PMC drives the PLL and passes VID to the VRM, which supplies VDD.]
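
The two water marks form a hysteresis band: one hot sensor is enough to back off, but every sensor must be cool before speeding up again. A minimal sketch of that decision, with a hypothetical function name and example temperatures that are not from the deck:

```python
def thermal_action(temps, low_water, high_water):
    """Decide what to do from the per-diode temperatures.

    'throttle' if any reading is above the high-water mark,
    'resume' if all readings are below the low-water mark,
    otherwise 'hold' (inside the hysteresis band).
    """
    if any(t > high_water for t in temps):
        return "throttle"
    if all(t < low_water for t in temps):
        return "resume"
    return "hold"


# Four diode readings, one per core quad (illustrative values in deg C).
print(thermal_action([71, 88, 69, 70], low_water=75, high_water=85))
print(thermal_action([71, 74, 69, 70], low_water=75, high_water=85))
print(thermal_action([71, 80, 69, 70], low_water=75, high_water=85))
```

The same any/all pattern applies to the current sensors, with currents in place of temperatures.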

Peak Performance Current Management


Drop F,V if any current > high-water mark
Raise F,V if any current < low-water mark
Controls currents for CPU VDD plus motherboard and DIMMs

[Figure: current alarms reach the FPGA (labels T5M, IWARN), which signals throttle / resume to the PMC inside the T5 CPU. The PMC drives the PLL and passes VID to the VRM.]

DVFS Functionality
M5/T5 CPU supports operation at multiple voltage/frequency pairs called pstates
High performance (frequency) states require high voltage => higher power states
DVFS allows seamless dynamic switching across pstates
DVFS engine responds to throttle/resume pin toggled by FPGA in response to over-temp/over-current scenarios

[Figure: 12V current sensors (high water and low water) and thermal sensors reading the chip's thermal diodes feed the Thermal/Power Control FPGA. "Current High Water Exceeded", "Current Low Water Exceeded" and temperature signals go in; Throttle and Resume pins go out to the M5/T5 chip.]

DVFS
Dynamic Voltage Frequency Scaling
Not all chips are running at the same voltage/frequency
Up to 32 P-states defined in efuse per chip

DVFS
T5/M5 supports HW enabled mode (vs disabled)
SW programs T5 P-state tables from Efuse and sets limits
FPGA pulses Throttle pin in response to thermal sensor alarm
FPGA pulses Resume pin in response to thermal sensor note

Throttle/Resume inform the PMC when to migrate P-states (max delta 200MHz, 6.25mV)
Throttle instructs Power Mgmt Controller (PMC) to increase P-state by 1 to reduce power use
Resume instructs PMC to decrease P-state to lowest allowed
T5 transitions up/down P-state table under FPGA control
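
The Throttle and Resume behaviour can be modelled as an index moving through the P-state table. This is a sketch of one interpretation: a higher index means a lower-power state, each Throttle pulse moves up one entry, and each Resume pulse moves back one entry until the lowest allowed index is reached. Whether Resume steps or jumps is not spelled out on the slide, so the single-step choice and all names here are assumptions.

```python
class PowerMgmtController:
    """Toy model of the PMC walking the P-state table under FPGA control."""

    def __init__(self, num_pstates=32, lowest_allowed=0):
        self.max_index = num_pstates - 1
        self.lowest_allowed = lowest_allowed
        self.pstate = lowest_allowed      # start at the fastest allowed state

    def throttle(self):
        """Throttle pulse: move one P-state up the table to reduce power."""
        self.pstate = min(self.pstate + 1, self.max_index)
        return self.pstate

    def resume(self):
        """Resume pulse: move one P-state back toward the lowest allowed."""
        self.pstate = max(self.pstate - 1, self.lowest_allowed)
        return self.pstate


pmc = PowerMgmtController(num_pstates=4)
print([pmc.throttle() for _ in range(5)])  # saturates at the last table entry
print([pmc.resume() for _ in range(5)])    # walks back to the lowest allowed
```

Stepping one entry at a time keeps each transition within the small maximum frequency and voltage delta quoted for a P-state change.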

Efuse
Efuse per chip data set in fab
Used to control what components may be enabled in a processor node (L3 Banks, CPU cores, Serial number)

Open Boot


OpenBoot - Introduction
OpenBoot(TM) is Oracle's trademark for Boot Firmware based on the open standard IEEE-1275 for Open Firmware
- Resident in System Flash (along with Host-Config, Hypervisor, and POST).
- System independent initialization and boot code
- Consumes Guest Machine descriptor which defines the HW configuration for the guest
- Initializes IO devices and option cards
- Builds HW configuration in a device tree format for OS clients
- Boots OS from disk or network
- Provides boot time services to OS

OpenBoot Introduction (continued)


OpenBoot is started by Hypervisor as guest, one instance per guest
OpenBoot services retired during OS boot, after control is transferred to the OS
OpenBoot binary name: openboot.bin
OpenBoot Component version 4.35.x
OpenBoot Binary is common across all M5/T5 platforms, released as part of SysFW packages (SysFW packages are platform specific)

SPARC T5 Software Stack


[Figure: SPARC T5 software stack. Host side: User Apps and SunVTS on Solaris 11 and Solaris 10 guest domains (Kernel/Drivers, FMA Agent), one OpenBoot per guest, the Sun4v API, Hypervisor, POST and HostConfig on the Host Hardware (SPARC T5 CPU, Memory, IO). Service processor side: Guest Manager and ILOM on Linux on the SP Hardware (SP CPU, FPGA). A flash block lists Host-Config, Hypervisor, OpenBoot and POST.]

Platform Management


Platform Management
SNMP
Oracle Enterprise Manager Ops Center
Sun Cluster 3.2 and Sun Cluster 4.0 are supported


SNMP Requirements
The SNMP Agent, based on open source Net-SNMP, will run on the SP and export all platform/chassis information relevant for monitoring at the system, component, and domain levels
The Agent will export read-only MIB information and SNMP traps/notifications from MIBs, including the Platform MIB and Sun Fault Management MIB, to all interested third party managers
The Agent will run SNMPv1, SNMPv2, and SNMPv3. SNMPv3 can be utilized to ensure secure SNMP communication through the authentication, privacy, and access control mechanisms USM and VACM

SNMP Requirements (cont.)


By default, the SNMP Agent will not be enabled
By default, the SNMP Agent will have no configuration. When enabled, it runs on port 161 with no version support until the configuration is updated
The Agent will only be enabled on the active SP in a dual SP configuration
All configuration will be handled by specific CLIs to ensure proper SP authorization and auditing. These CLIs will ensure that configuration information persists to support fail-over
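
The default behavior described above can be summarized in a small model. This is only a sketch of the stated rules (disabled by default, port 161, no SNMP versions until configured, active SP only). The class and function names are hypothetical and do not correspond to any ILOM interface.

```python
from dataclasses import dataclass, field
from typing import Set

SNMP_PORT = 161  # port named on the slide


@dataclass
class SnmpAgentState:
    """Hypothetical model of the SP SNMP agent configuration."""
    enabled: bool = False                             # not enabled by default
    versions: Set[str] = field(default_factory=set)   # no version support by default
    port: int = SNMP_PORT


def answers_requests(state: SnmpAgentState, version: str, sp_is_active: bool) -> bool:
    """True only if the agent is enabled, on the active SP, with the version configured."""
    return sp_is_active and state.enabled and version in state.versions


if __name__ == "__main__":
    state = SnmpAgentState()
    print(answers_requests(state, "v3", sp_is_active=True))   # default state
    state.enabled = True
    state.versions.add("v3")
    print(answers_requests(state, "v3", sp_is_active=True))   # enabled and configured
```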


SNMP Monitoring
Comprehensive SNMP Support (V1, V2c, V3)
Out of Band via ILOM
Standard MIBs
RFC1213-MIB
SNMP-FRAMEWORK-MIB
SNMP-MPD-MIB
SNMP-Control-MIB
ENTITY-MIB
SNMP-USER-BASED-SM-MIB

Oracle MIBs
SUN-HW-CTRL-MIB
SUN-ILOM-CONTROL-MIB
SUN-PLATFORM-MIB
SUN-HW-TRAP-MIB
SUN-HW-MONITORING-MIB

MIBs for the T5


The SP supports 2 MIBs for SNMP:
SP-MIB (ILOM extension MIB) - This is used to get information on the status and configuration of the platform. If there is a fault, it sends a trap with the basic fault information.
FM-MIB (Fault Management MIB) - This is used only when there is a fault. It sends the fault trap and includes all the same detailed information as the FMA MIB in a Solaris domain. The trap carries the information needed by the service technician when placing a service call. It is also useful if the domain crashed due to a part failure.


MIBs for the T5 (cont.)


There are two methods of FMA reporting on the SP:
via SNMP
through the internal network to the affected domain.
To have the SP report all platform faults via SNMP using FMA descriptors, you should enable SNMP on the SP.
When the command "setsnmp enable" is run, both MIBs are enabled.
For SNMP fault reporting, here are the choices:
setsnmp enable SP_MIB (Just send SP traps)
setsnmp enable FM_MIB (Just send FMA traps)
setsnmp enable (Send both)
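
The three choices above can be expressed as a small lookup. The sketch below is purely illustrative: it parses the command strings exactly as written on this slide and returns which trap sources would be enabled. The function name is hypothetical and it does not invoke or emulate the real SP CLI.

```python
from typing import Set

ALL_MIBS = {"SP_MIB", "FM_MIB"}


def traps_enabled(command: str) -> Set[str]:
    """Map a 'setsnmp enable [MIB]' command string to the MIBs whose traps are sent."""
    parts = command.split()
    if parts[:2] != ["setsnmp", "enable"]:
        raise ValueError(f"not a 'setsnmp enable' command: {command!r}")

    args = parts[2:]
    if not args:
        # No argument: both SP traps and FMA traps are sent.
        return set(ALL_MIBS)
    if len(args) == 1 and args[0] in ALL_MIBS:
        return {args[0]}
    raise ValueError(f"unsupported arguments: {args}")


if __name__ == "__main__":
    for cmd in ("setsnmp enable SP_MIB", "setsnmp enable FM_MIB", "setsnmp enable"):
        print(cmd, "->", sorted(traps_enabled(cmd)))
```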


Oracle MIBs
SUN-HW-TRAP-MIB
Describes hardware related notifications/traps
SUN-HW-CTRL-MIB
Provides platform control via ILOM
SUN-ILOM-CONTROL-MIB
Controls ILOM devices
SUN-PLATFORM-MIB
Oracle-specific extension to the ENTITY-MIB
SUN-HW-MONITORING-MIB
Provides inventory, status, version, and power consumption information

Introducing: Oracle Enterprise Manager 11g Ops Center


Industry's First Converged Hardware Management Solution

Integrated Infrastructure Management
+
Integrated Application-to-Disk Management
+
Integrated Lifecycle Management
+
Integrated Systems Management & Support

Manage Your Infrastructure in One Place


Enterprise Servers
Operating Systems: Solaris, Cluster
Engineered Systems: Exadata, Exalogic
Virtualization: Containers, Dynamic Domains, OVM for SPARC
Infiniband & Ethernet Fabrics
Storage Systems

Key Features
DISCOVER
Inventory
Bare-metal discovery
VM auto discovery
Advanced permission model
Team sharing

PROVISION
Firmware
Solaris and Linux
Golden images
LDom hypervisor
Provision OS in Zones/LDoms

UPDATE
Solaris, Linux, Windows
Baseline reporting
Matching mirror
Intelligent knowledge base
Patch an OS in a VM, Zone, LDom

MONITOR/MANAGE
Hardware and OS
Resource optimization
Reporting
Audit log
Historical monitoring

Automation
Active dependency rules
Job scheduling
Job simulation
Rollback and recovery

Advanced Virtualization Management


Central interface for VM lifecycle management
- Solaris Zones, Oracle VM for SPARC, Dynamic Domains
Monitor VM or system-level utilization
Reconfigure VMs dynamically
Create resource pools
Migrate VMs across servers

Centralized VM Lifecycle Management


Speed VM deployments.
Increase productivity.

[Diagram labels: Dynamic Domains, Zones, Oracle VM for SPARC, HYPERVISOR; platforms: M5-32, M-Series, T-Series, SPARC & X86]

Ops Center Login


T5-8 - Summary


T5-8 - Hardware


SPARC Roadmap


Oracle SPARC Processor Roadmap


[Roadmap chart, 2011 to 2016. Tracks: M-Series and T-Series (T4 labeled on the T-Series track). Status labels: Delivered, In Test, In the Lab]
Gains labeled on the chart (throughput / thread strength): +2x / +1.5x, +6x / +1.5x, +2x / >1x, +2x / +1.5x, +2.5x / +1.2x, +1x / +5x
Solaris releases along the timeline: Solaris 11 and Solaris 10 U10; Solaris 11 Update and Solaris 10 Update; then Solaris 11 Update with Solaris 10
Oracle Application Accelerators: Database Query, Compression, Encryption, Cluster Interconnect, Software Quality

SPARC Future Directions


2x Application Performance Improvement Every 2 Years
Application Accelerators
Database query
Compression
Encryption
Cluster Interconnect
Application Data Protection

Increased Performance
Higher core frequency
Multiple pipelines per core
Increased core counts per chip
Larger caches
More memory bandwidth

Performance, Reliability, Security, In-memory Database, Big Data



Summary
In simple terms, moving a binary from T4 to T5 at least doubles performance.
2.7x the memory bandwidth and 2x the I/O bandwidth of T4.
2.4x throughput over T4, for 128 threads.
3.6 GHz core; inherits all the advancements of T4 (OoO core, crypto, L2 cache per core...)
New directory-based protocol for scaling (2/4/8 socket)
Significant advancements in power management mean that power consumption will scale well with load
Idle systems will consume a small fraction of peak power
Enterprise-class RAS features.