Agenda
DAY 1
9AM - 9:15AM: Introduction
9:15AM - 10:30AM: Setup
  install or "Lost Password" procedure
  Network (external and DSCP)
  Services
  Users
10:30AM - 10:45AM: Break
10:45AM - 12PM: LAB
12PM - 1PM: Lunch
1PM - 2PM: Review
  discuss privileges/roles
2PM - 2:15PM: LAB
2:15PM - 3:30PM: Review
  Domains
3:30PM - 3:45PM: Break
3:45PM - 4:30PM: LAB

DAY 2
9AM - 9:30AM: SPARC64 Processor
9:30AM - 10AM: More XSCF
  Reports
  Log archiving, or dump to USB stick
  Various "show*" commands
10AM - 10:30AM: LAB
10:30AM - 10:45AM: Break
10:45AM - 11:15AM: Review
  BUI connection
11:15AM - 11:45AM: LAB
11:45AM - 1PM: Lunch
1PM - 3PM: Open LAB
  If they have a SunMC server on the network, set up SunMC and view platform info from the SunMC console
  If we have time, load a domain with Solaris and look at DSCP info from the domain side
3PM - 3:30PM: Q&A and Closing
Mx000 Server Hands On Training
SPARC64 VI Processor
SPARC64 VI (Olympus-C)
Two SPARC V9 cores @ 2.15 - 2.4 GHz
Exports sun4u architecture to Solaris
Two vertical threads per core
128KB I$ and 128KB D$ per core
5-6MB on-chip shared L2$, no external cache
Extensive availability feature set
Switch strands on events: Vertical Multi Threading
Scalable to 64 sockets
Technology: Fujitsu 90nm
Power: 120W @ 1.1V & 2.15 - 2.4 GHz
[Diagram labels: C1, C2; System Interconnect; 5B @ 530MHz; System; 32MB E$ SRAM]
Processor Specifications
Each Olympus-C chip contains two cores
Each core supports two CMT strands
L1 D-cache 128 Kbytes
L1 I-cache 128 Kbytes
Each core has its own L1 cache
L2 cache 5 or 6 Mbytes (10- or 12-way interleave)
Both cores share the L2 cache
M4000/M5000 Clock rate 2.15 GHz
M8000/M9000 Clock rate 2.28 GHz or 2.4 GHz
Olympus Multistranding
Each Core supports two strands
Most physical resources (ALU, instruction pipeline) shared between
strands
Each strand has its own software visible registers (PC, nextPC, data
registers, etc)
OS sees each strand as a complete processor
Strand switch time is 21 clocks
Switch triggered on L2 cache miss or every 5000 clocks
Best application throughput gain seen: approximately 20% (often
much less).
Extra strand can be disabled with psradm.
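The numbers on this slide and the specification slide can be turned into a quick back-of-the-envelope calculation. The sketch below is illustrative only: the function names are hypothetical, and the overhead figure assumes that the 21-clock switch is pure dead time and that switches are driven only by the 5000-clock timer (no L2 cache misses).

```python
# Figures taken from the slides; everything else is an assumption for illustration.
CORES_PER_CHIP = 2          # "Each Olympus-C chip contains two cores"
STRANDS_PER_CORE = 2        # "Each core supports two strands"
SWITCH_COST_CLOCKS = 21     # "Strand switch time is 21 clocks"
TIMER_PERIOD_CLOCKS = 5000  # "Switch triggered ... every 5000 clocks"


def virtual_cpus(chips: int) -> int:
    """Processors the OS sees, since each strand looks like a complete processor."""
    return chips * CORES_PER_CHIP * STRANDS_PER_CORE


def timer_switch_overhead() -> float:
    """Fraction of clocks spent switching when only the timer triggers switches.

    Assumes the switch cost is not overlapped with useful work.
    """
    return SWITCH_COST_CLOCKS / (TIMER_PERIOD_CLOCKS + SWITCH_COST_CLOCKS)


if __name__ == "__main__":
    print("virtual CPUs for 1 chip:", virtual_cpus(1))
    print("virtual CPUs for 64 sockets:", virtual_cpus(64))
    print("timer-driven switch overhead: %.2f%%" % (timer_switch_overhead() * 100))
```

Under these assumptions the timer-driven switching cost stays well below one percent, so the throughput gain depends mainly on how often a strand would otherwise stall on an L2 cache miss.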
Instruction retry
[Diagram: pipeline stages Instruction fetch, Execution, Update PC/GPR etc., with an Error event leading to Instruction retry]
PC: Program Counter
GPR: General Purpose Register
Jupiter options can be mixed with existing SPARC64 VI (Olympus) systems
Mid CY08 availability for SPARC Enterprise M4000-M9000 systems
[Diagram labels: FPU (x4); C1, C2, C3, C4; New]
Reports
Getting Reports...
snapshot - Saves log information to the specified
destination.
snapshot -d usb0 {-r}
snapshot -p <password> -t joe@jupiter.west:/home/joe/logs/x
fmstat - Displays the FMDE status
XSCF> fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
case-close 30 0 0.0 4.3 0 0 0 0 1.4K 0
event-transport 0 0 0.0 0.4 0 0 0 0 5.8K 0
faultevent-post 28 0 0.0 331.6 0 0 0 0 0 0
flush 56 0 0.0 6.9 0 0 0 0 0 0
fmd-self-diagnosis 0 0 0.0 0.5 0 0 0 0 0 0
iox_agent 0 0 0.0 0.5 0 0 0 0 0 0
reagent 0 0 0.0 0.4 0 0 0 0 0 0
sde 28 28 0.0 154.1 0 0 0 28 156K 0
snmp-trapgen 28 0 0.0 3.1 0 0 0 0 0 0
sysevent-transport 0 0 0.0 211.0 0 0 0 0 0 0
syslog-msgs 28 0 0.0 38.9 0 0 0 0 97b 0
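When fmstat output has been captured to a file (for example from a console log), it can be handy to pull out the modules that actually solved cases. The following is a minimal sketch, assuming the whitespace-separated column layout shown above; the function names and the shortened sample are hypothetical and not part of any XSCF tool.

```python
# Shortened sample copied from the fmstat output on this slide.
FMSTAT_SAMPLE = """\
XSCF> fmstat
module ev_recv ev_acpt wait svc_t %w %b open solve memsz bufsz
case-close 30 0 0.0 4.3 0 0 0 0 1.4K 0
sde 28 28 0.0 154.1 0 0 0 28 156K 0
syslog-msgs 28 0 0.0 38.9 0 0 0 0 97b 0
"""


def parse_fmstat(text):
    """Return one dict per module row, keyed by the header column names."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("XSCF>")]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # skip anything that does not look like a data row
        rows.append(dict(zip(header, fields)))
    return rows


def modules_with_solved_cases(rows):
    """Names of modules whose 'solve' counter is non-zero."""
    return [r["module"] for r in rows if int(r["solve"]) > 0]


if __name__ == "__main__":
    rows = parse_fmstat(FMSTAT_SAMPLE)
    print(modules_with_solved_cases(rows))
```

Values such as memsz are kept as strings (1.4K, 97b) because the sketch does not try to interpret the unit suffixes.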
Logs
Getting Logs
Appendix B of XSCF User's Guide
XSCF error log
Power log
XSCF event log
Monitoring message log
Temperature and humidity history log
Console log
Panic log
Log Archiving
A feature of XSCF
Persistent storage space on a Service Processor is limited
Used to set up a remote host as a backing store for SP logs
User chooses an “archive host”, Solaris or Linux based
When user enables Log Archiving, it copies existing log data to
the archive host
Log Archiving continues to archive new log data
All connections established through the log archiving are encrypted
Increases space available for logs
To prevent audit trail overflow, for security reasons
To prevent loss of logs and dump files, for serviceability
Archiving Requirements
Archive SP logs:
Audit Trail
FM Fault Log & FM Error Log
XSCF Error Log & Event Log
Power Log & Thermal History
Domain Console Logs
SCF Trace Regions
Hardware State Dumps
Core Dumps from XSCF Processes
And more...
Log Archiving
Archive Host
Archiving
setarchiving -t log@10.1.140.7:/Log-Archiving -r
setarchiving -l Unlimited,10000
But the files are in binary.....
Archiving
Snapshot Analysis Tools is a set of tools available to
analyze the information available in the data collected by
snapshot.
With revision 1.4 and later, the toolset can also analyze
the logs obtained by Log Archiving.
unpack_snapshot
pack_logarc
showlogs : an off-platform XSCF logs viewer (Snapshot and Log
Archives)
dbdump : get information from the CMEM data available in the
snapshot
show_scf_trace : analyse the XSCF trace buffer information
Get the toolset from
http://panacea/twiki/bin/view/Main/SnapshotAnalysisTools#Extra_tool_Domain_Configuration
Data Collection
XSCF includes a built-in, Sun designed and implemented
data collection mechanism
The command is called snapshot and can be used from
the CLI or BUI
Snapshot is intended to collect and then send data which
service and engineering consider to be essential to
diagnosing problems with the XSCF
Snapshot can transfer its zip archive to a specified
destination using SSH, or to a flash drive connected to
the XSCFU via the USB port
FMA
OPL makes extensive use of FMA to diagnose errors not
only in Solaris, but also on the XSCF
Core FMA functionality has been ported to Linux and exists
as FMSP to diagnose hardware errors
FMA is critical to diagnosing hardware errors on the OPL
platform
We don't have diagnostic tools like redx
FMA reports do provide diagnostic codes for additional
information, but they will need to be decoded by Fujitsu
Fault Finding
'fmdump' available on the SCF and Solaris to list suspects.
'showstatus' will show fault state as described in the
previous slide.
Fault Finding
Understand fault terminology. Very likely to cause
confusion if you use terminology applicable to legacy
systems.
Faulted – A FRU has a fault and is disabled (the FRU has been
taken out of service)
Degraded – A FRU has a fault, but it is still enabled (being used
by the system), usually with either reduced functionality,
reduced reliability, or reduced performance
Deconfigured – A FRU is disabled or has been physically
removed, typically due to the fact that a FRU on the platform
(either this FRU or another FRU) is faulted, degraded, or
missing. A deconfigured FRU IS NOT faulty
Maintenance – Under maintenance
Do not use the term “blacklist”. No such concept on OPL.
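The distinction between these states comes down to two questions: does the FRU itself have a fault, and is it still in service? The sketch below encodes the definitions from this slide as a small lookup table. The class and function names are hypothetical, and the Maintenance entry is left undetermined because the slide does not define it further.

```python
from enum import Enum


class FruStatus(Enum):
    FAULTED = "Faulted"
    DEGRADED = "Degraded"
    DECONFIGURED = "Deconfigured"
    MAINTENANCE = "Maintenance"


# (FRU itself has a fault, FRU is in service); None means "not stated on the slide".
SEMANTICS = {
    FruStatus.FAULTED: (True, False),        # has a fault and is disabled
    FruStatus.DEGRADED: (True, True),        # has a fault but is still enabled
    FruStatus.DECONFIGURED: (False, False),  # disabled or removed, but NOT faulty
    FruStatus.MAINTENANCE: (None, None),     # under maintenance
}


def fru_has_fault(status):
    """True/False per the slide's definition, or None if undetermined."""
    return SEMANTICS[status][0]


def in_service(status):
    """True/False per the slide's definition, or None if undetermined."""
    return SEMANTICS[status][1]


if __name__ == "__main__":
    for status in FruStatus:
        print(status.value, fru_has_fault(status), in_service(status))
```

The table makes the key point explicit: a Deconfigured FRU is out of service without being faulty itself.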
Fault Finding
SCF has no visibility to PCI errors or faults
All I/O faults are diagnosed by Solaris
When a PCI fault arrives from Solaris, reagent (Reissue
Agent) issues a new fault event on the SCF identical to
the Solaris fault
'fmdump' on the SCF will show the same PCI fault information as
'fmdump' on Solaris
Reagent prevents fault from being included in resource
cache on the SCF
'fmadm faulty' on Solaris will show the faulty component, while 'fmadm
faulty' on the SCF will show nothing
Only need to do 'fmadm repair' on Solaris (which is required since PCI
components do not have serial ids)
Fault Finding
Do not use 'fmadm repair' (available only in escalation
mode) on the SCF, as it can leave the system in an
inconsistent state.
The field should use 'clearfault', which is available in service
mode, to clear fault status.
'clearstatus' (escalation mode) may be needed under more
complex circumstances. 'clearstatus' should be used
only under direction from TSC.
Report usage of rebootxscf/clearfru/clearstatus to TSC.
Every use of these CLIs is a potential bug.
The 'fmadm faulty' command (escalation mode) is only used to
view the resource cache; the field should not use this
command to check which FRUs are faulty.
Fault Finding
POST output will be interleaved (like in current Sun
POST) when tests are run in parallel
Messages indicate component under test, test name,
time stamp, and start/end times of POST
Failure will report physical location
POST error messages contain physical information, but
normal POST output messages contain logical
information
This will likely be confusing to the field and something that
we will have to live with
Important to collect “historical” POST logs and set up an
archive host
LAB
Lab system
XSCF Access to the M9000 in GMP03
XSCF login: platadm, standard lab password
Lab example
Before configuration:

XSCF> showarchiving
*** Archiving Configuration ***
Archiving state ---------- Disabled
Archive host ------------- Not configured
Archive directory -------- Not configured
User name for ssh login -- Not configured
Archive host fingerprint - Server authentication disabled
*** Connection to Archive Host ***
Latest communication ----- None
Connection status -------- None
                             AUDIT LOGS      OTHER LOGS
                             ----------      ----------
Archive space limit          Unlimited       5000 MB
Archive space used           Not monitored   Not monitored
Total archiving failures     0               0
Unresolved failures          0               0
XSCF> setarchiving
Usage: setarchiving enable
or: setarchiving disable
or: setarchiving [-k host_key] [-l audit_limit,non_audit_limit]
    [-p password | -r] [-t user@host:directory] [-v] [-y|-n]
or: setarchiving [-h]
XSCF> setarchiving -p newroot -t root@10.130.0.21:/Archiving_m8000

After configuration:

XSCF> showarchiving
*** Archiving Configuration ***
Archiving state ---------- Disabled
Archive host ------------- 10.130.0.21
Archive directory -------- /Archiving_m8000
User name for ssh login -- root
Archive host fingerprint - Server authentication disabled
*** Connection to Archive Host ***
Latest communication ----- None
Connection status -------- None
                             AUDIT LOGS      OTHER LOGS
                             ----------      ----------
Archive space limit          Unlimited       5000 MB
Archive space used           Not monitored   Not monitored
Total archiving failures     0               0
Unresolved failures          0               0
XSCF> setarchiving enable
Testing the archiving configuration...
Logs will be archived to 10.130.0.21.
XSCF>
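The -t argument in the usage text has the form user@host:directory, and the lab target root@10.130.0.21:/Archiving_m8000 splits into exactly the three values that showarchiving reports once configured. A minimal sketch of that split follows; the function name is hypothetical, this is not how XSCF itself parses the argument, and a host containing a colon (such as an IPv6 literal) is not handled.

```python
def parse_archive_target(target):
    """Split 'user@host:directory' into (user, host, directory).

    Raises ValueError if any of the three parts is missing.
    """
    user_host, colon, directory = target.partition(":")
    user, at, host = user_host.partition("@")
    if not (colon and at and user and host and directory):
        raise ValueError("expected user@host:directory, got %r" % target)
    return user, host, directory


if __name__ == "__main__":
    user, host, directory = parse_archive_target(
        "root@10.130.0.21:/Archiving_m8000")
    print("User name for ssh login --", user)
    print("Archive host -------------", host)
    print("Archive directory --------", directory)
```

Checking the target string this way before typing it at the XSCF prompt catches the common mistake of leaving out the directory or the user name.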
BREAK
(15 min)
BUI at RR
Enable BUI
sethttps -c selfsign US Oregon Hillsboro SystemsGroup TechMktg ff2 email_address@emailserver.com
sethttps -c enable
BUI Support at RR
BUI in XCP1050
LAB
Lab system
XSCF Access to the M9000 in GMP03
XSCF login: platadm, standard lab password
Lunch
(1 hour)
SunMC
setsunmc -s tm160-139.sfbay -z techmktg -p 1161
setsunmc enable
Did you remember to enable SNMP on the XSCF?
The SunMC agent is part of the XSCF. If it's a Fujitsu-branded
box, you need to go into escalation mode to be able to
configure the SunMC agent on the Fujitsu XSCF.
To install SunMC server, pay attention to user creation,
group setup for the user, and don't forget to add the user
into /var/opt/SUNWsymon/cfgcfuser
Open Lab /
Question Time
Materials references
http://docs.sun.com
http://uask4it.sfbay/~grc/opl contains ALL the materials on
OPL: documentation, photos, SW restrictions doc,
videos (mp4, DivX, ...), hands-on materials, ...
Quiz ! !
True or False: The BUI or CLI can be used to install new
XCP software.
True or False: The cfgdevice can also be used to assign the
USB port to a domain.
How many M4000 systems can be placed in the 12U space
in the M8000?
If a single CMU/IOU can support up to 4 domains, how
many disk drives need to be installed in the IOU?
When configuring the XSCF, how many network IP
addresses need to be specified in the M4000 and
M9000?
Quiz ! !
True or False: If a CMU with just 2 CPUs can only be
placed in uni mode, then an M5000 with only one CPUM
can only support 1 domain.
True or False: The command to add domain 2 power on/off
privilege is “setprivileges user domainadm@2”
True or False: The maximum number of LSBs per domain is
24.
Which command gives details about which XSB are
assigned to which domains, showdcl or showboards?
What commands are needed to assign a newly inserted
CMU to a domain?
Quiz ! !
True or False: The BUI or CLI can be used to install new XCP
software. (False)
True or False: The cfgdevice can also be used to assign the USB
port to a domain. (False)
How many M4000 systems can be placed in the 12U space in the
M8000? (None, not supported)
If a single CMU/IOU can support up to 4 domains, how many disk
drives need to be installed in the IOU for boot disks? (4, but they
can only be used by 2 of the 4 domains)
When configuring the XSCF, how many network IP addresses need
to be specified in the M4000 and M9000? (7(2+5) and
33(4+2+2+25))
Quiz ! !
True or False: If a CMU with just 2 CPUs can only be placed in uni
mode, then an M5000 with only one CPUM can only support 1
domain. (False)
True or False: The command to add domain 2 power on/off privilege
is “setprivileges user domainadm@2”. (False: domainmgr and
must list ALL privs)
True or False: The maximum number of LSBs per domain is 24.
(False: 16)
Which command gives details about which XSB is assigned to which
domains: showdcl or showboards? (showboards)
What commands are needed to assign a newly inserted CMU to a
running domain? (setupfru, setdcl, addboard)
David Campbell
d.campbell@sun.com