SPARC Enterprise

Server Training - Day 2

Presentation delivered by David Campbell


d.campbell@sun.com
OPL Technical Community, Systems TSC EMEA

Based on presentation by Gary Combs, SPARC Specialist


Systems Group Technical Marketing
Mx000 Server Hands On Training

Agenda
DAY 1
9AM - 9:15AM: Introduction
9:15AM - 10:30AM: Setup
  Install or "Lost Password" procedure
  Network (external and DSCP)
  Services
  Users
10:30AM - 10:45AM: Break
10:45AM - 12PM: LAB
12PM - 1PM: Lunch
1PM - 2PM: Review
  Discuss privileges/roles
2PM - 2:15PM: LAB
2:15PM - 3:30PM: Review
  Domains
3:30PM - 3:45PM: Break
3:45PM - 4:30PM: LAB

DAY 2
9AM - 9:30AM: SPARC64 Processor
9:30AM - 10AM: More XSCF
  Reports
  Log archiving, or dump to USB stick
  Various "show*" commands
10AM - 10:30AM: LAB
10:30AM - 10:45AM: Break
10:45AM - 11:15AM: Review
  BUI connection
11:15AM - 11:45AM: LAB
11:45AM - 1PM: Lunch
1PM - 3PM: Open LAB
  If a SunMC server is available on the network, set up SunMC and view platform info from the SunMC console
  If time permits, load a domain with Solaris and look at DSCP info from the domain side
3PM - 3:30PM: Q&A and Closing

SPARC64 VI Processor

SPARC64 VI (Olympus-C)
Two SPARC V9 cores @ 2.15 - 2.4 GHz
Exports sun4u architecture to Solaris
Two vertical threads per core
128KB I$ and 128KB D$ per core
5-6MB on-chip shared L2$, no external cache
Extensive availability feature set
Switch strands on events: Vertical Multi Threading (VMT)
Implements a combination of CMP and VMT: two physical cores, each with two strands in a VMT structure. Four threads are available to the OS, although within a core only one strand executes at any given moment. The two strands that belong to the same physical core share most of the physical resources, while the two physical cores share nothing except the L2$ and the system interface.
New system chipset/interconnect (Jupiter bus)
Scalable to 64 sockets
Technology: Fujitsu 90nm
Power: 120W @ 1.1V & 2.15 - 2.4 GHz
[Block diagram: cores C1 and C2, each with an FPU, 128KB I$ and 128KB D$; shared 5-6MB L2$; arbiter/switch; system interface; system interconnect 5B @ 530MHz]

Processor Specifications
Each Olympus-C chip contains two cores
Each core supports two CMT strands
L1 D-cache 128 Kbytes
L1 I-cache 128 Kbytes
Each core has its own L1 cache
L2 cache 5 or 6 Mbytes (10- or 12-way interleave)
Both cores share the L2 cache
M4000/M5000 clock rate 2.15GHz
M8000/M9000 clock rate 2.28GHz or 2.4GHz

Olympus Multistranding
Each Core supports two strands
Most physical resources (ALU, instruction pipeline) are shared between strands
Each strand has its own software-visible registers (PC, nextPC, data registers, etc.)
The OS sees each strand as a complete processor
Strand switch time is 21 clocks
A switch is triggered on an L2 cache miss or every 5000 clocks
Best application throughput gain seen: approximately 20% (often much less)
The extra strand can be disabled with psradm.
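The two figures above (21-clock switch, forced switch every 5000 clocks) allow a quick estimate of what the timer-driven switches alone cost. The sketch below is a back-of-the-envelope model, not a description of the hardware: it assumes the core does no useful work during a switch and that no L2 miss triggers an earlier switch, so every switch is a forced one.

```python
# Figures taken from the slide; the cost model itself is a simplification.
SWITCH_CLOCKS = 21      # clocks needed for one strand switch
QUANTUM_CLOCKS = 5000   # a switch is forced after this many clocks


def timer_switch_overhead(switch_clocks=SWITCH_CLOCKS, quantum=QUANTUM_CLOCKS):
    """Fraction of core clocks spent switching when only forced switches occur.

    Assumes no useful work is done during a switch and that no L2 cache miss
    triggers an earlier switch.
    """
    return switch_clocks / (quantum + switch_clocks)


print(f"{timer_switch_overhead():.2%}")  # 0.42%
```

Under these assumptions the forced switches cost about 0.42% of core clocks.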

SPARC64™ VI RAS Feature - Instruction Retry
If an error occurs, the hardware automatically retries the instruction
If the instruction retry executes successfully
The processor resumes normal processing
If the instruction retries fail and reach the threshold
The processor logs the source of the error and reports the status to the OS
Increases processor availability
[Figure: instruction fetch, execution and PC/GPR update sequence; an error during execution triggers an instruction retry (fetch and execute again) before PC/GPR are updated. PC = Program Counter, GPR = General Purpose Register]
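The retry policy above can be modelled in a few lines. This is a toy sketch, not how the processor implements it: InstructionError, run_with_retry and the default threshold of 3 are hypothetical names and values, and "architectural state" is reduced to a single PC that is only updated after a successful attempt, matching the figure's retry-before-update order.

```python
class InstructionError(Exception):
    """Hypothetical stand-in for a hardware error detected during execution."""


def run_with_retry(execute, state, retry_threshold=3, report=print):
    """Toy model of instruction retry; the threshold value is an assumption.

    `execute` performs one attempt and raises InstructionError on failure.
    Architectural state (here only state["pc"]) is committed after a
    successful attempt, so a failed attempt leaves nothing to undo.
    """
    last_error = None
    for _ in range(1 + retry_threshold):  # first attempt plus retries
        try:
            result = execute()
        except InstructionError as err:
            last_error = err
            continue  # retry the same instruction
        state["pc"] += 4  # commit: advance past the 4-byte SPARC instruction
        return result  # resume normal processing
    # Threshold reached: record the error source and escalate to the OS.
    report(f"instruction retry threshold reached: {last_error}")
    raise last_error
```

A transient error is absorbed by the first retry and the caller never sees it; a persistent one is reported once and re-raised, leaving the PC untouched.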

SPARC64 VII (Jupiter)


Four SPARC V9 cores @ 2.4-2.6GHz
Exports sun4u architecture to Solaris
Improved threading model, 2 SMT threads per core
64KB I$ and 64KB D$ per core
Large on-chip shared 5-6MB L2$, no external cache
Technology: Fujitsu 65nm
Power: ~120W @ 1.1V & 2.6GHz
Same system chipset/interconnect
SPARC V9 compliant design; binary compatible with S10 applications
S10U5 minimum OS level required
Jupiter options can be mixed with existing SPARC64 VI (Olympus) systems
Mid CY08 availability for SPARC Enterprise M4000-M9000 systems
[Block diagram: cores C1-C4, each with an FPU, 64KB I$ and 64KB D$; shared 5-6MB L2$; arbiter/switch; system interface; system interconnect 5B @ 530MHz]

Reports

Getting Reports...
snapshot - Saves log information to the specified destination.
snapshot -d usb0 {-r}
snapshot -p <password> -t joe@jupiter.west:/home/joe/logs/x
fmstat - Displays the status of the fault management daemon (fmd) modules
XSCF> fmstat
module              ev_recv ev_acpt wait  svc_t  %w %b open solve  memsz  bufsz
case-close               30       0  0.0    4.3   0  0    0     0   1.4K      0
event-transport           0       0  0.0    0.4   0  0    0     0   5.8K      0
faultevent-post          28       0  0.0  331.6   0  0    0     0      0      0
flush                    56       0  0.0    6.9   0  0    0     0      0      0
fmd-self-diagnosis        0       0  0.0    0.5   0  0    0     0      0      0
iox_agent                 0       0  0.0    0.5   0  0    0     0      0      0
reagent                   0       0  0.0    0.4   0  0    0     0      0      0
sde                      28      28  0.0  154.1   0  0    0    28   156K      0
snmp-trapgen             28       0  0.0    3.1   0  0    0     0      0      0
sysevent-transport        0       0  0.0  211.0   0  0    0     0      0      0
syslog-msgs              28       0  0.0   38.9   0  0    0     0    97b      0
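When fmstat output is captured off-platform (for example from a snapshot or a saved console session), a small parser makes it easy to filter. The sketch below assumes only that columns are whitespace-separated, as in the output above; parse_fmstat is a hypothetical helper, and reading "solve" as the number of diagnosis cases solved assumes the columns mean the same as in Solaris fmstat(1M).

```python
def parse_fmstat(text):
    """Parse `fmstat` output into {module: {column: value}}.

    Values are kept as strings because some columns carry unit suffixes
    (for example memsz "1.4K" or "97b").
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The header is the first row that starts with "module"; anything before
    # it (such as the "XSCF> fmstat" prompt line) is ignored.
    start = next(i for i, fields in enumerate(rows) if fields[0] == "module")
    header = rows[start]
    table = {}
    for fields in rows[start + 1:]:
        if len(fields) == len(header):  # skip lines that are not data rows
            table[fields[0]] = dict(zip(header[1:], fields[1:]))
    return table


SAMPLE = """\
XSCF> fmstat
module              ev_recv ev_acpt wait  svc_t  %w %b open solve  memsz  bufsz
flush                    56       0  0.0    6.9   0  0    0     0      0      0
sde                      28      28  0.0  154.1   0  0    0    28   156K      0
"""

stats = parse_fmstat(SAMPLE)
# Modules that have solved at least one diagnosis case
print([name for name, row in stats.items() if int(row["solve"]) > 0])  # ['sde']
```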

Logs

Getting Logs
The following logs are described in Appendix B of the XSCF User's Guide:
XSCF error log
Power log
XSCF event log
Monitoring message log
Temperature and humidity history log
Console log
Panic log

Log Archiving
A feature of XSCF
Persistent storage space on a Service Processor is limited
Used to set up a remote host as a backing store for SP logs
The user chooses an “archive host”, Solaris or Linux based
When the user enables Log Archiving, existing log data is copied to the archive host
Log Archiving then continues to archive new log data
All connections established by Log Archiving are encrypted
Increases the space available for logs
To prevent audit trail overflow, for security reasons
To prevent loss of logs and dump files, for serviceability

Archiving Requirements
Archive SP logs:
Audit Trail
FM Fault Log & FM Error Log
XSCF Error Log & Event Log
Power Log & Thermal History
Domain Console Logs
SCF Trace Regions
Hardware State Dumps
Core Dumps from XSCF Processes
And more...

Log Archiving
Before enabling log archiving, an archive directory must be created on the archive host. There should be a separate archive directory for each system that uses the archive host. The directory permissions should be set so that only authorized users can access its contents.
Configure the log archiving feature.
The archive daemon (archd) stores data in the Service Processor data repository. On dual-Service Processor systems, the data repository replicates the data to the standby Service Processor.
When log archiving is enabled, archd transfers log data to the archive host with SCP. It uses SSH to manage the logs it previously copied.
As new data accumulates in the logs, archd polls the log files at fixed intervals to determine when new data needs to be archived.
If errors occur, archd logs them in the Error Log. When auditable events occur, archd logs them in the Audit Log.
[Figure: numbered steps 1-6 linking the user interface on FF/DC, the archive daemon on the Service Processor, event logging, audit logging, the logs data repository, and the archive host (ssh, scp)]
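The "poll at fixed intervals to find new data" step can be illustrated with an offset-tracking poller. This is a hypothetical sketch, NOT the real archd implementation: LogPoller is an invented name, and the real polling interval, file layout and SCP transfer are not described on the slide and are left out here.

```python
import os


class LogPoller:
    """Hypothetical offset-tracking poller; NOT the real archd implementation.

    Remembers how many bytes of each log file were already handed off and,
    on every poll, returns only the bytes appended since then.
    """

    def __init__(self, paths):
        self.offsets = {path: 0 for path in paths}

    def poll(self):
        """Return {path: new_bytes} for every file that grew since the last poll."""
        pending = {}
        for path in list(self.offsets):
            offset = self.offsets[path]
            size = os.path.getsize(path)
            if size < offset:  # file shrank (for example, it was truncated): start from 0
                offset = 0
            if size > offset:
                with open(path, "rb") as f:
                    f.seek(offset)
                    pending[path] = f.read(size - offset)
            self.offsets[path] = size
        return pending
```

A driver would call poll() on a timer and transfer each returned chunk to the archive host; tracking offsets means unchanged logs cost only a size check per interval.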
Archiving
setarchiving -t log@10.1.140.7:/Log-Archiving -r
setarchiving -l Unlimited,10000   (space limits for audit logs and other logs, in MB)
But the archived files are in binary format...

Archiving
Snapshot Analysis Tools is a set of tools available to
analyze the information available in the data collected by
snapshot.
With revision 1.4 and later, the toolset can also analyze the logs obtained by Log Archiving.
unpack_snapshot
pack_logarc
showlogs: an off-platform XSCF log viewer (snapshots and log archives)
dbdump: gets information from the CMEM data available in the snapshot
show_scf_trace: analyzes the XSCF trace buffer information
Get the toolset from
http://panacea/twiki/bin/view/Main/SnapshotAnalysisTools#Extra_tool_Domain_Configuration
Data Collection
XSCF includes a built-in, Sun designed and implemented
data collection mechanism
The command is called snapshot and can be used from
the CLI or BUI
Snapshot is intended to collect and then send data which
service and engineering consider to be essential to
diagnosing problems with the XSCF
Snapshot can transfer its zip archive to a specified
destination using SSH, or to a flash drive connected to
the XSCFU via the USB port

FMA
OPL makes extensive use of FMA to diagnose errors not
only in Solaris, but also on the XSCF
Core FMA functionality has been ported to Linux and exists
as FMSP to diagnose hardware errors
FMA is critical to diagnosing hardware errors on the OPL
platform
We don't have diagnostic tools like redx
FMA reports do provide diagnostic codes for additional
information, but they will need to be decoded by Fujitsu

Fault Finding
'fmdump' is available on both the SCF and Solaris to list suspects.
'showstatus' shows the fault state of components, using the fault terminology defined on the next slide.

Fault Finding
Understand the fault terminology. Using terminology that applies to legacy systems is very likely to cause confusion.
Faulted: A FRU has a fault and is disabled (the FRU has been taken out of service)
Degraded: A FRU has a fault but is still enabled (being used by the system), usually with reduced functionality, reduced reliability, or reduced performance
Deconfigured: A FRU is disabled or has been physically removed, typically because a FRU on the platform (either this FRU or another FRU) is faulted, degraded, or missing. A deconfigured FRU IS NOT faulty
Maintenance: Under maintenance
Do not use the term “blacklist”. There is no such concept on OPL.
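The four states differ along two independent questions: does the FRU have a fault, and is it still in service? A small enum makes that explicit. This is an illustrative sketch only (FruState is a hypothetical name, not an XSCF interface); the Maintenance values are None because the slide does not define them in these terms.

```python
from enum import Enum


class FruState(Enum):
    """FRU fault states as defined on the slide.

    Each value is (has_fault, in_service); None means the slide does not say.
    """
    FAULTED = (True, False)        # has a fault, taken out of service
    DEGRADED = (True, True)        # has a fault, still used by the system
    DECONFIGURED = (False, False)  # disabled or removed, but NOT faulty itself
    MAINTENANCE = (None, None)     # under maintenance

    @property
    def has_fault(self):
        return self.value[0]

    @property
    def in_service(self):
        return self.value[1]


# A deconfigured FRU is out of service without being faulty:
print(FruState.DECONFIGURED.has_fault, FruState.DECONFIGURED.in_service)  # False False
```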
Fault Finding
The SCF has no visibility into PCI errors or faults
All I/O faults are diagnosed by Solaris
When a PCI fault arrives from Solaris, reagent (Reissue Agent) issues a new fault event on the SCF identical to the Solaris fault
'fmdump' on the SCF will show the same PCI fault information as 'fmdump' on Solaris
Reagent prevents the fault from being included in the resource cache on the SCF
'fmadm faulty' on Solaris will show the faulty component, while 'fmadm faulty' on the SCF will show nothing
'fmadm repair' needs to be run only on Solaris (it is required there because PCI components do not have serial IDs)

Fault Finding
Do not use 'fmadm repair' (available only in escalation mode) on the SCF, as it can leave the system in an inconsistent state.
The field should use 'clearfault', which is available in service mode, to clear fault status.
'clearstatus' (escalation mode) may be needed under more complex circumstances. 'clearstatus' should be used only under direction from TSC.
Report usage of rebootxscf/clearfru/clearstatus to TSC. Every use of these commands is a potential bug.
The 'fmadm faulty' command (escalation mode) is only used to view the resource cache; the field should not use this command to check which FRUs are faulty.
Fault Finding
POST output will be interleaved (as in current Sun POST) when tests are run in parallel
Messages indicate the component under test, the test name, a time stamp, and the start/end times of POST
Failures report the physical location
POST error messages contain physical information, but normal POST output messages contain logical information
This will likely be confusing to the field and is something that we will have to live with
It is important to collect “historical” POST logs and to set up an archive host

Other things to know

Memory Installation Rules


Install DIMMs in groups of 16
All DIMMs in a group must be the same type (size and rank)
Install Group A first
The size of Group B DIMMs must be less than or equal to that of Group A DIMMs
The rank of Group B DIMMs may differ from that of Group A DIMMs
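The rules above are mechanical enough to check before installing. The sketch below assumes two groups named A and B as on the slide and sizes in GB; Dimm and check_memory are hypothetical helpers, and the slide does not say which boards these rules apply to, so treat it as an illustration rather than a configurator.

```python
from dataclasses import dataclass

GROUP_SIZE = 16  # DIMMs per group, per the rules above


@dataclass(frozen=True)
class Dimm:
    size_gb: int  # capacity of one DIMM
    rank: int     # e.g. 1 = single-rank, 2 = dual-rank


def check_memory(group_a, group_b=()):
    """Return a list of rule violations; an empty list means the layout passes."""
    errors = []
    group_b = list(group_b)
    if group_b and not group_a:
        errors.append("Group A must be installed before Group B")
    for name, dimms in (("A", group_a), ("B", group_b)):
        if dimms and len(dimms) != GROUP_SIZE:
            errors.append(f"Group {name}: {len(dimms)} DIMMs, expected {GROUP_SIZE}")
        if len(set(dimms)) > 1:  # same size AND rank within a group
            errors.append(f"Group {name}: DIMMs differ in size or rank")
    if group_a and group_b:
        if max(d.size_gb for d in group_b) > min(d.size_gb for d in group_a):
            errors.append("Group B DIMMs must not be larger than Group A DIMMs")
    # Rank is deliberately not compared across groups: it may differ.
    return errors
```

For example, check_memory([Dimm(4, 2)] * 16, [Dimm(2, 1)] * 16) returns an empty list: Group B is smaller and its rank differs, both of which are allowed.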

LAB

Set up log archiving on the M4000 and take a snapshot to a USB device


Use the GMP03 lab platform to investigate the showlogs and snapshot commands

Lab system
XSCF Access to the M9000 in GMP03
XSCF login: platadm, standard lab password

v4u-m9000a-xscf0-0-gmp03  129.156.215.100
v4u-m9000a-xscf1-0-gmp03  129.156.215.223

Hardware: physical system boards
00-0, 01-0, 02-0, 08-0, 09-0, 10-0

Lab example
XSCF> showarchiving
*** Archiving Configuration ***
Archiving state ---------- Disabled
Archive host ------------- Not configured
Archive directory -------- Not configured
User name for ssh login -- Not configured
Archive host fingerprint - Server authentication disabled
*** Connection to Archive Host ***
Latest communication ----- None
Connection status -------- None
                          AUDIT LOGS      OTHER LOGS
                          ----------      ----------
Archive space limit       Unlimited       5000 MB
Archive space used        Not monitored   Not monitored
Total archiving failures  0               0
Unresolved failures       0               0
XSCF> setarchiving
Usage: setarchiving enable
   or: setarchiving disable
   or: setarchiving [-k host_key] [-l audit_limit,non_audit_limit]
                    [-p password | -r] [-t user@host:directory] [-v] [-y|-n]
   or: setarchiving [-h]
XSCF> setarchiving -p newroot -t root@10.130.0.21:/Archiving_m8000
XSCF> showarchiving
*** Archiving Configuration ***
Archiving state ---------- Disabled
Archive host ------------- 10.130.0.21
Archive directory -------- /Archiving_m8000
User name for ssh login -- root
Archive host fingerprint - Server authentication disabled
*** Connection to Archive Host ***
Latest communication ----- None
Connection status -------- None
                          AUDIT LOGS      OTHER LOGS
                          ----------      ----------
Archive space limit       Unlimited       5000 MB
Archive space used        Not monitored   Not monitored
Total archiving failures  0               0
Unresolved failures       0               0
XSCF> setarchiving enable
Testing the archiving configuration...
Logs will be archived to 10.130.0.21.
XSCF>

BREAK
(15 min)

BUI at RR

Enable BUI
sethttps -c selfsign US Oregon Hillsboro SystemsGroup TechMktg ff2 email_address@emailserver.com
sethttps -c enable

BUI Support at RR

Browse from the system you are logged on to, not from the XSCF

Download a snapshot from the XSCF to the system you are logged on to

BUI in XCP1050

After saving, you need to reboot the XSCF for the settings to be applied.
If you are logged in as an operator, you will not have access to those functions (they are greyed out).

XCP1050 BUI notes


DR is not included yet (addboard, deleteboard and moveboard are not implemented).
COD is technically supported in XCP1041 but will be fully released with XCP1050. It will be supported in the XCP1050 BUI as well.

LAB

Set up access to the BUI on the M4000


Use the GMP03 lab platform to view the BUI

Lunch
(1 hour)

SunMC
setsunmc -s tm160-139.sfbay -z techmktg -p 1161
setsunmc enable
Did you remember to enable SNMP on the XSCF?
The SunMC agent is part of the XSCF. On a Fujitsu-branded system, you need to enter escalation mode to be able to configure the SunMC agent on the XSCF.
When installing the SunMC server, pay attention to user creation and the group setup for the user, and don't forget to add the user to /var/opt/SUNWsymon/cfgcfuser

Open Lab / Question Time

Materials references
http://docs.sun.com
http://uask4it.sfbay/~grc/opl contains ALL the materials on OPL: documentation, photos, the SW restrictions doc, videos (mp4, DivX, ...), hands-on materials, ...

Quiz ! !
True or False: The BUI or CLI can be used to install new
XCP software.
True or False: The cfgdevice command can also be used to assign the USB port to a domain.
How many M4000 systems can be placed in the 12U space
in the M8000?
If a single CMU/IOU can support up to 4 domains, how
many disk drives need to be installed in the IOU?
When configuring the XSCF, how many network IP
addresses need to be specified in the M4000 and
M9000?

Quiz ! !
True or False: If a CMU with just 2 CPUs can only be
placed in uni mode, then an M5000 with only one CPUM
can only support 1 domain.
True or False: The command to add domain 2 power on/off privilege is “setprivileges user domainadm@2”
True or False: The maximum number of LSBs per domain is 24.
Which command gives details about which XSBs are assigned to which domains, showdcl or showboards?
What commands are needed to assign a newly inserted
CMU to a domain?

Quiz ! !
True or False: The BUI or CLI can be used to install new XCP
software. (False)
True or False: The cfgdevice command can also be used to assign the USB port to a domain. (False)
How many M4000 systems can be placed in the 12U space in the M8000? (None, not supported)
If a single CMU/IOU can support up to 4 domains, how many disk drives need to be installed in the IOU for boot disks? (4, but they can only be used by 2 of the 4 domains)
When configuring the XSCF, how many network IP addresses need to be specified in the M4000 and M9000? (7 (2+5) and 33 (4+2+2+25))

Quiz ! !
True or False: If a CMU with just 2 CPUs can only be placed in uni
mode, then an M5000 with only one CPUM can only support 1
domain. (False)
True or False: The command to add domain 2 power on/off privilege is “setprivileges user domainadm@2”. (False: the privilege is domainmgr, and ALL of the user's privileges must be listed)
True or False: The maximum number of LSBs per domain is 24. (False: 16)
Which command gives details about which XSB is assigned to which domain: showdcl or showboards? (showboards)
What commands are needed to assign a newly inserted CMU to a
running domain? (setupfru, setdcl, addboard)

David Campbell
d.campbell@sun.com
