Acropolis 4.7
NX-1000/NX-3050/NX-3060/NX-3060-G4/NX-3060-G5/
NX-6000/NX-8035-G4/NX-8035-G5/NX-9000/SX-1065-G5 Series
27-Apr-2017
Notice
Copyright
Copyright 2017 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual property
laws. Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other marks
and names mentioned herein may be trademarks of their respective companies.
License
The provision of this software to you does not grant any licenses or other rights under any Microsoft
patents with respect to anything other than the file server implementation portion of the binaries for this
software, including no licenses or any other rights in any hardware or any devices or software that are used
to communicate with or in connection with this software.
Conventions

Convention            Description
user@host$ command    The commands are executed as a non-privileged user (such as nutanix) in the system shell.
root@host# command    The commands are executed as the root user in the vSphere or Acropolis host shell.
> command             The commands are executed in the Hyper-V host shell.
Version
Last modified: April 27, 2017 (2017-04-27 2:23:37 GMT-7)
Overview
This document describes how to replace hardware components in a Nutanix block.
Tools and supplies required:
• Antistatic wrist strap
• Phillips screwdriver (#2)
Warning:
• All servicing must be done by a qualified service technician. During the procedure, wear
a grounding wrist strap to avoid ESD damage to the component or system. Handle all
components with care: place them on soft, static-free surfaces.
• Coordinate service with the customer for any operation that involves the hypervisor host (ESXi,
Hyper-V or AHV), virtual machines, or Nutanix software.
• If you have locked down the cluster or hypervisor, you must enable SSH access again. See the
Cluster Access Control topic in the Web Console Guide.
Return failed components to Nutanix. Follow the instructions provided on the box for the replacement
parts. If the instructions are missing or you have other questions, call the Nutanix returns department at
1-866-884-7958 in the US, or +1-508-623-1040 international.
Note: Nutanix does not provide a warranty or support services with respect to non-Nutanix
components, and any problems arising from your use of such non-Nutanix components are
expressly excluded from the Nutanix warranty and support terms. If you replace or modify Nutanix
components or install non-Nutanix components you do so at your risk.
While a Nutanix node may be able to self-correct for certain memory errors, failed memory can lead to
system degradation and should be addressed as soon as possible.
Indications
• For all systems with a Haswell CPU, these tools will display a DIMM error message: POST, the IPMI
web interface, IPMI View, SMCIPMItool, and open source ipmitool 1.8.14.
• For systems using BMC firmware 1.87 and BIOS 1.0b, or BMC 3.24 and BIOS 3.0.5, the IPMI web
interface for a node shows a message similar to Uncorrectable ECC @ DIMM1A(CPU1) - Asserted.
• For systems using BMC firmware 2.33 and BIOS 3.0.2, ipmitool or ipmiutil displays a DIMM error
message.
Match the CPU and DIMM ID reported by the IPMI event log with the DIMM diagram. See Identifying a
Failed Memory Module.
Memory performance is most efficient with a balanced configuration, where every memory channel
contains the same number of DIMMs. Unbalanced configurations are supported, but be aware that these
configurations result in lower performance.
In three-DIMMs-per-channel (3DPC) configurations, each memory channel has one blue slot and two
black slots, as shown in the figure below:
Figure: 3DPC memory channel slot layout (NX-6155-G5 and NX-8150-G4/-G5)
All occupied slots in a memory channel must contain only DIMMs from the same manufacturer. This means
that if you are replacing a DIMM in a channel, or upgrading your memory by adding new DIMMs to a
channel, you must do one of the following:
• Make sure the DIMMs you are replacing or adding are of the same type, capacity, and manufacturer as
the other DIMMs in the channel, or
• Replace all the DIMMs in the channel so that every DIMM in the channel is of the same type, capacity,
and manufacturer.

CPUs  DIMMs            DIMM slots to populate
2     12 (Unbalanced)  A1, B1, C1, D1, E1, F1, G1, H1 (blue slots); A2, B2, E2, F2 (black slots)
2     20 (Unbalanced)  A1, B1, C1, D1, E1, F1, G1, H1 (blue slots); A2, B2, C2, D2, E2, F2, G2, H2 (black slots); A3, B3, E3, F3 (black slots)
DIMM Restrictions
Replace cvm_ip_addr with the IP address of the Controller VM with the failed DIMM.
Tip: Save the output of this script for reference while you are performing the replacement
procedure. This parameter list is referred to in the procedure as the "worksheet".
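A hedged example of generating this worksheet, assuming the NCC command cited later in Verifying
Memory Replacement (run it while logged on to the Controller VM with the failed DIMM; the exact script
and arguments may differ by AOS version):
nutanix@cvm$ ncc hardware_info show_hardware_info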
1. Print and save a list of DIMM slots and the serial numbers of the DIMMs in those slots by following
Checking Memory Locations on page 11.
4. Turn on the chassis identifier lights and disconnect the cables by following Disconnecting the Cables on
page 18.
5. Remove the node from the block by following Physically Removing a Node from a Block on page 22.
6. Replace the failed memory module by following Replacing a Memory Module on page 23.
7. Place the node back in the block by following Physically Inserting a Node in a Block on page 26.
9. Verify that the memory failure is resolved by following Verifying Memory Replacement on page 32.
Memory Module
+--------------------------------------------------------------+
| Location | P1-DIMMA1 |
| Bank connection | P0_Node0_Channel0_Dimm0 |
| Capable speed | 2400 MHz |
| Current speed | 2133 MHz |
| Installed size | 16384 MB |
| Manufacturer | Samsung |
| Product part number | M393A2K40BB1-CRC |
| Serial number | 745FDB84 |
| Type | DDR4 |
+--------------------------------------------------------------+
| Location | P1-DIMMA2 |
| Bank connection | P0_Node0_Channel0_Dimm1 |
| Capable speed | 2400 MHz |
| Current speed | 2133 MHz |
| Installed size | 16384 MB |
| Manufacturer | Samsung |
| Product part number | M393A2K40BB1-CRC |
| Serial number | 745FDBBD |
| Type | DDR4 |
+--------------------------------------------------------------+
| Location | P1-DIMMA3 |
| Bank connection | P0_Node0_Channel0_Dimm2 |
| Status | No DIMM |
+--------------------------------------------------------------+
2. Save the output so you can compare it with output from the same command after you have replaced a
DIMM.
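For example, one minimal way to keep the before and after listings for comparison (the file names are
arbitrary):
nutanix@cvm$ ncc hardware_info show_hardware_info > ~/dimm_before.txt
After the replacement, write the new output to a second file and compare:
nutanix@cvm$ diff ~/dimm_before.txt ~/dimm_after.txt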
Identifying a Failed Memory Module (Ivy Bridge and Sandy Bridge CPU Platforms)
Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
All DIMM slots in a node must be populated with DIMMs of the same type and capacity. For systems with
Ivy Bridge and Sandy Bridge CPUs, you must know the firmware version to identify a failed DIMM. The
firmware version determines which tool will correctly identify the failed DIMM, either the IPMI web interface,
or the IPMI sel list command from the host.
1. Determine the firmware version from the main IPMI web interface window.
→ For systems using BMC firmware 2.33, you must use ipmitool or ipmiutil from the host to accurately
identify the failed DIMM.
vSphere example
root@esx# /ipmitool sel list
AHV example
root@ahv# ipmitool sel list
Hyper-V example
> ipmiutil sel list -e
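On a node with a long event history, a hedged shortcut is to filter the log for memory events (grep is
available in the ESXi and AHV host shells):
root@ahv# ipmitool sel list | grep -i -E 'ecc|dimm'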
3. Match the CPU and DIMM ID failure information with the DIMM diagram.
Identifying a Failed Memory Module (G4 Intel Haswell and G5 Intel Broadwell CPU
Platforms)
Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
All platforms with Haswell CPUs have the suffix -G4 in the product name: for example, NX-1065-G4 or
NX-3175-G4.
If the system is running, errors are reported. For all systems with a Haswell CPU and any BIOS and BMC
combination, these tools will display a DIMM error message:
• IPMI web interface
• IPMI View
• SMCIPMItool
• Open source ipmitool 1.8.14
Estimated time to complete: 5 minutes
1. Read the memory failure message in the event log and make a note of the memory location shown
in the event.
→ Alternatively, with BIOS G4-1.1 a failed DIMM will prevent the node from booting. If the node fails to
boot, the failed DIMM ID is displayed in POST.
2. Match the CPU and DIMM ID failure information with the DIMM diagram.
→ If you have already powered off the node, find the part number label on the DIMM.
→ In IPMI, select System > Hardware Information > DIMM and select the failed DIMM to view the part
number.
1. Log on to vCenter (or to the ESXi host if vCenter is not available) with the vSphere client.
3. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
Log on to the Controller VM with SSH and shut down the Controller VM.
nutanix@cvm$ cvm_shutdown -P now
Note: Do not power off, reset, or shut down the Controller VM in any way other than with the
cvm_shutdown command. The cvm_shutdown command ensures that the cluster is aware that the
Controller VM is unavailable.
4. In the Confirm Maintenance Mode dialog box, uncheck Move powered off and suspended virtual
machines to other hosts in the cluster and click Yes.
The host is placed in maintenance mode, which prevents VMs from running on the host.
If you are logged on to the ESXi host rather than to vCenter, the vSphere client disconnects when the
host shuts down.
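If you are working from the ESXi host shell rather than the vSphere client, maintenance mode can also
be toggled with esxcli (a hedged alternative to the dialog box above):
root@esx# esxcli system maintenanceMode set --enable true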
Caution: Shut down only one node at a time in each cluster. If more than one node in the
cluster must be shut down, shut down the entire cluster.
Estimated time to complete: 10 minutes
1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
Log on to the Controller VM with SSH and shut down the Controller VM.
nutanix@cvm$ cvm_shutdown -P now
Note:
Always use the cvm_shutdown command to power off, reset, or shut down the Controller VM. The
cvm_shutdown command notifies the cluster that the Controller VM is unavailable.
2. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).
nutanix@cvm$ ~/serviceability/bin/esx-shutdown -s cvm_ip_addr
Replace cvm_ip_addr with the IP address of the Controller VM on the ESXi host (cvm_ip_addr from the
worksheet).
If successful, this command returns no output. If it fails with an error message, VMs are probably still
running on the host. Ensure that all VMs are shut down or moved to another host and try again before
proceeding.
Alternatively, you can put the ESXi host into maintenance mode and shut it down using the vSphere
client.
If the host shuts down, a message like the following is displayed.
INFO esx-shutdown:67 Please verify if ESX was successfully shut down using
ping hypervisor_ip_addr
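As the message suggests, you can confirm the shutdown from another Controller VM:
nutanix@cvm$ ping hypervisor_ip_addr
Replace hypervisor_ip_addr with the IP address of the host that you shut down (hypervisor_ip_addr
from the worksheet).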
4. Turn on the chassis identifier lights on the front and back of the node that you will remove using one of
the following options.
→ Log on to a Controller VM, and issue the following command.
nutanix@cvm$ ~/serviceability/bin/chassis-identify -s cvm_ip_addr -t 240
Replace cvm_ip_addr with the Controller VM IP address for the node that you will remove
(cvm_ip_addr from the worksheet).
→ In the IPMI console, go to Miscellaneous > UID, select TURN ON, and click SAVE.
→ Log on to the hypervisor host (hypervisor_ip_addr from the worksheet) with SSH and issue the
following command.
root@esx# /ipmitool chassis identify 240
Caution: Shut down only one node at a time in each cluster. If more than one node in the
cluster must be shut down, shut down the entire cluster.
Estimated time to complete: 15 minutes
1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
2. Turn on the chassis identifier lights on the front and back of the node that you will remove using one of
the following options.
→ Log on to a Controller VM, and issue the following command.
nutanix@cvm$ ~/serviceability/bin/chassis-identify -s cvm_ip_addr -t 240
Replace cvm_ip_addr with the Controller VM IP address for the node that you will remove
(cvm_ip_addr from the worksheet).
→ In the IPMI console, go to Miscellaneous > UID, select TURN ON, and click SAVE.
→ Log on to the hypervisor host (hypervisor_ip_addr from the worksheet) with SSH and issue the
following command.
root@ahv# /usr/bin/ipmitool chassis identify 240
Caution: Shut down only one node at a time in each cluster. If more than one node in the
cluster must be shut down, shut down the entire cluster.
Estimated time to complete: 15 minutes
1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
Log on to the Controller VM with SSH and shut down the Controller VM.
nutanix@cvm$ cvm_shutdown -P now
Note:
Always use the cvm_shutdown command to power off, reset, or shut down the Controller VM. The
cvm_shutdown command notifies the cluster that the Controller VM is unavailable.
2. Turn on the chassis identifier lights on the front and back of the node that you will remove using one of
the following options.
→ In the IPMI console, go to Miscellaneous > UID, select TURN ON, and click SAVE.
→ Log on to the hypervisor host (hypervisor_ip_addr from the worksheet) using RDP or the IPMI
console and issue the following command.
> ipmiutil alarms -i 240
3. Log on to the Hyper-V host with Remote Desktop Connection and start PowerShell.
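The shutdown itself is then issued from that PowerShell session. A minimal sketch, assuming all guest
VMs on the node have already been shut down or moved:
> Stop-Computer -Force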
1. Make a note of the cable configurations. You will need to replace the cables in the same order.
2. Disconnect the power cables. If necessary, disconnect all the other cables plugged into the node.
Figure: Back panel of NX-1020 block and NX-1065S block (1 GbE NIC shown)
1. Make a note of the cable configurations. You will need to replace the cables in the same order.
3. Squeeze both latches inward simultaneously, grasp the handles, and pull the node from the Nutanix
block.
All DIMM slots must be populated with DIMMs of the same type and capacity. Within each memory
channel, all DIMMs must have the same part number.
Figure: NIC removal for NX-1000, NX-3050, NX-6000 and NX-9040 series
→ (NX-1065-G4/G5, NX-3060-G4/G5, NX-9060-G4) If you are replacing a DIMM that is covered by the
air shroud, remove the two screws from the air shroud and unhook the air shroud from the CPU heat
sink.
Caution: Bending the air shroud without removing the screws might damage the air shroud
and affect cooling. Always remove the screws before moving the air shroud.
Figure: NX-6035-G4/G5 and NX-8035-G4/G5 air shroud release and DIMM removal
2. Identify the slot that contains the failed memory module by correlating the memory failure message with
the DIMM ID on the serverboard illustration.
3. Press the latches that hold the failed memory module in place.
5. Insert the replacement memory module into the same slot and push it down until the latches snap in
place.
Important: It is critical that you insert the replacement memory into the same slot. The
serverboard may not recognize the memory if you insert it into a different slot.
Take care to align the notch in the bottom of the memory module with the key in the memory slot.
7. (NX-1000, NX-3050) If you removed the NIC, carefully replace the NIC and two screws.
Caution: Do not stack other objects on top of a chassis. The weight can deform the chassis.
Components might be damaged when nodes slide in or out of the chassis.
Note: Nodes with Intel Broadwell (G5) CPUs and Intel Haswell (G4) CPUs cannot be mixed in the
same block.
1. Push the node into the block until the node is approximately two inches from full insertion.
Caution: Do not apply too much force to insert the node. The backplane connector can be
damaged.
4. Turn on the node by pressing the power button on the front of the node (the top button).
The top power light illuminates and the fans are audibly louder for approximately 2 minutes.
1. If the node is turned off, turn it on by pressing the power button on the front. Otherwise, proceed to the
next step.
2. Log on to vCenter (or to the node if vCenter is not running) with the vSphere client.
5. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).
7. Right-click the ESXi host in the vSphere client and select Rescan for Datastores. Confirm that all
Nutanix datastores are available.
If the cluster is running properly, output similar to the following is displayed for each node in the cluster:
CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]
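The listing above is the output of the cluster status command; a hedged example of running the check
from the Controller VM you logged on to:
nutanix@cvm$ cluster status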
1. Log on to a running Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).
nutanix@cvm$ ~/serviceability/bin/esx-start-cvm -s cvm_ip_addr
Replace cvm_ip_addr with the IP address of the Controller VM (cvm_ip_addr from the worksheet).
If successful, this command produces no output. If it fails, wait 5 minutes and try again.
If the Controller VM starts, a message like the following is displayed.
INFO esx-start-cvm:67 CVM started successfully. Please verify using ping cvm_ip_addr
After starting, the Controller VM restarts once. Wait three to four minutes before you ping the Controller
VM.
Alternatively, you can take the ESXi host out of maintenance mode and start the Controller VM using
the vSphere client.
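A hedged sketch of that alternative from the ESXi host shell (VM IDs vary; the grep pattern assumes the
default Controller VM naming):
root@esx# esxcli system maintenanceMode set --enable false
root@esx# vim-cmd vmsvc/getallvms | grep -i cvm
root@esx# vim-cmd vmsvc/power.on vmid
Replace vmid with the ID that getallvms reports for the Controller VM.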
If the cluster is running properly, output similar to the following is displayed for each node in the cluster:
CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]
4. Verify storage.
a. Log on to the ESXi host (hypervisor_ip_addr from the worksheet) with SSH.
Replace cvm_name with the name of the Controller VM that you found from the preceding command.
5. If the node is in maintenance mode, log on to the Controller VM and take the node out of maintenance
mode.
nutanix@cvm$ acli
<acropolis> host.exit_maintenance_mode hypervisor_ip_addr
<acropolis> exit
Replace hypervisor_ip_addr with the IP address of the AHV host (hypervisor_ip_addr from the
worksheet).
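acli also accepts a single command as an argument, so the same step can be run non-interactively (a
hedged equivalent of the session above):
nutanix@cvm$ acli host.exit_maintenance_mode hypervisor_ip_addr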
6. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).
If the cluster is running properly, output similar to the following is displayed for each node in the cluster:
CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
1. If the node is turned off, turn it on by pressing the power button on the front. Otherwise, proceed to the
next step.
2. Log on to the Hyper-V host with Remote Desktop Connection and start PowerShell.
6. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).
If the cluster is running properly, output similar to the following is displayed for each node in the cluster:
CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
• For all systems: log on to a Controller VM and run the command ncc hardware_info
show_hardware_info. Check the Memory Module output to make sure the DIMM location you replaced
now shows a new serial number.
• For systems with BMC firmware 3.24 or 1.87, use the IPMI web interface or the IPMI View tool to verify
that the event log does not display any new DIMM error messages.
• For systems with BMC firmware 2.33, use ipmitool (vSphere or AHV) or ipmiutil (Hyper-V) to verify that
the event log does not display any new DIMM error messages.
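A minimal verification sketch combining these checks (assuming an AHV host with ipmitool; adjust the
filter pattern as needed):
nutanix@cvm$ ncc hardware_info show_hardware_info
root@ahv# ipmitool sel elist | grep -i -E 'ecc|dimm'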