
Memory Replacement

Acropolis 4.7

NX-1000/NX-3050/NX-3060/NX-3060-G4/NX-3060-G5/
NX-6000/NX-8035-G4/NX-8035-G5/NX-9000/SX-1065-G5 Series
27-Apr-2017
Notice

Copyright
Copyright 2017 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual property
laws. Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other marks
and names mentioned herein may be trademarks of their respective companies.

License
The provision of this software to you does not grant any licenses or other rights under any Microsoft
patents with respect to anything other than the file server implementation portion of the binaries for this
software, including no licenses or any other rights in any hardware or any devices or software that are used
to communicate with or in connection with this software.

Conventions
Convention Description

variable_value The action depends on a value that is unique to your environment.

ncli> command The commands are executed in the Nutanix nCLI.

user@host$ command The commands are executed as a non-privileged user (such as nutanix)
in the system shell.

root@host# command The commands are executed as the root user in the vSphere or Acropolis
host shell.

> command The commands are executed in the Hyper-V host shell.

output The information is displayed as output from a command or in a log file.

Default Cluster Credentials


Interface                          Target                                         Username        Password

Nutanix web console                Nutanix Controller VM                          admin           admin
vSphere Web Client                 ESXi host                                      root            nutanix/4u
vSphere client                     ESXi host                                      root            nutanix/4u
SSH client or console              ESXi host                                      root            nutanix/4u
SSH client or console              AHV host                                       root            nutanix/4u
SSH client or console              Hyper-V host                                   Administrator   nutanix/4u
SSH client                         Nutanix Controller VM                          nutanix         nutanix/4u
IPMI web interface or ipmitool     Nutanix node                                   ADMIN           ADMIN
SSH client or console              Acropolis OpenStack Services VM (Nutanix OVM)  root            admin

Version
Last modified: April 27, 2017 (2017-04-27 2:23:37 GMT-7)



Memory Replacement

Overview
This document describes how to replace hardware components in a Nutanix block.
Tools and supplies required:
• Antistatic wrist strap
• Phillips screwdriver (#2)
Warning:
• All servicing must be done by a qualified service technician. During the procedure, wear
a grounding wrist strap to avoid ESD damage to the component or system. Handle all
components with care: place them on soft, static-free surfaces.
• Coordinate service with the customer for any operation that involves the hypervisor host (ESXi,
Hyper-V or AHV), virtual machines, or Nutanix software.
• If you have locked down the cluster or hypervisor, you must enable ssh access again. See the
Cluster Access Control topic in the Web Console Guide.

Return failed components to Nutanix. Follow the instructions provided on the box for the replacement
parts. If the instructions are missing or you have other questions, call the Nutanix returns department at
1-866-884-7958 in the US or +1-508-623-1040 internationally.
Note: Nutanix does not provide a warranty or support services with respect to non-Nutanix
components, and any problems arising from your use of such non-Nutanix components are
expressly excluded from the Nutanix warranty and support terms. If you replace or modify Nutanix
components or install non-Nutanix components you do so at your risk.

Control Panel Overview


Each Nutanix node has one or two control panels in the front of the chassis. The two-node platforms have
separate control panels for node A and node B. On four-node platforms, one panel controls nodes A and B,
and the other panel controls nodes C and D.

Figure: Control panels (ears) for two-node and four-node platforms



Memory Failure

While a Nutanix node may be able to self-correct for certain memory errors, failed memory can lead to
system degradation and should be addressed as soon as possible.

Indications

• For all systems with a Haswell CPU, these tools will display a DIMM error message: POST, the IPMI
web interface, IPMI View, SMCIPMItool, and open source ipmitool 1.8.14.
• For systems using BMC firmware 1.87 and BIOS 1.0b, or BMC 3.24 and BIOS 3.0.5, the IPMI web
interface for a node shows a message similar to Uncorrectable ECC @ DIMM1A(CPU1) - Asserted.
• For systems using BMC firmware 2.33 and BIOS 3.0.2, ipmitool or ipmiutil displays a DIMM error
message.
Match the CPU and DIMM ID reported by the IPMI event log with the DIMM diagram. See Identifying a
Failed Memory Module.
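As a quick check from the hypervisor host, you can filter the IPMI event log for memory-related entries. The example below is only a sketch: it assumes an AHV host with ipmitool at the path used elsewhere in this guide, and the exact output format depends on the BMC firmware.

root@ahv# /usr/bin/ipmitool sel list | grep -i -E 'ecc|dimm'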

Supported Memory Configurations


This chapter shows DIMM installation order for all Nutanix G4 and G5 platforms. Use the rules and
guidelines in this topic to remove, replace, or add memory.

Balanced and Unbalanced Configurations

Memory performance is most efficient with a balanced configuration, where every memory channel
contains the same number of DIMMs. Unbalanced configurations are supported, but be aware that these
configurations result in lower performance.

Memory Installation Order For All Platforms

A memory channel is a group of DIMM slots.


Each CPU is associated with four memory channels. Memory channels contain either two or three DIMM
slots, depending on the motherboard.
Two DIMMs per channel (2DPC) memory channels have one blue slot and one black slot each, as shown
in the figure below:

Platforms with two DIMMs per channel (2DPC)

NX-1065-G4/-G5 NX-1065S-G5 NX-3060-G4/-G5

NX-6035-G4/-G5 NX-6035C-G5 NX-8035-G4

NX-9030-G5 NX-9060-G4 SX-1065-G5



Figure: Motherboard with 2DPC Memory Channels

Three DIMMs per channel (3DPC) memory channels have one blue slot and two black slots each, as
shown in the figure below:

Platforms with three DIMMs per channel (3DPC)

NX-1155-G5 NX-3155G-G4/-G5 NX-3175-G4/-G5

NX-6155-G5 NX-8150-G4/-G5

Figure: Motherboard with 3DPC Memory Channels

All occupied slots in a memory channel must contain only DIMMs from the same manufacturer. This means
that if you are replacing a DIMM in a channel, or upgrading your memory by adding new DIMMs to a
channel, you must do one of the following:
• Make sure the DIMMs you are replacing or adding are of the same type, capacity, and manufacturer as
the other DIMMs in the channel, or



• Remove all DIMMs from the channel and repopulate the channel with new DIMMs that are all of the
same type, capacity, and manufacturer.
Note: DIMM slots on the motherboard are most commonly labeled A1, A2, and so on.
However, some software tools may report DIMM slot labels in a different format, such as 1A and
2A, or with separate CPU and DIMM identifiers (for example, CPU1 and DIMM1).

Number of CPUs    Number of DIMMs     Slots to Use

1                 2 (Unbalanced)      A1, B1 (blue slots)
1                 4                   A1, B1, C1, D1 (blue slots)
1                 6 (Unbalanced)      A1, B1, C1, D1 (blue slots); A2, B2 (black slots)
1                 8                   A1, B1, C1, D1 (blue slots); A2, B2, C2, D2 (black slots)
2                 4 (Unbalanced)      A1, B1, E1, F1 (blue slots)
2                 6 (Unbalanced)      A1, B1, C1, E1, F1, G1 (blue slots)
2                 8                   A1, B1, C1, D1, E1, F1, G1, H1 (blue slots)
2                 12 (Unbalanced)     A1, B1, C1, D1, E1, F1, G1, H1 (blue slots); A2, B2, E2, F2 (black slots)
2                 16                  A1, B1, C1, D1, E1, F1, G1, H1 (blue slots); A2, B2, C2, D2, E2, F2, G2, H2 (black slots)
2                 20 (Unbalanced)     A1, B1, C1, D1, E1, F1, G1, H1 (blue slots); A2, B2, C2, D2, E2, F2, G2, H2 (black slots); A3, B3, E3, F3 (black slots)
2                 24                  Fill all slots.

DIMM Restrictions

DIMM population must follow certain rules:


• Within a node, all DIMM slots must be populated with DIMMs of the same type and capacity. For
example, RDIMMs and LRDIMMs cannot be mixed in the same node.
• Within each memory channel, all DIMMs must come from the same manufacturer.



DIMM Slot Examples

Figure: Single-Socket CPU DIMM Configurations

Figure: Balanced 2DPC DIMM Configurations

Figure: Unbalanced 2DPC DIMM Configurations



Figure: 3DPC DIMM Configurations



Summary: Replacing Memory
Before you begin: Log on to any Controller VM in the cluster. Run the breakfix.py script to gather the
information you need to replace the failed component.
nutanix@cvm$ ~/serviceability/bin/breakfix.py --mem --ip=cvm_ip_addr

Replace cvm_ip_addr with the IP address of the Controller VM with the failed DIMM.

Tip: Save the output of this script for reference while you are performing the replacement
procedure. This parameter list is referred to in the procedure as the "worksheet".
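One convenient way to keep a copy of the worksheet is to write the script output to a file on the Controller VM while it is also displayed on screen (the file name below is only an example):

nutanix@cvm$ ~/serviceability/bin/breakfix.py --mem --ip=cvm_ip_addr | tee ~/breakfix_worksheet.txt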

1. Print and save a list of DIMM slots and the serial numbers of the DIMMs in those slots by following
Checking Memory Locations on page 11.

2. Identify the failed DIMM by following the CPU-specific procedure.


→ Ivy Bridge and Sandy Bridge CPUs: Identifying a Failed Memory Module (Ivy Bridge and Sandy Bridge
CPU Platforms) on page 11
→ G4 Intel Haswell and G5 Intel Broadwell CPUs: Identifying a Failed Memory Module (G4 Intel Haswell and
G5 Intel Broadwell CPU Platforms) on page 13

3. Shut down the node by following the hypervisor-specific procedure.


→ vSphere (vCenter): Shutting Down a Node in a Cluster (vSphere client) on page 14
→ vSphere (command line): Shutting Down a Node in a Cluster (vSphere command line) on
page 15
→ AHV: Shutting Down a Node in a Cluster (AHV) on page 17
→ Hyper-V: Shutting Down a Node in a Cluster (Hyper-V) on page 18

4. Turn on the chassis identifier lights and disconnect the cables by following Disconnecting the Cables on
page 18.

5. Remove the node from the block by following Physically Removing a Node from a Block on page 22.

6. Replace the failed memory module by following Replacing a Memory Module on page 23.

7. Place the node back in the block by following Physically Inserting a Node in a Block on page 26.

8. Start the node by following the hypervisor-specific procedure.


→ vSphere (vCenter): Starting a Node in a Cluster (vSphere client) on page 27
→ vSphere (command line): Starting a Node in a Cluster (vSphere command line) on page 29
→ AHV: Starting a Node in a Cluster (AHV) on page 30
→ Hyper-V: Starting a Node in a Cluster (Hyper-V) on page 31

9. Verify that the memory failure is resolved by following Verifying Memory Replacement on page 32.



Detailed Procedures

Checking Memory Locations


Use Nutanix Cluster Check (NCC) to print and save a list of DIMM slots and the serial numbers of the
DIMMs in those slots.
Estimated time to complete: 2 minutes

1. Log in to a controller VM and run the command ncc hardware_info show_hardware_info.


The output resembles the following:

Memory Module

+--------------------------------------------------------------+
| Location | P1-DIMMA1 |
| Bank connection | P0_Node0_Channel0_Dimm0 |
| Capable speed | 2400 MHz |
| Current speed | 2133 MHz |
| Installed size | 16384 MB |
| Manufacturer | Samsung |
| Product part number | M393A2K40BB1-CRC |
| Serial number | 745FDB84 |
| Type | DDR4 |
+--------------------------------------------------------------+
| Location | P1-DIMMA2 |
| Bank connection | P0_Node0_Channel0_Dimm1 |
| Capable speed | 2400 MHz |
| Current speed | 2133 MHz |
| Installed size | 16384 MB |
| Manufacturer | Samsung |
| Product part number | M393A2K40BB1-CRC |
| Serial number | 745FDBBD |
| Type | DDR4 |
+--------------------------------------------------------------+
| Location | P1-DIMMA3 |
| Bank connection | P0_Node0_Channel0_Dimm2 |
| Status | No DIMM |
+--------------------------------------------------------------+

2. Save the output so you can compare it with output from the same command after you have replaced a
DIMM.
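For example, you can redirect the command output to a file on the Controller VM so that it is easy to compare with the output captured after the replacement (the file name below is only an example):

nutanix@cvm$ ncc hardware_info show_hardware_info > ~/dimm_info_before.txt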

Identifying a Failed Memory Module (Ivy Bridge and Sandy Bridge CPU Platforms)
Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
All DIMM slots in a node must be populated with DIMMs of the same type and capacity. For systems with
Ivy Bridge and Sandy Bridge CPUs, you must know the BMC firmware version to identify a failed DIMM. The
firmware version determines which tool correctly identifies the failed DIMM: either the IPMI web interface,
or the IPMI sel list command from the host (ipmitool or ipmiutil, depending on the hypervisor).



Estimated time to complete: 5 minutes

1. Determine the firmware version from the main IPMI web interface window.

Figure: Firmware revision displayed in the IPMI web interface

2. Determine which DIMM has failed.


→ If the system fails to boot, contact Nutanix support.
→ For systems with BMC firmware 3.24 or 1.87, you must use the IPMI web interface or the IPMI View
tool.
The IPMI event log for a node shows a message similar to Uncorrectable ECC @ DIMM1A(CPU1) -
Asserted.

→ For systems using BMC firmware 2.33, you must use ipmitool or ipmiutil from the host to accurately
identify the failed DIMM.
vSphere example
root@esx# /ipmitool sel list
1 | 12/02/2013 | 17:37:24 | Memory | Correctable ECC | Asserted | CPU1 DIMM1
2 | 12/02/2013 | 17:37:39 | Memory | Correctable ECC | Asserted | CPU1 DIMM1

AHV example
root@ahv# ipmitool sel list
1 | 12/02/2013 | 17:37:24 | Memory | Correctable ECC | Asserted | CPU1 DIMM1
2 | 12/02/2013 | 17:37:39 | Memory | Correctable ECC | Asserted | CPU1 DIMM1

Hyper-V example
> ipmiutil sel list -e
RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
7840 06/02/13 18:15:31 MIN BMC Memory #08 Uncorrectable ECC, DIMM1/CPU1 6f [20 ff 10]

3. Match the CPU and DIMM ID failure information with the DIMM diagram.



Figure: NX-1000, NX-3050, NX-6000 and NX-9040 series DIMM Slot IDs

Identifying a Failed Memory Module (G4 Intel Haswell and G5 Intel Broadwell CPU
Platforms)
Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
All platforms with Haswell CPUs have the suffix -G4 in the product name: for example, NX-1065-G4 or
NX-3175-G4. Platforms with Broadwell CPUs have the suffix -G5: for example, NX-1065-G5 or NX-3175-G5.
If the system is running, errors are reported. For all systems with a Haswell CPU and any BIOS and BMC
combination, these tools will display a DIMM error message:
• IPMI web interface
• IPMI View
• SMCIPMItool
• Open source ipmitool 1.8.14
Estimated time to complete: 5 minutes

1. Determine which DIMM has failed.


→ In the IPMI web interface for the node with the failed DIMM, select Server Health > Event Log.
The IPMI event log for the node shows a message similar to Uncorrectable ECC @ DIMM1A(CPU1) -
Asserted.

Read the memory failure message in the event log and make a note of the memory location shown
in the event.

→ Alternatively, with BIOS G4-1.1 a failed DIMM will prevent the node from booting. If the node fails to
boot, the failed DIMM ID is displayed in POST.

2. Match the CPU and DIMM ID failure information with the DIMM diagram.



Figure: NX-1065-G4, NX-3060-G4, NX-6035-G4, NX-8035-G4, NX-9060-G4 DIMM slot IDs

3. Identify the type of DIMM that is installed.


Note: All DIMM slots in a node must be populated with DIMMs of the same type and capacity.
Platforms with Haswell CPUs support both 32 GB LRDIMMs and RDIMMs, but LRDIMMs and
RDIMMs cannot be mixed in a node. You must know the type of DIMM installed in the node so
that you can order the correct replacement DIMM. The DIMM part number indicates the DIMM
type. You can identify the DIMM with IPMI or by the part number label on the DIMM.

→ If you have already powered off the node, find the part number label on the DIMM.
→ In IPMI, select System > Hardware Information > DIMM and select the failed DIMM to view the part
number.
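If the node is still running, the NCC output described in Checking Memory Locations also lists the product part number for each populated slot. For example, to display the entry for a specific slot (the slot name below is only an example):

nutanix@cvm$ ncc hardware_info show_hardware_info | grep -A 8 P1-DIMMA1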

Shutting Down a Node in a Cluster (vSphere client)


Before you begin: Shut down guest VMs (including vCenter) that are running on the node, or move them
to other nodes in the cluster.



Who is responsible: This procedure should be performed by a Nutanix support engineer or by a customer
or partner under the guidance of a Nutanix support engineer.
Caution: Shut down only one node at a time in each cluster. If more than one node in the cluster
must be shut down, shut down the entire cluster instead.
Estimated time to complete: 15 minutes

1. Log on to vCenter (or to the ESXi host if vCenter is not available) with the vSphere client.

2. Right-click the host and select Enter Maintenance Mode.

3. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
Log on to the Controller VM with SSH and shut down the Controller VM.
nutanix@cvm$ cvm_shutdown -P now

Note: Do not power off, reset, or shut down the Controller VM in any way other than with the
cvm_shutdown command. This ensures that the cluster is aware that the Controller VM is unavailable.

4. In the Confirm Maintenance Mode dialog box, uncheck Move powered off and suspended virtual
machines to other hosts in the cluster and click Yes.
The host is placed in maintenance mode, which prevents VMs from running on the host.

5. Right-click the host and select Shut Down.


Wait until vCenter shows that the host is not responding, which may take several minutes.

If you are logged on to the ESXi host rather than to vCenter, the vSphere client disconnects when the
host shuts down.

Shutting Down a Node in a Cluster (vSphere command line)


Before you begin: Shut down guest VMs (including vCenter) that are running on the node, or move them
to other nodes in the cluster.
Who is responsible: This procedure should be performed by a Nutanix support engineer or by a customer
or partner under the guidance of a Nutanix support engineer.

Caution: Shut down only one node at a time in each cluster. If more than one node in the cluster
must be shut down, shut down the entire cluster instead.
Estimated time to complete: 10 minutes



You can put the ESXi host into maintenance mode and shut it down from the command line or by using the
vSphere client.

1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
Log on to the Controller VM with SSH and shut down the Controller VM.
nutanix@cvm$ cvm_shutdown -P now

Note:
Always use the cvm_shutdown command to power off, reset, or shut down the Controller VM. The
cvm_shutdown command notifies the cluster that the Controller VM is unavailable.

2. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).

3. Shut down the host.


nutanix@cvm$ ~/serviceability/bin/esx-enter-maintenance-mode -s cvm_ip_addr

If successful, this command returns no output. If it fails with a message like the following, VMs are
probably still running on the host.

CRITICAL esx-enter-maintenance-mode:42 Command vim-cmd hostsvc/maintenance_mode_enter failed with ret=-1

Ensure that all VMs are shut down or moved to another host and try again before proceeding.
nutanix@cvm$ ~/serviceability/bin/esx-shutdown -s cvm_ip_addr

Replace cvm_ip_addr with the IP address of the Controller VM on the ESXi host (cvm_ip_addr from the
worksheet).
Alternatively, you can put the ESXi host into maintenance mode and shut it down using the vSphere
client.
If the host shuts down, a message like the following is displayed.

INFO esx-shutdown:67 Please verify if ESX was successfully shut down using
ping hypervisor_ip_addr

4. Turn on the chassis identifier lights on the front and back of the node that you will remove using one of
the following options.
→ Log on to a Controller VM, and issue the following command.
nutanix@cvm$ ~/serviceability/bin/chassis-identify -s cvm_ip_addr -t 240

Replace cvm_ip_addr with the Controller VM IP address for the node that you will remove
(cvm_ip_addr from the worksheet).

→ In the IPMI console, go to Miscellaneous > UID, select TURN ON, and click SAVE.
→ Log on to the hypervisor host (hypervisor_ip_addr from the worksheet) with SSH and issue the
following command.
root@esx# /ipmitool chassis identify 240

This command returns the following:

Chassis identify interval: 240 seconds

5. Confirm that the ESXi host has shut down.


nutanix@cvm$ ping hypervisor_ip_addr



Replace hypervisor_ip_addr with the IP address of the ESXi host (hypervisor_ip_addr from the
worksheet).
If no ping packets are answered, the ESXi host is shut down.

Shutting Down a Node in a Cluster (AHV)


Before you begin: Shut down guest VMs that are running on the node, or move them to other nodes in
the cluster.
Who is responsible: This procedure should be performed by a Nutanix support engineer or by a customer
or partner under the guidance of a Nutanix support engineer.

Caution: Shut down only one node at a time in each cluster. If more than one node in the cluster
must be shut down, shut down the entire cluster instead.
Estimated time to complete: 15 minutes

1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.

If the Controller VM is running, shut down the Controller VM.

a. Log on to the Controller VM (cvm_ip_addr from the worksheet) with SSH.

b. Put the node into maintenance mode.


nutanix@cvm$ acli host.enter_maintenance_mode host_ID [wait="{ true | false }" ]

Replace host_ID with the host ID of the host.


Specify wait=true to wait for the host evacuation attempt to finish.
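If you are not sure which value to use for host_ID, one way to see the hosts in the cluster is to list them from the Controller VM; depending on the AOS version, the output identifies each host by its hypervisor address and UUID, either of which can typically be used here:

nutanix@cvm$ acli host.list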

c. Shut down the Controller VM.


nutanix@cvm$ cvm_shutdown -P now

2. Turn on the chassis identifier lights on the front and back of the node that you will remove using one of
the following options.
→ Log on to a Controller VM, and issue the following command.
nutanix@cvm$ ~/serviceability/bin/chassis-identify -s cvm_ip_addr -t 240

Replace cvm_ip_addr with the Controller VM IP address for the node that you will remove
(cvm_ip_addr from the worksheet).

→ In the IPMI console, go to Miscellaneous > UID, select TURN ON, and click SAVE.
→ Log on to the hypervisor host (hypervisor_ip_addr from the worksheet) with SSH and issue the
following command.
root@ahv# /usr/bin/ipmitool chassis identify 240

This command returns the following:

Chassis identify interval: 240 seconds

3. Log on to the Acropolis host with SSH.



4. Shut down the host.
root@ahv# shutdown -h now

Shutting Down a Node in a Cluster (Hyper-V)


Before you begin: Shut down guest VMs that are running on the node, or move them to other nodes in
the cluster.
Who is responsible: This procedure should be performed by a Nutanix support engineer or by a customer
or partner under the guidance of a Nutanix support engineer.

Caution: Shut down only one node at a time in each cluster. If more than one node in the cluster
must be shut down, shut down the entire cluster instead.
Estimated time to complete: 15 minutes

1. Warning: The next step affects the operation of a Nutanix cluster. Coordinate down time with
the customer before performing the step.
Log on to the Controller VM with SSH and shut down the Controller VM.
nutanix@cvm$ cvm_shutdown -P now

Note:
Always use the cvm_shutdown command to power off, reset, or shut down the Controller VM. The
cvm_shutdown command notifies the cluster that the Controller VM is unavailable.

2. Turn on the chassis identifier lights on the front and back of the node that you will remove using one of
the following options.
→ In the IPMI console, go to Miscellaneous > UID, select TURN ON, and click SAVE.
→ Log on to the hypervisor host (hypervisor_ip_addr from the worksheet) using RDP or the IPMI
console and issue the following command.
> ipmiutil alarms -i 240

This command returns the following:

ipmiutil ver 2.91
ialarms ver 2.91
-- BMC version 2.33, IPMI version 2.0
Setting ID LED to 240 ...
ipmiutil alarms, completed successfully

3. Log on to the Hyper-V host with Remote Desktop Connection and start PowerShell.

4. Shut down the host.


> shutdown /i

Disconnecting the Cables


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.



Estimated time to complete: 5 minutes

1. Make a note of the cable configurations. You will need to reconnect the cables in the same configuration later.

2. Disconnect the power cables. If necessary, disconnect all the other cables plugged into the node.

Figure: Disconnecting cables (NX-3050 shown)

Figure: Back panel of NX-1050, NX-3050 and NX-3060 series blocks



Figure: Back panel of NX-1065-G4/G5, NX-3060-G4/G5, and the NX-9060-G4 block

Figure: Back panel of NX-1020 block and NX-1065S block (1 GbE NIC shown)



Figure: Back panel of NX-6035-G4/G5 blocks

Figure: Back panel for NX-6000 series and NX-9040 blocks



Figure: Back panel for NX-8035-G4/G5 block

Physically Removing a Node from a Block


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Caution: For proper cooling and airflow, do not operate the system with the node removed unless
explicitly instructed to do so. Do not leave the chassis with a node removed or a system power
supply removed for more than 10 minutes, as system cooling could be compromised.

Estimated time to complete: 5 minutes

1. Make a note of the cable configurations. You will need to reconnect the cables in the same configuration later.

2. Disconnect any cables plugged into the node.

3. Squeeze both latches inward simultaneously, grasp the handles, and pull the node from the Nutanix
block.



Figure: Four-node block

Figure: Two-node block

Replacing a Memory Module


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Estimated time to complete: 3 minutes
Caution: Be careful when removing and replacing DIMMs to avoid damage to other components.
Some DIMM connectors are very close to others. On some platforms, you must close all the
latches on one side before opening latches on the adjacent side.

All DIMM slots must be populated with DIMMs of the same type and capacity. Within each memory
channel, all DIMMs must have the same part number.

1. Prepare to replace the DIMMs.



→ (NX-1000, NX-3050) If you are replacing one of the DIMMs next to an air shroud (1F, 2F, 1H, 2H),
gently unhook the air shroud from the heat sink and lift the shroud out of the way. Do not unscrew
the shroud from the motherboard.

Figure: Air shroud for NX-1000 and NX-3050 series


→ (NX-1000, NX-3050) If you are replacing DIMM 1A or 2A, remove the two screws that secure the
NIC, and remove the NIC before opening the DIMM latches.

Figure: NIC removal for NX-1000, NX-3050, NX-6000 and NX-9040 series
→ (NX-1065-G4/G5, NX-3060-G4/G5, NX-9060-G4) If you are replacing a DIMM that is covered by the
air shroud, remove the two screws from the air shroud and unhook the air shroud from the CPU heat
sink.



Figure: NX-1065-G4/G5, NX-3060-G4/G5, and NX-9060-G4 air shroud and DIMM removal
→ (NX-6035-G4/G5 and NX-8035-G4/G5) Remove the screw that secures one side of the air shroud if
it interferes with any DIMM removal.

Caution: Bending the air shroud without removing the screw might damage the shroud
and affect cooling. Always remove the screw before moving the air shroud.

Figure: NX-6035-G4/G5 and NX-8035-G4/G5 air shroud release and DIMM removal

2. Identify the slot that contains the failed memory module by correlating the memory failure message with
the DIMM ID on the serverboard illustration.

3. Press the latches that hold the failed memory module in place.

4. Remove the failed memory module.



Figure: DIMM removal

5. Insert the replacement memory module into the same slot and push it down until the latches snap in
place.

Important: It is critical that you insert the replacement memory into the same slot. The
serverboard may not recognize the memory if you insert it into a different slot.
Take care to align the notch in the bottom of the memory module with the key in the memory slot.

6. Replace the air shroud.


→ (NX-1000, NX-3050) If you detached the flexible air shroud, re-attach the shroud to the first fin of the
heat sink.
→ (NX-1065-G4/G5, NX-3060-G4/G5, and NX-9060-G4) If you removed the air shroud, hook the
shroud onto the CPU heat sink, and replace the two screws.
→ Caution: Ensure that the shroud is replaced correctly. If the shroud is too high when you
replace it, the shroud can deform when you push the node into the chassis, which might
compromise cooling.
→ (NX-6035-G4/G5 and NX-8035-G4/G5) Align the shroud and replace the screw that secures the air
shroud.

7. (NX-1000, NX-3050) If you removed the NIC, carefully replace the NIC and two screws.

Physically Inserting a Node in a Block


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Estimated time to complete: 5 minutes

Caution: Do not stack other objects on top of a chassis. The weight can deform the chassis.
Components might be damaged when nodes slide in or out of the chassis.

Note: Nodes with Intel Broadwell (G5) CPUs and Intel Haswell (G4) CPUs cannot be mixed in the
same block.

1. Push the node into the block until the node is approximately two inches from full insertion.

Caution: Do not apply too much force to insert the node. The backplane connector can be
damaged.



Figure: Backplane and node connectors (NX-8035-G4/G5)

2. Push the node in slowly until the latches engage.

3. Connect the cables.

4. Turn on the node by pressing the power button on the front of the node (the top button).

Figure: Power button and left control panel (ear)

The top power light illuminates and the fans are audibly louder for approximately 2 minutes.

Starting a Node in a Cluster (vSphere client)


Who is responsible: This procedure should be performed by a Nutanix support engineer or by a customer
or partner under the guidance of a Nutanix support engineer.



Estimated time to complete: 10 minutes

1. If the node is turned off, turn it on by pressing the power button on the front. Otherwise, proceed to the
next step.

2. Log on to vCenter (or to the node if vCenter is not running) with the vSphere client.

3. Right-click the ESXi host and select Exit Maintenance Mode.

4. Right-click the Controller VM and select Power > Power on.


Wait approximately 5 minutes for all services to start on the Controller VM.
The top power LED illuminates and the fans are noticeably louder for approximately 2 minutes.

5. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).

6. Confirm that cluster services are running on the Controller VM.


nutanix@cvm$ ncli cluster status | grep -A 15 cvm_ip_addr

Output similar to the following is displayed.


Name : 10.1.56.197
Status : Up
... ...
StatsAggregator : up
SysStatCollector : up

Every service listed should be up.

7. Right-click the ESXi host in the vSphere client and select Rescan for Datastores. Confirm that all
Nutanix datastores are available.
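Alternatively, you can check from the ESXi shell. The rescan command below is the same one used in the command-line procedure later in this document; esxcli storage filesystem list is shown here only as one way to list the mounted datastores, including NFS datastores:

root@esx# esxcli storage core adapter rescan --all
root@esx# esxcli storage filesystem list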

8. Verify that all services are up on all Controller VMs.


nutanix@cvm$ cluster status

If the cluster is running properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]



Starting a Node in a Cluster (vSphere command line)
Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Estimated time to complete: 10 minutes

1. Log on to a running Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).

2. Take the ESXi host out of maintenance mode, and then start the Controller VM.


nutanix@cvm$ ~/serviceability/bin/esx-exit-maintenance-mode -s cvm_ip_addr

If successful, this command produces no output. If it fails, wait 5 minutes and try again.
nutanix@cvm$ ~/serviceability/bin/esx-start-cvm -s cvm_ip_addr

Replace cvm_ip_addr with the IP address of the Controller VM (cvm_ip_addr from the worksheet).
If the Controller VM starts, a message like the following is displayed.

INFO esx-start-cvm:67 CVM started successfully. Please verify using ping cvm_ip_addr

After starting, the Controller VM restarts once. Wait three to four minutes before you ping the Controller
VM.
Alternatively, you can take the ESXi host out of maintenance mode and start the Controller VM using
the vSphere client.

3. Verify that all services are up on all Controller VMs.


nutanix@cvm$ cluster status

If the cluster is running properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]

4. Verify storage.

a. Log on to the ESXi host (hypervisor_ip_addr from the worksheet) with SSH.



b. Rescan for datastores.
root@esx# esxcli storage core adapter rescan --all

c. Confirm that cluster VMFS datastores, if any, are available.


root@esx# esxcfg-scsidevs -m | awk '{print $5}'

Starting a Node in a Cluster (AHV)


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Estimated time to complete: 10 minutes

1. Log on to the Acropolis host with SSH.

2. Find the name of the Controller VM.


root@ahv# virsh list --all | grep CVM

Make a note of the Controller VM name in the second column.

3. Determine if the Controller VM is running.


• If the Controller VM is off, a line similar to the following should be returned:
- NTNX-12AM2K470031-D-CVM shut off

Make a note of the Controller VM name in the second column.

• If the Controller VM is on, a line similar to the following should be returned:


- NTNX-12AM2K470031-D-CVM running

4. If the Controller VM is shut off, start it.


root@ahv# virsh start cvm_name

Replace cvm_name with the name of the Controller VM that you found from the preceding command.

5. If the node is in maintenance mode, log on to the Controller VM and take the node out of maintenance
mode.
nutanix@cvm$ acli
<acropolis> host.exit_maintenance_mode host
<acropolis> exit

6. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).

7. Verify that all services are up on all Controller VMs.


nutanix@cvm$ cluster status

If the cluster is running properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]

Starting a Node in a Cluster (Hyper-V)


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Estimated time to complete: 10 minutes

1. If the node is turned off, turn it on by pressing the power button on the front. Otherwise, proceed to the
next step.

2. Log on to the Hyper-V host with Remote Desktop Connection and start PowerShell.

3. Determine if the Controller VM is running.


> Get-VM | Where {$_.Name -match 'NTNX.*CVM'}

• If the Controller VM is off, a line similar to the following should be returned:


NTNX-13SM35230026-C-CVM Stopped - - - Opera...

Make a note of the Controller VM name in the second column.

• If the Controller VM is on, a line similar to the following should be returned:


NTNX-13SM35230026-C-CVM Running 2 16384 05:10:51 Opera...

4. Start the Controller VM.


> Start-VM -Name NTNX-*CVM

5. Confirm that the containers are available.


> Get-ChildItem \\shared_host_name\container_name

6. Log on to another Controller VM in the cluster with SSH (cvm_ip_addr2 from the worksheet).

7. Verify that all services are up on all Controller VMs.


nutanix@cvm$ cluster status

If the cluster is running properly, output similar to the following is displayed for each node in the cluster:

CVM: 10.1.64.60 Up
Zeus UP [3704, 3727, 3728, 3729, 3807, 3821]
Scavenger UP [4937, 4960, 4961, 4990]
SSLTerminator UP [5034, 5056, 5057, 5139]
Hyperint UP [5059, 5082, 5083, 5086, 5099, 5108]
Medusa UP [5534, 5559, 5560, 5563, 5752]
DynamicRingChanger UP [5852, 5874, 5875, 5954]
Pithos UP [5877, 5899, 5900, 5962]
Stargate UP [5902, 5927, 5928, 6103, 6108]
Cerebro UP [5930, 5952, 5953, 6106]
Chronos UP [5960, 6004, 6006, 6075]
Curator UP [5987, 6017, 6018, 6261]
Prism UP [6020, 6042, 6043, 6111, 6818]
CIM UP [6045, 6067, 6068, 6101]
AlertManager UP [6070, 6099, 6100, 6296]
Arithmos UP [6107, 6175, 6176, 6344]
SysStatCollector UP [6196, 6259, 6260, 6497]
Tunnel UP [6263, 6312, 6313]
ClusterHealth UP [6317, 6342, 6343, 6446, 6468, 6469, 6604, 6605,
6606, 6607]
Janus UP [6365, 6444, 6445, 6584]
NutanixGuestTools UP [6377, 6403, 6404]

Verifying Memory Replacement


Who is responsible: This procedure should be performed by a service technician under contract with
Nutanix.
Estimated time to complete: 2 minutes

• For all systems: log on to a Controller VM and run the command ncc hardware_info
show_hardware_info. Check the Memory Module output to make sure the DIMM location you replaced
now shows a new serial number. (See the comparison example after this list.)

• For systems with BMC firmware 3.24 or 1.87, use the IPMI web interface or the IPMI View tool to verify
that the event log does not display any new DIMM error messages.

• For systems with BMC firmware 2.33: use ipmitool (vSphere or AHV) or ipmiutil (Hyper-V) to verify that
the event log does not display any new DIMM error messages.
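If you saved the NCC output before the replacement as described in Checking Memory Locations, one simple way to confirm the change is to capture the output again and compare the two files (the file names below are only examples). Only the entries for the replaced DIMM, such as the serial number, should differ.

nutanix@cvm$ ncc hardware_info show_hardware_info > ~/dimm_info_after.txt
nutanix@cvm$ diff ~/dimm_info_before.txt ~/dimm_info_after.txt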
