Solaris Cluster Troubleshooting Private Interconnect Transport Network in a 'failed' or 'faulted' State (Doc ID 1315772.1)
APPLIES TO:
PURPOSE
Solaris Cluster has a minimum of two private transports, also known as the private interconnect or private network, for high availability.
This article applies to situations where one path is in the 'failed' or 'faulted' state.
If all private transports are in the failed/faulted state, it is more likely that the node they are connected to is down or not part of the cluster.
# scstat
or
# scstat -W
# cluster status
or
# clinterconnect status
This procedure uses nxge interfaces, but it is not specific to nxge and applies to other types of transports. The article shows the Solaris Cluster
3.2 and above command set. For older revisions, see the man pages for scconf and scsetup.
This resolution path can also be used to analyze cluster interconnect issues where the speed is not correct, e.g. the interconnect is running
at 100 Mbit instead of 1000 Mbit.
TROUBLESHOOTING STEPS
1) Check the current status of the cluster and its interconnect, and monitor the output:
Run one of the status commands mentioned above, "scstat -W" or "clintr status". While the command runs, monitor the messages file
with:
# tail -f /var/adm/messages
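When the messages file is busy, it can help to reduce it to the interfaces that in.mpathd has flagged. The following is a minimal sketch, not part of the original procedure. The function name mpathd_failed_nics is hypothetical, and the sketch assumes the in.mpathd entries use the "The link has gone down on ..." and "NIC failure detected on ..." wording quoted in this article. On older Solaris releases nawk may be needed in place of awk.

```shell
# Print the unique interface names that in.mpathd reported as failed.
# Reads syslog-format lines on standard input.
# Assumption: the message wording matches the in.mpathd entries
# quoted in this article.
mpathd_failed_nics() {
    awk '
        # "The link has gone down on <nic>": the NIC is the last field
        /in\.mpathd/ && /The link has gone down on/ { print $NF }
        # "NIC failure detected on <nic> of group ...": the NIC follows "on"
        /in\.mpathd/ && /NIC failure detected on/ {
            for (i = 1; i < NF; i++)
                if ($i == "on") { print $(i + 1); break }
        }
    ' | sort -u
}

# Example use on a cluster node (not executed here):
#   mpathd_failed_nics < /var/adm/messages
```

The function reads a finished file rather than the "tail -f" stream, because sort only prints once its input ends.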
1 of 3 12-03-2019, 01:10
Document 1315772.1 https://support.oracle.com/epmos/faces/DocumentDisplay?_adf.ctrl-state...
In this example, errors are reported by in.mpathd and the cluster interconnects are faulted:
# clinterconnect status
The following errors are observed in /var/adm/messages for the public network:
Feb 25 12:22:10 clnode1 in.mpathd[198]: [ID 215189 daemon.error] The link has gone down on nxge4
Feb 25 12:22:10 clnode1 in.mpathd[198]: [ID 594170 daemon.error] NIC failure detected on nxge4 of group primary
Note that nxge4 is not part of the interconnect, yet it reports an error when the status command for the interconnect is executed.
In such a case, check the /etc/path_to_inst file, especially if maintenance was recently performed on the node.
Ensure that the path entries in /etc/path_to_inst match the configuration in Solaris Cluster: the device paths for nxge0 and
nxge1 should point to the device paths of the private interconnect. In the example above, nxge4 had been moved to the device paths of the
private interconnect and nxge1 had been moved to the public network in /etc/path_to_inst. If the entries in
/etc/path_to_inst have to be corrected, a reboot is required afterwards.
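To make the comparison easier, the entries for one driver can be listed together with their device paths. This is a minimal sketch. The function name driver_instances is hypothetical, and it assumes the usual path_to_inst line layout of a quoted physical device path, an instance number and a quoted driver name. On older Solaris releases nawk may be needed in place of awk.

```shell
# List instance name and physical device path for one driver.
# Reads path_to_inst-format lines on standard input:
#   "physical-device-path" instance-number "driver-name"
# Usage: driver_instances DRIVER < /etc/path_to_inst
driver_instances() {
    awk -v drv="$1" '
        /^#/ { next }                 # skip comment lines
        {
            path = $1; inst = $2; name = $3
            gsub(/"/, "", path)       # strip the surrounding quotes
            gsub(/"/, "", name)
            if (name == drv) print name inst, path
        }
    ' | sort
}

# Example use on a cluster node (not executed here):
#   driver_instances nxge < /etc/path_to_inst
```

The resulting list can then be compared with the adapters shown by "clintr show -v" to confirm that nxge0 and nxge1 sit on the device paths intended for the private interconnect.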
# clinterconnect status
If the transport paths now show "Path online", you are done.
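The check for paths that are still not online can be scripted. This is a minimal sketch. The function name faulted_paths is hypothetical, and the sketch assumes that each path line of the status output consists of two node:adapter endpoints followed by the status text. Verify this layout against the output on your release before relying on it.

```shell
# Print transport paths whose status is not "Path online".
# Reads `clinterconnect status` output on standard input.
# Assumption: each path line has the layout
#   node:adapter   node:adapter   status text
faulted_paths() {
    awk '
        $1 ~ /^[^:]+:[^:]+$/ && $2 ~ /^[^:]+:[^:]+$/ {
            status = $3
            for (i = 4; i <= NF; i++) status = status " " $i
            if (status != "Path online") print $1, $2, status
        }
    '
}

# Example use on a cluster node (not executed here):
#   clinterconnect status | faulted_paths
```

Empty output means every path line that was recognized reports "Path online". Header and separator lines are ignored because they contain no node:adapter pair.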
4) To see the detailed configuration of adapters, interfaces and switch names, use the following command:
# scconf -pvv
or use the new command "clintr show -v" for configuration details.
You can also look at your interface configuration from the Solaris command line:
# ifconfig -a
clprivnet0 is the cluster private logical host failover interface over the physical private transports.
Observe your console and your /var/adm/messages file for possible errors.
If the connection is still faulted, there is a potential hardware problem.
If you have a spare cable, you can replace the cables that go to the nxge1 interfaces and repeat the above procedure with
clinterconnect disable/enable. If the nxge1 link shows offline on either node, the network adapter might need to be replaced.
If the switch has spare ports, you could try moving the cables (the same ones going to nxge1) to a different port. Ask your
network administrator to check the switch logs to see whether the switch is OK or is logging errors.
Because there are several components in the transport path, it is difficult to give more specific instructions.
Pay attention to the interfaces and cables if you end up replacing them. If you accidentally remove the last working transport, one of the
nodes will panic.