Environment
Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 6
Red Hat Enterprise Linux 5
NFS Client (nfs-utils package)
Issue
NFS shares hang with the following error(s) in /var/log/messages:
Resolution
The resolution for this issue will vary depending on the root cause.
For non-Red Hat NFS Clients or Servers, engage the vendor of the non-Red Hat system. Investigate connectivity issues such as a network link being down, network packet loss, a system hang, NFS client/server hang or slowness, or storage hang or slowness.
8/6/2020 - RHEL mount hangs: nfs: server [...] not responding, still trying - Red Hat Customer Portal (https://access.redhat.com/solutions/28211)
The team responsible for the network between the NFS Client and NFS Server should be engaged to investigate connectivity and capacity issues.
Root Cause
Explanation of the Message
If the NFS client does not receive a response from the NFS server, the
"server ... not responding, still trying" message may appear in syslog.
Each message indicates that one NFS/RPC request (for example, one NFS WRITE) has been
sent retrans times and timed out each time. With the default retrans and timeo options,
this message will be printed after 180 seconds. For more information, see the retrans and
timeo options in the NFS manual page (man nfs).
NOTE: A very low value for the timeo NFS mount option, much less than the default of
600, may increase the likelihood and frequency of this message. For example, setting timeo=5
with the default retrans=2 will cause this message to be printed if the NFS server takes
longer than 0.5 + 1.0 = 1.5 seconds to respond to any NFS request. Under a heavy NFS
workload, it is not unusual for an NFS server to take longer than 1.5 seconds to respond to one
or more NFS requests. For more information on timeo and retrans, see the NFS manual
page (man nfs).
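The arithmetic in this note can be sketched in shell. This is a sketch only, assuming (per the examples above) that each successive timeout doubles, that timeo is in tenths of a second, and that retrans timeouts elapse before the message is logged; it is not an authoritative model of the kernel's RPC timers.

```shell
# Estimate how long after a request the "not responding" message appears,
# given the timeo (tenths of a second) and retrans mount options.
# Assumes each successive retransmission timeout doubles, matching the
# 0.5 + 1.0 = 1.5s example in the note above.
message_delay() {
    awk -v timeo="$1" -v retrans="$2" 'BEGIN {
        t = timeo / 10; total = 0
        for (i = 0; i < retrans; i++) { total += t; t *= 2 }
        print total
    }'
}

message_delay 600 2   # defaults: prints 180
message_delay 5 2     # timeo=5 example: prints 1.5
```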
A damaged security appliance mangling packets between the NFS Client and NFS Server:
https://access.redhat.com/solutions/1122483 (https://access.redhat.com/solutions/1122483)
The port-channel aka EtherChannel aka bonding configuration on the switch was incorrect:
https://access.redhat.com/solutions/190183 (https://access.redhat.com/solutions/190183)
A second system on the network had duplicated the IP address of the NFS Server
The switch was dropping TCP SYN,ACK packets: https://access.redhat.com/solutions/1262663
(https://access.redhat.com/solutions/1262663)
Issue was with a Riverbed WAN optimizer device
Cisco ASA between NFS Server and NFS Clients could not handle wrap of TCP Sequence
number:
https://access.redhat.com/solutions/2778561 (https://access.redhat.com/solutions/2778561)
Non-Red Hat NFS Server: A problem with the disk configuration at storage pool level. NFS
Server vendor: "Specifically, we think that the lack of free space in the pool plus the somewhat
random nature of the files to access makes auto-tiering fail on relocation operations."
Non-Red Hat NFS Server: A TCP performance issue when certain conditions were met, fixed
by a specific patch
Non-Red Hat NFS Server: A configuration issue caused data to be sent through the wrong
network interface
Red Hat NFS Server: Thread count may be too low on the NFS server. For more information on
this, see "How do I increase the number of threads created by the NFS daemon in RHEL 4, 5
and 6?" (https://access.redhat.com/site/solutions/2216)
Red Hat NFS Server: Three different bugs which, when all were present, caused a complete DoS of the
NFS Server: https://access.redhat.com/solutions/544553
(https://access.redhat.com/solutions/544553)
RHEL7 NFS client or server under heavy load with certain NICs and jumbo frames may silently
drop packets due to default / too low min_free_kbytes setting:
https://access.redhat.com/solutions/4085851 (https://access.redhat.com/solutions/4085851)
An incorrect MTU (network) setting on the client causing timeouts (and a watchdog reboot)
Jumbo packets ( MTU=9000 ) selected on one system, but not across the rest of the network
An incorrect/inefficient bonding mode is in use: What is the best bonding mode for TCP traffic
such as NFS, ISCSI, CIFS, etc? (https://access.redhat.com/solutions/2217521)
The net.ipv4.tcp_frto setting may trigger this issue:
https://access.redhat.com/solutions/1531943 (https://access.redhat.com/solutions/1531943)
An NFS client kernel regression that caused the RPC layer to become non-functional. For
more information, see RHEL6.7.z: NFS client with kernels 2.6.32-573.10.2.el6 or above hangs
with 'not responding, still trying' messages and running processes in _spin_lock
(https://access.redhat.com/solutions/2215491)
Possible regression in RHEL6.9 kernels involving an NFS client's sunrpc TCP port re-use logic
as detailed in https://access.redhat.com/solutions/3018371
(https://access.redhat.com/solutions/3018371)
RHEL7.6: NFSv3 client hangs after the 5 minute idle timer drops the TCP connection and a
subsequent TCP 3-way handshake fails due to a duplicate SYN or an unexpected RST from the NFS
client, as described in https://access.redhat.com/solutions/3765711
(https://access.redhat.com/solutions/3765711)
Diagnostic Steps
Initial steps to rule out common problems
First, identify the timeframe of the problem. The beginning of the incident is the timestamp on
the not responding, still trying message, adjusted backwards by the timeo and
retrans values (see the Root Cause section about timeo and retrans).
The end of the incident is when you see an nfs server: ... OK message.
If there is no OK message, the problem is ongoing (the NFS Server still has not responded).
If there are multiple not responding messages, there may be multiple timeframes, or you may need
to adjust further.
For example:
Since the default mount options are being used, the problem began 180 seconds before the
not responding message. The problem ended when the 'OK' message was seen:
Sep 29 22:51:39 - Problem BEGIN: adjusted start time of the problem based on 'timeo' and 'retrans'
Sep 29 22:54:39 - 'not responding, still trying' seen
Sep 29 22:54:49 - Problem END: 'OK' seen
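The 180-second back-adjustment can be computed with GNU date from the message timestamp. The year below is an assumption, since syslog timestamps omit it.

```shell
# Adjust the 'not responding' timestamp backwards by the default 180-second
# window (timeo=600, retrans=2) to find when the problem began.
MSG_EPOCH=$(date -d "2020-09-29 22:54:39" +%s)      # year assumed
date -d "@$((MSG_EPOCH - 180))" '+%b %d %H:%M:%S'   # prints Sep 29 22:51:39
```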
On the NFS Server, check any logs for signs of performance issues during the timeframe(s)
identified. For non-Red Hat NFS servers, engage your NFS Server vendor and give them the
timeframe of the problem to investigate.
On the NFS Client and NFS Server, check if there are problems with the network interface
and/or network. For example:
Look for dropped packets in ip -s link and/or ethtool output. For one such possibility,
see: System dropping packets due to rx_fw_discards
(https://access.redhat.com/solutions/21301)
The xsos tool can also be used to look for packet loss on network interfaces, see: How can
I install xsos command in Red Hat Enterprise Linux?
(https://access.redhat.com/solutions/511753)
Check the MTU settings on the NFS Client, the NFS Server, and throughout the path from
the NFS Client to NFS Server. All systems must have the same MTU configured.
Look for evidence of packet loss outside the system by running netstat -s on RHEL.
Counters under the TcpExt heading such as retransmits or congestion may indicate
external packet loss. However, note these are system-wide TCP counters which have
incremented since system boot, so errors may be related to other TCP connections and not
the NFS connection.
If bonding is being used, and the NFS transport is TCP, check for an incorrect bonding
mode, as described in What is the best bonding mode for TCP traffic such as NFS, ISCSI,
CIFS, etc? (https://access.redhat.com/solutions/2217521)
Identify any other NFS Client accessing the same NFS Server, especially any identical NFS
Client (mounting the same exports, same mount options, same Red Hat version, etc). Do any
similar NFS Clients experience the not responding messages in the same timeframe? If so, this
lends credence to either an NFS Server issue or a networking/connectivity issue between the
NFS Client and NFS Server.
Identify any network equipment such as routers, switches, or firewalls between the NFS Client
and NFS Server. If possible, examine any logs or monitoring statistics (e.g., Cacti, rrdtool) from
these devices at the timeframe of the incident.
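As a quick first pass over the interface counters mentioned above, the receive drop column of /proc/net/dev can be pulled out with awk. The column positions are assumed from the standard /proc/net/dev layout; this only complements, not replaces, ip -s link and ethtool output.

```shell
# Print each interface's receive 'drop' counter from /proc/net/dev.
# After replacing the ':' that follows the interface name, the RX drop
# count is the 5th whitespace-separated field (bytes packets errs drop ...).
rx_drops() {
    awk 'NR > 2 { gsub(/:/, " "); print $1, $5 }' "${1:-/proc/net/dev}"
}
# Usage: rx_drops              # live counters; nonzero drops warrant a look
#        rx_drops saved.txt    # a copy of /proc/net/dev saved during the incident
```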
Once the problem is isolated, further troubleshooting is required to fix the problem, and is beyond
the scope of this solution.
The most direct means of troubleshooting this issue requires at least packet captures from both the
NFS Client and NFS Server perspectives. In some scenarios, it may be possible to diagnose with a
packet capture on just one side, such as the NFS Client, but both sides are highly recommended.
NOTE: Any tcpdump capture should only contain packets involving the problematic NFS server.
If using tcpdump, you can accomplish this by using the 'host' pcap-filter and providing the NFS
server name or IP address from the "not responding" message. Failing to filter the packet capture to
only the problematic NFS server is very likely to result in delays in root cause analysis. Example:
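A capture restricted with the 'host' filter might look like the following; eth0 and 192.0.2.10 are placeholders for the interface the mount uses and the server address from the log message.

```shell
# Capture only packets to/from the problematic NFS server.
# Substitute the address from the "not responding" message for 192.0.2.10
# and the correct interface for eth0; -s 0 captures full packets.
tcpdump -i eth0 -s 0 -w /tmp/nfs-server.pcap host 192.0.2.10
```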
Gathering packet captures with tcpdump (Red Hat NFS Client or Server)
For generic steps on gathering a packet capture on any Red Hat NFS Server or NFS Client, see How
to capture network packets with tcpdump? (https://access.redhat.com/solutions/8787)
For a simple way to gather a packet capture using the tcpdump tool on a RHEL NFS Client, use the
tcpdump-watch.sh script on the following solution:
https://access.redhat.com/articles/4330981#intermittent
(https://access.redhat.com/articles/4330981#intermittent).
The script takes a single parameter, the NFS Server name or IP address, and watches
/var/log/messages for the nfs: server ... not responding, still trying messages. When it
sees the message, the tcpdump is stopped.
Please note: the default tcpdump arguments in the tcpdump-watch.sh script may work for many
environments, but some environments may need slight changes. For example, if there are large NFS
READs and WRITEs in the initial packet capture, and/or a lot of packets are dropped by the
tcpdump process, reduce the size of the packets captured to ~512 bytes with the "snaplen"
parameter (-s 512). In addition, if the packet capture collects more than NFS traffic between the
NFS Client and NFS Server, you may need to add one or more pcap-filters, such as port 2049, to
capture only traffic to/from the NFS port. For more information on pcap-filters, see the manual
page man pcap-filter.
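Combining both adjustments described above might look like this (again, eth0 and 192.0.2.10 are placeholders):

```shell
# Variant with a 512-byte snaplen and an extra pcap-filter limiting the
# capture to the NFS port (2049), per the tuning suggestions above.
tcpdump -i eth0 -s 512 -w /tmp/nfs-2049.pcap host 192.0.2.10 and port 2049
```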
For NetApp filers, contact NetApp for official recommendations for your filer and environment.
You may be able to use the pktt command as described by How do I capture a packet trace
of NFS operations on a NetApp filer? (https://access.redhat.com/solutions/425893).
For EMC Isilon filers, please contact EMC for official recommendations for your filer and
environment. You may be able to use the isi_netlogger command or the web interface as
described by How do I capture a packet trace of NFS operations on a EMC Isilon filer?
(https://access.redhat.com/solutions/2070373).
For a few examples of common scenarios which may be seen in a tcpdump gathered on the NFS
Client, please see NFS client tcpdump analysis: 3 common failure scenarios
(https://access.redhat.com/articles/1342293)
Red Hat may request a vmcore from an NFS Client or NFS Server at a later date if it is believed
there is a specific bug within RHEL, but a vmcore is not an initial or common troubleshooting step
for this sort of issue.
Tags netapp (/tags/netapp) network (/tags/network) nfs (/tags/nfs) nfs3 (/tags/nfs3) nfs4 (/tags/nfs4)
rhel_4 (/tags/rhel_4) rhel_5 (/tags/rhel_5) rhel_6 (/tags/rhel_6) rhel_7 (/taxonomy/tags/rhel7)
troubleshooting (/tags/troubleshooting)
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions
that Red Hat engineers have created while supporting our customers. To give you the knowledge
you need the instant it becomes available, these articles may be presented in a raw and unedited
form.
6 Comments
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 21:16:40 hostname kernel: NFS: nfs_update_inode(0:41/9268562720670613568 fh_crc=0x930fae80 ct=3 info=0x27e7f)
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562720670613568 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_update_inode(0:38/9268562673425973312 fh_crc=0xd4faca21 ct=3 info=0x27e7f)
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562673425973312 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_update_inode(0:37/9268562677720940608 fh_crc=0x15db0f21 ct=3 info=0x27e7f)
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562677720940608 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562720670613568 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562673425973312 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562677720940608 is valid