Environment
Red Hat Enterprise Linux 8
Red Hat Enterprise Linux 7
Red Hat Enterprise Linux 6
Red Hat Enterprise Linux 5
NFS Client (nfs-utils package)
Issue
NFS shares hang with the following error(s) in /var/log/messages:
Resolution
The resolution for this issue will vary depending on the root cause.
For non-Red Hat NFS Clients or Servers, engage the vendor of the non-Red Hat system. Investigate connectivity issues such as a network link being down, network packet loss, a system hang, NFS client/server hang or slowness, or storage hang or slowness.
8/6/2020 - RHEL mount hangs: nfs: server [...] not responding, still trying - Red Hat Customer Portal (https://access.redhat.com/solutions/28211)
The team responsible for the network between the NFS Client and NFS Server should be engaged to investigate connectivity and capacity issues.
Root Cause
Explanation of the Message
If the NFS client does not receive a response from the NFS server, the
"server ... not responding, still trying" message may appear in syslog.
Each message indicates that one NFS/RPC request (for example, one NFS WRITE) has been
sent retrans times and timed out each time. With the default retrans and timeo options,
this message will be printed after 180 seconds. For more information, see the retrans and
timeo options in the NFS manual page (man nfs).
NOTE: A very low value for the timeo NFS mount option, much less than the default of
600, may increase the likelihood and frequency of this message. For example, setting timeo=5
with the default retrans=2 will cause this message to be printed if the NFS server takes
longer than 0.5 + 1.0 = 1.5 seconds to respond to any NFS request. Under a heavy NFS
workload, it is not unusual for an NFS server to take longer than 1.5 seconds to respond to one
or more NFS requests. For more information on timeo and retrans, see the NFS manual
page (man nfs).
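The arithmetic in this note can be sketched in shell. This is a sketch only, assuming (per the examples above) that each successive timeout doubles, that timeo is in tenths of a second, and that retrans timeouts elapse before the message is logged; it is not an authoritative model of the kernel's RPC timers.

```shell
# Estimate how long after a request the "not responding" message appears,
# given the timeo (tenths of a second) and retrans mount options.
# Assumes each successive retransmission timeout doubles, matching the
# 0.5 + 1.0 = 1.5s example in the note above.
message_delay() {
    awk -v timeo="$1" -v retrans="$2" 'BEGIN {
        t = timeo / 10; total = 0
        for (i = 0; i < retrans; i++) { total += t; t *= 2 }
        print total
    }'
}

message_delay 600 2   # defaults: prints 180
message_delay 5 2     # timeo=5 example: prints 1.5
```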
A damaged security appliance mangling packets between the NFS Client and NFS Server:
https://access.redhat.com/solutions/1122483 (https://access.redhat.com/solutions/1122483)
The port-channel aka EtherChannel aka bonding configuration on the switch was incorrect:
https://access.redhat.com/solutions/190183 (https://access.redhat.com/solutions/190183)
A second system on the network had duplicated the IP address of the NFS Server
The switch was dropping TCP SYN,ACK packets: https://access.redhat.com/solutions/1262663
(https://access.redhat.com/solutions/1262663)
Issue was with a Riverbed WAN optimizer device
Cisco ASA between NFS Server and NFS Clients could not handle wrap of TCP Sequence
number:
https://access.redhat.com/solutions/2778561 (https://access.redhat.com/solutions/2778561)
Non-Red Hat NFS Server: A problem with the disk configuration at storage pool level. NFS
Server vendor: "Specifically, we think that the lack of free space in the pool plus the somewhat
random nature of the files to access makes auto-tiering fail on relocation operations."
Non-Red Hat NFS Server: A TCP performance issue when certain conditions were met, fixed
by a specific patch
Non-Red Hat NFS Server: A configuration issue caused data to be sent through the wrong
network interface
Red Hat NFS Server: Thread count may be too low on the NFS server. For more information on
this, see "How do I increase the number of threads created by the NFS daemon in RHEL 4, 5
and 6?" (https://access.redhat.com/site/solutions/2216)
Red Hat NFS Server: Three different bugs which, when all were present, caused a complete DoS of the
NFS Server: https://access.redhat.com/solutions/544553
(https://access.redhat.com/solutions/544553)
RHEL7 NFS client or server under heavy load with certain NICs and jumbo frames may silently
drop packets due to default / too low min_free_kbytes setting:
https://access.redhat.com/solutions/4085851 (https://access.redhat.com/solutions/4085851)
An incorrect MTU (network) setting on the client causing timeouts (and a watchdog reboot)
Jumbo packets ( MTU=9000 ) selected on one system, but not across the rest of the network
An incorrect/inefficient bonding mode is in use: What is the best bonding mode for TCP traffic
such as NFS, ISCSI, CIFS, etc? (https://access.redhat.com/solutions/2217521)
The net.ipv4.tcp_frto setting may trigger this issue:
https://access.redhat.com/solutions/1531943 (https://access.redhat.com/solutions/1531943)
An NFS client kernel regression that caused the RPC layer to become non-functional. For
more information, see RHEL6.7.z: NFS client with kernels 2.6.32-573.10.2.el6 or above hangs
with 'not responding, still trying' messages and running processes in _spin_lock
(https://access.redhat.com/solutions/2215491)
Possible regression in RHEL6.9 kernels involving an NFS client's sunrpc TCP port re-use logic
as detailed in https://access.redhat.com/solutions/3018371
(https://access.redhat.com/solutions/3018371)
RHEL7.6: NFSv3 client hangs after the 5 minute idle timer drops the TCP connection and a
subsequent TCP 3-way handshake fails due to a duplicate SYN or an unexpected RST from the NFS
client, as described in https://access.redhat.com/solutions/3765711
(https://access.redhat.com/solutions/3765711)
Diagnostic Steps
Initial steps to rule out common problems
First, identify the timeframe of the problem. The beginning of the incident is the timestamp on
the not responding, still trying message, adjusted backwards by the timeo and
retrans values (see the Root Cause section about timeo and retrans).
The end of the incident is when you see an nfs server: ... OK message.
If there is no OK message, the problem is ongoing (the NFS Server still has not responded).
If there are multiple not responding messages, there may be multiple timeframes, or you may need
to adjust further.
For example:
Since the default mount options are being used, the problem began 180 seconds before the
not responding message. The problem ended when the 'OK' message was seen:
Sep 29 22:51:39 - Problem BEGIN: adjusted start time of the problem based on 'timeo' and 'retrans'
Sep 29 22:54:39 - 'not responding, still trying' seen
Sep 29 22:54:49 - Problem END: 'OK' seen
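The 180-second back-adjustment can be computed with GNU date from the message timestamp. The year below is an assumption, since syslog timestamps omit it.

```shell
# Adjust the 'not responding' timestamp backwards by the default 180-second
# window (timeo=600, retrans=2) to find when the problem began.
MSG_EPOCH=$(date -d "2020-09-29 22:54:39" +%s)      # year assumed
date -d "@$((MSG_EPOCH - 180))" '+%b %d %H:%M:%S'   # prints Sep 29 22:51:39
```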
On the NFS Server, check any logs for signs of performance issues during the timeframe(s)
identified. For non-Red Hat NFS servers, engage your NFS Server vendor and give them the
timeframe of the problem to investigate.
On the NFS Client and NFS Server, check if there are problems with the network interface
and/or network. For example:
Look for dropped packets in ip -s link and/or ethtool output. For one such possibility,
see: System dropping packets due to rx_fw_discards
(https://access.redhat.com/solutions/21301)
The xsos tool can also be used to look for packet loss on network interfaces, see: How can
I install xsos command in Red Hat Enterprise Linux?
(https://access.redhat.com/solutions/511753)
Check the MTU settings on the NFS Client, the NFS Server, and throughout the path from
the NFS Client to NFS Server. All systems must have the same MTU configured.
Look for evidence of packet loss outside the system by running netstat -s on RHEL.
Counters under the TcpExt heading such as retransmits or congestion may indicate
external packet loss. However, note these are system-wide TCP counters which have
incremented since system boot, so errors may be related to other TCP connections and not
the NFS connection.
If bonding is being used, and the NFS transport is TCP, check for an incorrect bonding
mode, as described in What is the best bonding mode for TCP traffic such as NFS, ISCSI,
CIFS, etc? (https://access.redhat.com/solutions/2217521)
Identify any other NFS Client accessing the same NFS Server, especially any identical NFS
Client (mounting the same exports, same mount options, same Red Hat version, etc). Do any
similar NFS Clients experience the not responding messages in the same timeframe? If so, this
lends credence to either an NFS Server issue or a networking/connectivity issue between the
NFS Client and NFS Server.
Identify any network equipment such as routers, switches, or firewalls between the NFS Client
and NFS Server. If possible, examine any logs or monitoring statistics (e.g., Cacti, rrdtool) from
these devices at the timeframe of the incident.
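As a quick first pass over the interface counters mentioned above, the receive drop column of /proc/net/dev can be pulled out with awk. The column positions are assumed from the standard /proc/net/dev layout; this only complements, not replaces, ip -s link and ethtool output.

```shell
# Print each interface's receive 'drop' counter from /proc/net/dev.
# After replacing the ':' that follows the interface name, the RX drop
# count is the 5th whitespace-separated field (bytes packets errs drop ...).
rx_drops() {
    awk 'NR > 2 { gsub(/:/, " "); print $1, $5 }' "${1:-/proc/net/dev}"
}
# Usage: rx_drops              # live counters; nonzero drops warrant a look
#        rx_drops saved.txt    # a copy of /proc/net/dev saved during the incident
```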
Once the problem is isolated, further troubleshooting is required to fix the problem, and is beyond
the scope of this solution.
The most direct means of troubleshooting this issue requires at least packet captures from both the
NFS Client and NFS Server perspectives. In some scenarios, it may be possible to diagnose with a
packet capture on just one side, such as the NFS Client, but both sides are highly recommended.
NOTE: Any tcpdump capture should only contain packets involving the problematic NFS server.
If using tcpdump, you can accomplish this by using the 'host' pcap-filter and providing the NFS
server name or IP address from the "not responding" message. Failing to filter the packet capture to
only the problematic NFS server is very likely to result in delays in root cause analysis. Example:
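A capture restricted with the 'host' filter might look like the following; eth0 and 192.0.2.10 are placeholders for the interface the mount uses and the server address from the log message.

```shell
# Capture only packets to/from the problematic NFS server.
# Substitute the address from the "not responding" message for 192.0.2.10
# and the correct interface for eth0; -s 0 captures full packets.
tcpdump -i eth0 -s 0 -w /tmp/nfs-server.pcap host 192.0.2.10
```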
Gathering packet captures with tcpdump (Red Hat NFS Client or Server)
For generic steps on gathering a packet capture on any Red Hat NFS Server or NFS Client, see How
to capture network packets with tcpdump? (https://access.redhat.com/solutions/8787)
For a simple way to gather a packet capture using the tcpdump tool on a RHEL NFS Client, use the
tcpdump-watch.sh script on the following solution:
https://access.redhat.com/articles/4330981#intermittent
(https://access.redhat.com/articles/4330981#intermittent).
The script takes a single parameter, the NFS Server name or IP address, and watches
/var/log/messages for the nfs: server ... not responding, still trying messages. When it
sees the message, the tcpdump is stopped.
Please note: the default tcpdump arguments in the tcpdump-watch.sh script may work for many
environments, but some environments may need slight changes. For example, if there are large NFS
READs and WRITEs in the initial packet capture, and/or a lot of packets are dropped by the
tcpdump process, reduce the size of the packets captured to ~512 bytes with the "snaplen"
parameter (-s 512). In addition, if the packet capture collects more than NFS traffic between the
NFS Client and NFS Server, you may need to add one or more pcap-filters, such as port 2049, to
capture only traffic to/from the NFS port. For more information on pcap-filters, see the manual
page man pcap-filter.
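Combining both adjustments described above might look like this (again, eth0 and 192.0.2.10 are placeholders):

```shell
# Variant with a 512-byte snaplen and an extra pcap-filter limiting the
# capture to the NFS port (2049), per the tuning suggestions above.
tcpdump -i eth0 -s 512 -w /tmp/nfs-2049.pcap host 192.0.2.10 and port 2049
```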
For NetApp filers, contact NetApp for official recommendations for your filer and environment.
You may be able to use the pktt command as described by How do I capture a packet trace
of NFS operations on a NetApp filer? (https://access.redhat.com/solutions/425893).
For EMC Isilon filers, please contact EMC for official recommendations for your filer and
environment. You may be able to use the isi_netlogger command or the web interface as
described by How do I capture a packet trace of NFS operations on a EMC Isilon filer?
(https://access.redhat.com/solutions/2070373).
For a few examples of common scenarios which may be seen in a tcpdump gathered on the NFS
Client, please see NFS client tcpdump analysis: 3 common failure scenarios
(https://access.redhat.com/articles/1342293)
Red Hat may request a vmcore from an NFS Client or NFS Server at a later date if it is believed
there is a specific bug within RHEL, but a vmcore is not an initial or common troubleshooting step
for this sort of issue.
Tags netapp (/tags/netapp) network (/tags/network) nfs (/tags/nfs) nfs3 (/tags/nfs3) nfs4 (/tags/nfs4)
rhel_4 (/tags/rhel_4) rhel_5 (/tags/rhel_5) rhel_6 (/tags/rhel_6) rhel_7 (/taxonomy/tags/rhel7)
troubleshooting (/tags/troubleshooting)
This solution is part of Red Hat’s fast-track publication program, providing a huge library of solutions
that Red Hat engineers have created while supporting our customers. To give you the knowledge
you need the instant it becomes available, these articles may be presented in a raw and unedited
form.
6 Comments
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:29:36 hostname kernel: nfs: server 10.xx.xx.xx not responding, still trying
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 9 23:30:59 hostname kernel: nfs: server 10.xx.xx.xx OK
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 11 22:48:46 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 06:56:00 hostname kernel: [] nfs_file_write+0xbb/0x1d0 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x203/0x230 [nfs]
Oct 12 08:05:40 hostname kernel: [] ? nfs_access_cache_shrinker+0x1cc/0x230 [nfs]
Oct 12 21:16:40 hostname kernel: NFS: nfs_update_inode(0:41/9268562720670613568 fh_crc=0x930fae80 ct=3 info=0x27e7f)
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562720670613568 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_update_inode(0:38/9268562673425973312 fh_crc=0xd4faca21 ct=3 info=0x27e7f)
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562673425973312 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_update_inode(0:37/9268562677720940608 fh_crc=0x15db0f21 ct=3 info=0x27e7f)
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562677720940608 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562720670613568 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562673425973312 is valid
Oct 12 21:16:40 hostname kernel: NFS: nfs_weak_revalidate: inode 9268562677720940608 is valid