Using rpcdebug to debug Linux NFS client and server issues

Debugging NFS issues can sometimes be a chore, especially when you are dealing with busy NFS servers. Tools like nfswatch and nfsstat can help you understand what operations your server is servicing, but sometimes you need to get into the protocol guts to find out more. There are a couple of ways to do this. First, you can capture NFS traffic with tcpdump and use the NFS protocol decoding features built into wireshark. This will allow you to see the NFS traffic going over the network, which in most cases is enough to solve a given caper.

Alternatively, you can use the rpcdebug utility to log additional debugging data on a Linux NFS client or server. The debugging data includes information on the RPC module, the lock manager module, and the NFS client and server modules. The debugging flags you can set inside these modules can be displayed with rpcdebug's "-vh" option:

$ rpcdebug -vh

usage: rpcdebug [-v] [-h] [-m module] [-s flags...|-c flags...]
       set or cancel debug flags.

Module     Valid flags
rpc        xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache all
nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount all
nfsd       sock fh export svc proc fileop auth repcache xdr lockd all
nlm        svc client clntlock svclock monitor clntsubs svcsubs hostcache xdr all

To enable a specific set of flags you can pass the name of a module and the flags to enable to rpcdebug:

$ rpcdebug -m nfsd -s proc
nfsd proc

In the example above I told the nfsd module to log debugging data about the RPC procedures that were received. If I hop on an NFS client and write 512 bytes to a file, I will see the NFS procedures that were issued in the server's messages file:

Nov  1 09:53:18 zippy kernel: nfsd: LOOKUP(3)   36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 foo1
Nov  1 09:53:18 zippy kernel: nfsd: CREATE(3)   36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 foo1
Nov  1 09:53:18 zippy kernel: nfsd: WRITE(3)    36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 512 bytes at 0
Nov  1 09:53:18 zippy kernel: nfsd: COMMIT(3)   36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0@0
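For reference, the write above was triggered from the client with a command along these lines (a minimal example; the /mnt/nfs mount point is an assumption):

$ dd if=/dev/zero of=/mnt/nfs/foo1 bs=512 count=1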

To log ALL of the nfsd debugging data possible, you can set the flags argument to all. This will enable every flag in the module, which will cause an excessive amount of data to be logged (don't do this on production servers unless you have a specific need to do so!).
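The invocation is the same as before, substituting all for an individual flag. The echoed flag list below is what I'd expect rpcdebug to print back; the exact output may vary with your nfs-utils version:

$ rpcdebug -m nfsd -s all
nfsd sock fh export svc proc fileop auth repcache xdr lockd

Here is the output you would see from a single 512-byte write when all of the flags are enabled: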

Nov  1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 4
Nov  1 09:56:18 zippy kernel: nfsd: ACCESS(3)   36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0x1f
Nov  1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov  1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 1
Nov  1 09:56:18 zippy kernel: nfsd: GETATTR(3)  36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3
Nov  1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov  1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 4
Nov  1 09:56:18 zippy kernel: nfsd: ACCESS(3)   36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0x2d
Nov  1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov  1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 2
Nov  1 09:56:18 zippy kernel: nfsd: SETATTR(3)  36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3
Nov  1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov  1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 7
Nov  1 09:56:18 zippy kernel: nfsd: WRITE(3)    36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 512 bytes at 0
Nov  1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov  1 09:56:18 zippy kernel: nfsd: write complete host_err=512
Nov  1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 21
Nov  1 09:56:18 zippy kernel: nfsd: COMMIT(3)   36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0@0
Nov  1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)

Now you may be asking yourself: what is that stuff following the procedure name? The data logged along with the procedure name is the contents of the data structures sent as part of a given procedure. You can view the data structures and their members by opening up the relevant NFS RFC and checking the argument list. Here is what the NFSv3 RFC (RFC 1813) says about the argument passed to the WRITE procedure:

struct WRITE3args {
         nfs_fh3     file;
         offset3     offset;
         count3      count;
         stable_how  stable;
         opaque      data<>;
};

The structure contains the file handle to perform the operation on, the offset at which to perform the write, and the number of bytes to write. It also contains a stable flag that tells the server whether it should commit the data to stable storage prior to returning a success code to the client. The last field contains the data itself. Mapping this back to the WRITE line in the log above: the run of hex words is the file handle, "512 bytes" is the count and "at 0" is the offset.

Once you have a better understanding of the fields passed to a procedure, you can cross-reference the lines in the messages file with the dprintk() statements in the kernel. Here is the kernel source for the dprintk() statement in the nfsd WRITE code:

dprintk("nfsd: WRITE    %s %d bytes at %d\n",
         SVCFH_fmt(&argp->fh),
         argp->len, argp->offset);

This matches up with the contents of the log file, and the dprintk() code shows the order in which everything is logged. This is useful stuff, though it's probably not something you will need to access daily. ;) I'm documenting it here so I can recall this procedure down the road.
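One last note before moving on: debugging stays enabled until you explicitly turn it off, so remember to clear the flags when you are done. The -c option handles this (the same syntax applies to the nfs, rpc and nlm modules; the confirmation line below is what I'd expect, though it may vary by nfs-utils version):

$ rpcdebug -m nfsd -c all
nfsd <no flags set>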

Using wireshark’s protocol decoding to debug NFS problems

Most admins have probably encountered a situation where someone says “hey, this really bizarre thing is occurring.” Whenever I am approached to look at these types of issues, I will typically start by jumping on my systems and reviewing system, network and performance data. Once I’ve verified those are within normal levels, I will begin reviewing the client-server communications to see what is going on.

I’ve encountered some NFS issues that fit into the “bizarre” category, and it’s amazing how much information you can glean by reviewing the NFS traffic between the client and server. I like to visualize problems, so I will frequently grab the network traffic with tcpdump:

$ tcpdump -w /var/tmp/nfs.dmp -i eth0 host 192.168.56.102 and not port 22

Re-create the issue and then feed the resulting dump into wireshark (on older tcpdump releases you may also want to add -s 0 to the capture command above so full packets are grabbed and wireshark can decode complete NFS payloads):

$ wireshark nfs.dmp

Wireshark will decode all of the NFS procedures for you, which makes comparing the captured packets against the pertinent RFC super easy. The following picture shows how wireshark decodes an NFS WRITE request from a client:

[Screenshot: wireshark decoding an NFS WRITE request]

And the response sent back from the server:

[Screenshot: wireshark decoding the NFS WRITE reply from the server]

I’m not sure how NFS debugging can get any easier, and the capabilities built into wireshark are absolutely amazing! I reach for it weekly, and if you need to debug NFS protocol issues wireshark will soon become your best friend!
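One related tip: if you prefer to stay on the command line, wireshark's tshark companion applies the same protocol decoders. A minimal example that prints just the NFS traffic from the capture (recent tshark releases use -Y for display filters; older ones use -R):

$ tshark -r nfs.dmp -Y nfs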

Getting the number of bytes read and written by your Linux NFS kernel threads (nfsd)

Linux NFS server implementations export a number of statistics through the /proc file system. The nfsstat utility can parse these statistics and display various performance counters, and the data that is displayed comes from the /proc/net/rpc/nfsd proc entry:

$ cat /proc/net/rpc/nfsd

rc 0 2585 290
fh 0 0 0 0 0
io 1069882 10485760
th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
ra 32 16 0 0 0 0 0 0 0 0 0 1
net 2880 0 2865 14
rpc 2875 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 6 50 1 16 37 0 0 2560 13 1 0 0 10 0 0 0 0 16 3 6 3 10
proc4 2 3 124
proc4ops 59 0 0 0 12 2 0 0 0 0 88 17 0 0 0 0 12 0 0 4 0 2 0 104 0 7 17 1 0 0 0 11 2 4 0 0 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

So what do these values mean? To figure this out we can surf over to fs/nfsd/stats.c in the kernel source:

 * Format:
 *      rc <hits> <misses> <nocache>
 *                      Statistics for the reply cache
 *      fh <stale> <total-lookups> <anonlookups> <dir-not-in-dcache> <nondir-not-in-dcache>
 *                      statistics for filehandle lookup
 *      io <bytes-read> <bytes-written>
 *                      statistics for IO throughput
 *      th <threads> <fullcnt> <10%-20%> <20%-30%> ... <90%-100%> <100%>
 *                      time (seconds) when nfsd thread usage above thresholds
 *                      and number of times that all threads were in use
 *      ra cache-size <10% <20% <30% ... <100% not-found
 *                      number of times that read-ahead entry was found that deep in
 *                      the cache.
 *      plus generic RPC stats (see net/sunrpc/stats.c)

and then read through the nfsstat manual page. While debugging an NFS issue a few weeks back, I noticed that nfsstat doesn't have an option to print the number of bytes read and written (there is an io line in the output above holding exactly those two cumulative counters, but for some reason nfsstat doesn't process it). The io line, along with the number of reads and writes, is useful for understanding how many VFS read and write operations are performed and the average size of those operations. To help me understand the workload pattern of the nfsd threads on one of my systems, I developed the nfsrwstat.sh script, sketched below.
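The original script isn't reproduced here, so the following is a minimal sketch of how it could work (a reconstruction, not the original: the five-second default interval, the per-interval deltas and the column formatting are assumptions based on the sample run that follows):

#!/bin/bash
#
# nfsrwstat.sh - print the number of bytes read and written by the
# nfsd threads during each sampling interval.

INTERVAL=${1:-5}

printf "%-10s %-10s %-13s\n" "Date" "Bytes_Read" "Bytes_Written"

# The io line in /proc/net/rpc/nfsd holds two cumulative counters:
# total bytes read and total bytes written by the nfsd threads.
read -r _ PREV_READ PREV_WRITE < <(grep '^io' /proc/net/rpc/nfsd)

while :; do
    sleep "$INTERVAL"
    read -r _ CUR_READ CUR_WRITE < <(grep '^io' /proc/net/rpc/nfsd)
    printf "%-10s %-10s %-13s\n" "$(date +%H:%M:%S)" \
           "$((CUR_READ - PREV_READ))" "$((CUR_WRITE - PREV_WRITE))"
    PREV_READ=$CUR_READ
    PREV_WRITE=$CUR_WRITE
done

A sample run: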

$ nfsrwstat.sh

Date       Bytes_Read Bytes_Written
19:55:02   2139764    25187130  
19:55:07   0          3145728   
19:55:12   0          3358720   
19:55:18   0          3497984   
19:55:23   0          3629056   
19:55:28   0          3145728   

This script will help you better understand how much I/O your NFS server threads are performing, and you should be able to figure out whether you have a mixed workload or one that is read- or write-intensive. When you compare this data alongside the output from iostat, you can start to see how the block I/O layer and I/O scheduler are handling the data as it's pushed down the stack. If you haven't read the NFS RFC or the Linux NFS FAQ, I would highly recommend you do so. There is a plethora of information in those two documents, and they will make managing NFS a breeze!
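For the iostat comparison, I like to run extended device statistics at a matching interval in a second terminal:

$ iostat -x 5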

Firewalling a Linux NFS server with iptables

When it comes to firewalling services, NFS has to be one of the most complex to get operational. By default the various NFS services (lockd, statd, mountd, etc.) will request random port assignments from the portmapper (portmap), which means that most administrators need to open up a range of ports in their firewall rule base to get NFS working. On Linux hosts there is a simple way to firewall NFS services, and I thought I would walk through how I got iptables and my NFS server to work together.

Getting NFS working with iptables is a three step process:

1. Hard strap the ports the NFS daemons use in /etc/sysconfig/nfs.

2. Add the ports from step 1 to your iptables chains.

3. Restart the portmap, nfs and iptables services to pick up the changes.

To hard strap the ports that the various NFS services will use, you can assign your preferred ports to the MOUNTD_PORT, STATD_PORT, LOCKD_TCPPORT, LOCKD_UDPPORT, RQUOTAD_PORT and STATD_OUTGOING_PORT variables in /etc/sysconfig/nfs. Here are the settings I am using on my server:

MOUNTD_PORT="10050"
STATD_PORT="10051"
LOCKD_TCPPORT="10052"
LOCKD_UDPPORT="10052"
RQUOTAD_PORT="10053"
STATD_OUTGOING_PORT="10054"

Once ports have been assigned, you will need to restart the portmap, nfslock and nfs services to pick up the changes:

$ service portmap restart

Stopping portmap:                                          [  OK  ]
Starting portmap:                                          [  OK  ]

$ service nfslock restart

Stopping NFS locking:                                      [  OK  ]
Stopping NFS statd:                                        [  OK  ]
Starting NFS statd:                                        [  OK  ]

$ service nfs restart

Shutting down NFS mountd:                                  [  OK  ]
Shutting down NFS daemon:                                  [  OK  ]
Shutting down NFS quotas:                                  [  OK  ]
Shutting down NFS services:                                [  OK  ]
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]

If you query the portmap daemon with rpcinfo, you will see that the various services are now registered on the ports that were assigned in /etc/sysconfig/nfs:

$ rpcinfo -p

   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  10051  status
    100024    1   tcp  10051  status
    100011    1   udp  10053  rquotad
    100011    2   udp  10053  rquotad
    100011    1   tcp  10053  rquotad
    100011    2   tcp  10053  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100021    1   udp  10052  nlockmgr
    100021    3   udp  10052  nlockmgr
    100021    4   udp  10052  nlockmgr
    100021    1   tcp  10052  nlockmgr
    100021    3   tcp  10052  nlockmgr
    100021    4   tcp  10052  nlockmgr
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100005    1   udp  10050  mountd
    100005    1   tcp  10050  mountd
    100005    2   udp  10050  mountd
    100005    2   tcp  10050  mountd
    100005    3   udp  10050  mountd
    100005    3   tcp  10050  mountd

Next up, we need to adjust the appropriate iptables chains to allow inbound connections to the NFS service ports. Here are the entries I added to /etc/sysconfig/iptables to allow NFS to work with iptables:

# Portmap ports
-A INPUT -m state --state NEW -p tcp --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -p udp --dport 111 -j ACCEPT
# NFS daemon ports
-A INPUT -m state --state NEW -p tcp --dport 2049 -j ACCEPT
-A INPUT -m state --state NEW -p udp --dport 2049 -j ACCEPT
# NFS mountd ports
-A INPUT -m state --state NEW -p udp --dport 10050 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 10050 -j ACCEPT
# NFS status ports
-A INPUT -m state --state NEW -p udp --dport 10051 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 10051 -j ACCEPT
# NFS lock manager ports
-A INPUT -m state --state NEW -p udp --dport 10052 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 10052 -j ACCEPT
# NFS rquotad ports
-A INPUT -m state --state NEW -p udp --dport 10053 -j ACCEPT
-A INPUT -m state --state NEW -p tcp --dport 10053 -j ACCEPT

Then I restarted iptables:

$ service iptables restart

Flushing firewall rules:                                   [  OK  ]
Setting chains to policy ACCEPT: filter                    [  OK  ]
Unloading iptables modules:                                [  OK  ]
Applying iptables firewall rules:                          [  OK  ]

In addition to the rules listed above, I have entries to track state (using the conntrack module) and allow established connections; an example of those entries is shown after the list below. If everything went as expected, you should be able to mount your file systems without issue. To debug issues, you can use the following steps:

1. Add a LOG rule to your iptables INPUT chain to log dropped packets.

2. Run tcpdump -i <interface> host X.X.X.X (host should be the client IP that is trying to mount or access your exported file system) and check to see if connections are making it to the NFS server.

3. Run rpcinfo -p to see if the correct ports were assigned.
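For reference, here is what the state-tracking entry mentioned earlier and the LOG rule from step 1 might look like (illustrative examples, not lifted from my rule base verbatim; the LOG rule should sit just above your drop or reject rule):

-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -j LOG --log-prefix "IPTABLES-DROPPED: "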

With just a few steps, you can get NFS working with iptables. If you have any suggestions, feel free to leave me a comment! I’d love to hear folks’ thoughts on this.