Using rpcdebug to debug Linux NFS client and server issues


Debugging NFS issues can sometimes be a chore, especially when you are dealing with busy NFS servers. Tools like nfswatch and nfsstat can help you understand what operations your server is servicing, but sometimes you need to get into the protocol guts to find out more info. There are a couple of ways you can do this. First you can capture NFS traffic with tcpdump and use the NFS protocol decoding features that are built into wireshark. This will allow you to see the NFS traffic that is going over the network, which in most cases is enough to solve a given caper.

Alternatively, you can use the rpcdebug utility to log additional debugging data on a Linux NFS client or server. The debugging data includes information on the RPC module, the lock manager module and the NFS client and server modules. The debugging flags you can set inside these modules can be displayed with the rcpdebug “-vh” option:

$ rpcdebug -vh

usage: rpcdebug [-v] [-h] [-m module] [-s flags...|-c flags...]
set or cancel debug flags.

Module Valid flags
rpc xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache all
nfs vfs dircache lookupcache pagecache proc xdr file root callback client mount all
nfsd sock fh export svc proc fileop auth repcache xdr lockd all
nlm svc client clntlock svclock monitor clntsubs svcsubs hostcache xdr all

To enable a specific set of flags you can pass the name of a module and the flags to enable to rpcdebug:

$ rpcdebug -m nfsd -s proc
nfsd proc

In the example above I told the nfsd module to log debugging data about the RPC procedures that were received. If I hop on an NFS client and write 512-bytes to a file I would see the NFS procedures that were issued in the server’s messages file:

Nov 1 09:53:18 zippy kernel: nfsd: LOOKUP(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 foo1
Nov 1 09:53:18 zippy kernel: nfsd: CREATE(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 foo1
Nov 1 09:53:18 zippy kernel: nfsd: WRITE(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 512 bytes at 0
Nov 1 09:53:18 zippy kernel: nfsd: COMMIT(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0@0

To log ALL of the nfsd debugging data possible, you can set flags to all. This will enable all of the flags in the module which will cause an excessive amount of data to be logged (don’t do this on production servers unless you have a specific need to do so!). Here is the output you would see from a single 512-byte write when all of the flags are enabled:

Nov 1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 4
Nov 1 09:56:18 zippy kernel: nfsd: ACCESS(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0x1f
Nov 1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov 1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 1
Nov 1 09:56:18 zippy kernel: nfsd: GETATTR(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3
Nov 1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov 1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 4
Nov 1 09:56:18 zippy kernel: nfsd: ACCESS(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0x2d
Nov 1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov 1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 2
Nov 1 09:56:18 zippy kernel: nfsd: SETATTR(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3
Nov 1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov 1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 7
Nov 1 09:56:18 zippy kernel: nfsd: WRITE(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 512 bytes at 0
Nov 1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)
Nov 1 09:56:18 zippy kernel: nfsd: write complete host_err=512
Nov 1 09:56:18 zippy kernel: nfsd_dispatch: vers 3 proc 21
Nov 1 09:56:18 zippy kernel: nfsd: COMMIT(3) 36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3 0@0
Nov 1 09:56:18 zippy kernel: nfsd: fh_verify(36: 01070001 0003fe95 00000000 ec426dec 9f402ff3 d2aac8b3)

Now you may be asking yourself what is that stuff following the procedure name? The data logged along with the procedure name contains the contents of the data structures sent as part of a given procedure. You can view the data structures and their members by opening up the correct NFS RFC and checking the argument list. Here is what the NFSv3 RFC says about the argument passed to the WRITE procedure:

struct WRITE3args {
         nfs_fh3     file;
         offset3     offset;
         count3      count;
         stable_how  stable;
         opaque      data<>;
};

The structure contains the file handle to perform the operation on, the offset to perform the write and the number of bytes to write. It also contains a stable flag to tell the server it should COMMIT the data prior to returning a success code to the client. The last field contains the data itself.

Once you have a better understanding of the fields passed to a procedure you can cross reference the lines in the messages file with the dprintk() statements in the kernel. Here is the kernel source for the dprintk() statement in the VFS WRITE code:

dprintk("nfsd: WRITE    %s %d bytes at %d\n",
         SVCFH_fmt(&argp->fh),
         argp->len, argp->offset);

This matches up exactly with the contents in the log file, and the dprintk() code shows the order everything is logged. This is useful stuff, though it’s probably not something you will need to access daily. ;) I’m documenting it here so I can recall this procedure down the road.

This article was posted by Matty on 2011-11-02 08:31:00 -0400 EDT