Figuring out what a hung Solaris process is doing inside the kernel


Last week a process hung on one of my Solaris hosts, and I was curious what each of its threads was doing. The mdb utility is perfect for locating this information. You can combine the ::pid2proc dcmd, the thread walker (via ::walk) and the ::findstack dcmd to get the kernel call stack of every thread in a process. In the example below I am examining process ID 48. The 0t prefix tells mdb that the number is decimal rather than its default hexadecimal:

$ mdb -k

> 0t48::pid2proc |::walk thread |::findstack -v
stack pointer for thread ffffff02d6fd1080: ffffff000fac1df0
[ ffffff000fac1df0 _resume_from_idle+0xf1() ]
ffffff000fac1e20 swtch+0x147()
ffffff000fac1e80 cv_wait_sig_swap_core+0x170(ffffff02d6fd1256, ffffff02d6fd1258, 0)
ffffff000fac1ea0 cv_wait_sig_swap+0x18(ffffff02d6fd1256, ffffff02d6fd1258)
ffffff000fac1ec0 pause+0x48()
ffffff000fac1f10 sys_syscall32+0x101()
stack pointer for thread ffffff02d6fccc00: ffffff0011184d10
[ ffffff0011184d10 _resume_from_idle+0xf1() ]
ffffff0011184d50 swtch_to+0xe5(ffffff02eae948e0)
ffffff0011184db0 shuttle_resume+0x328(ffffff02eae948e0, ffffffffc00c2ed0)
ffffff0011184e50 door_return+0x21a(fedae9f8, 408, 0, 0, fedaee00, f5f00)
ffffff0011184ec0 doorfs32+0x134(fedae9f8, 408, 0, fedaee00, f5f00, a)
ffffff0011184f10 sys_syscall32+0x101()
stack pointer for thread ffffff02d6b18c20: ffffff000ff00d30
[ ffffff000ff00d30 _resume_from_idle+0xf1() ]
ffffff000ff00d60 swtch+0x147()
ffffff000ff00db0 shuttle_swtch+0x259(ffffffffc00c2ed0)
ffffff000ff00e50 door_return+0x242(0, 0, 0, 0, fec9ee00, f5f00)
ffffff000ff00ec0 doorfs32+0x134(0, 0, 0, fec9ee00, f5f00, a)
ffffff000ff00f10 sys_syscall32+0x101()
stack pointer for thread ffffff02d6abb400: ffffff000f9dab30
[ ffffff000f9dab30 _resume_from_idle+0xf1() ]
ffffff000f9dab60 swtch+0x147()
ffffff000f9dabe0 cv_timedwait_sig_internal+0x1d6(ffffff02d60a30dc, ffffff02d60a30e0, fb24, 0)
ffffff000f9dac10 cv_timedwait_sig+0x1f(ffffff02d60a30dc, ffffff02d60a30e0, fb24)
ffffff000f9dac40 kcf_svc_wait+0x86(ffffff000f9dac5c)
ffffff000f9dacc0 cryptoadm_ioctl+0xe0(9a00000000, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4)
ffffff000f9dad00 cdev_ioctl+0x45(9a00000000, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4)
ffffff000f9dad40 spec_ioctl+0x83(ffffff02d68eca00, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4, 0)
ffffff000f9dadc0 fop_ioctl+0x7b(ffffff02d68eca00, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4, 0)
ffffff000f9daec0 ioctl+0x18e(3, 790d, feb91fc0)
ffffff000f9daf10 sys_syscall32+0x101()
stack pointer for thread ffffff02d69fa540: ffffff000f9c1b30
[ ffffff000f9c1b30 _resume_from_idle+0xf1() ]
ffffff000f9c1b60 swtch+0x147()
ffffff000f9c1be0 cv_timedwait_sig_internal+0x1d6(ffffff02d616e5e0, ffffff02d616e5e8, fb24, 0)
ffffff000f9c1c10 cv_timedwait_sig+0x1f(ffffff02d616e5e0, ffffff02d616e5e8, fb24)
ffffff000f9c1c40 kcf_svc_do_run+0x7d()
ffffff000f9c1cc0 cryptoadm_ioctl+0x9b(9a00000000, 790e, 8, 100003, ffffff02d5747248,
ffffff000f9c1de4)
ffffff000f9c1d00 cdev_ioctl+0x45(9a00000000, 790e, 8, 100003, ffffff02d5747248, ffffff000f9c1de4)
ffffff000f9c1d40 spec_ioctl+0x83(ffffff02d68eca00, 790e, 8, 100003, ffffff02d5747248,
ffffff000f9c1de4, 0)
ffffff000f9c1dc0 fop_ioctl+0x7b(ffffff02d68eca00, 790e, 8, 100003, ffffff02d5747248,
ffffff000f9c1de4, 0)
ffffff000f9c1ec0 ioctl+0x18e(3, 790e, 8)
ffffff000f9c1f10 sys_syscall32+0x101()
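When a process has more than a handful of threads, reading raw findstack -v output gets tedious. Below is a minimal Python sketch for summarizing a dump like the one above. It is not part of mdb and assumes the output has already been captured as plain text.

The sketch does two things:
- It folds each thread's stack onto one line, outermost frame first, using the semicolon-separated format that flame-graph tools consume.
- It counts how many threads share an identical stack, which makes it easier to spot many threads blocked in the same place.

The helper names (parse_findstack, fold, summarize) are hypothetical. The frame regex is an assumption based only on the line shapes in this output, and it deliberately skips the wrapped argument-list lines.

```python
import re
from collections import Counter

# Header line that ::findstack prints before each thread's frames.
THREAD_RE = re.compile(r"^stack pointer for thread ([0-9a-f]+):")
# Assumed frame shape: optional "[ ", a frame address, then symbol[+offset](args...
# Wrapped argument lines (a bare hex value followed by "," or ")") do not match.
FRAME_RE = re.compile(r"^\[?\s*[0-9a-f]+\s+([^\s(+]+)(?:\+0x[0-9a-f]+)?\(")


def parse_findstack(text):
    """Map each thread address to its function names, innermost frame first."""
    stacks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = header.group(1)
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return stacks


def fold(frames):
    """Collapse one stack to a single line, outermost frame first."""
    return ";".join(reversed(frames))


def summarize(text):
    """Count how many threads share each folded stack."""
    return Counter(fold(frames) for frames in parse_findstack(text).values())


# Two of the threads from the session above, copied verbatim.
SAMPLE = """\
stack pointer for thread ffffff02d6fd1080: ffffff000fac1df0
[ ffffff000fac1df0 _resume_from_idle+0xf1() ]
ffffff000fac1e20 swtch+0x147()
ffffff000fac1e80 cv_wait_sig_swap_core+0x170(ffffff02d6fd1256, ffffff02d6fd1258, 0)
ffffff000fac1ea0 cv_wait_sig_swap+0x18(ffffff02d6fd1256, ffffff02d6fd1258)
ffffff000fac1ec0 pause+0x48()
ffffff000fac1f10 sys_syscall32+0x101()
stack pointer for thread ffffff02d6abb400: ffffff000f9dab30
[ ffffff000f9dab30 _resume_from_idle+0xf1() ]
ffffff000f9dab60 swtch+0x147()
ffffff000f9dabe0 cv_timedwait_sig_internal+0x1d6(ffffff02d60a30dc, ffffff02d60a30e0, fb24, 0)
ffffff000f9dac10 cv_timedwait_sig+0x1f(ffffff02d60a30dc, ffffff02d60a30e0, fb24)
ffffff000f9dac40 kcf_svc_wait+0x86(ffffff000f9dac5c)
ffffff000f9dacc0 cryptoadm_ioctl+0xe0(9a00000000, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4)
ffffff000f9dad00 cdev_ioctl+0x45(9a00000000, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4)
ffffff000f9dad40 spec_ioctl+0x83(ffffff02d68eca00, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4, 0)
ffffff000f9dadc0 fop_ioctl+0x7b(ffffff02d68eca00, 790d, feb91fc0, 100003, ffffff02d5747248,
ffffff000f9dade4, 0)
ffffff000f9daec0 ioctl+0x18e(3, 790d, feb91fc0)
ffffff000f9daf10 sys_syscall32+0x101()
"""

if __name__ == "__main__":
    # Print each distinct stack with the number of threads sitting in it.
    for stack, count in summarize(SAMPLE).most_common():
        print(count, stack)
```

Each output line starts with sys_syscall32 followed by the system call the thread entered (pause, ioctl, and so on). This makes it quick to see which threads are parked in ordinary waits and which are stuck somewhere unusual.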

It turns out the issue I encountered was due to a bug, which will hopefully be fixed in the near future.

This article was posted by Matty on 2009-05-15 00:07:00 -0400