If you've ever had a process hang in OpenSolaris and wanted to find out a bit more, here are a few quick steps you can take. These are by no means exhaustive, but they're a start if you just want to learn a little bit more:
- Run 'pstack pid' on the process to see what it is doing. Try it a few times to see if it's always calling the same function. This will print out the user call stack (most recently called function at the top). If it's stuck in a user function, that probably suggests some sort of coding bug.
- If it looks like it's stuck inside a syscall, you probably want to get the call stack within the kernel.
- As root, run mdb -k in another window.
- In mdb, type '::ps -t'. This will list all the processes (lines starting with an 'R'). Under each process will be at least one line that looks like this:
T 0xffffff014ac00e00
- For each of those 'T' lines under the process in question, take the 2nd value and run 'val::findstack'. For example, with the line above, you'd type '0xffffff014ac00e00::findstack'
- Type ctrl-d to exit from mdb.
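If the process has a lot of threads, typing each address by hand gets tedious. Here is a minimal Python sketch that takes the text printed by '::ps -t' and builds the matching '::findstack' commands for one pid. It assumes, going only by the description above, that each process line has the pid in its second column and that each thread line starts with 'T' followed by a 0x address. The function name and the sample text are made up for illustration (only the first address comes from the example above), so check them against what your system actually prints.

```python
def findstack_commands(ps_output, pid):
    """Build 'ADDR::findstack' commands for every thread listed under one pid.

    Assumed layout (not verified against every mdb version):
      - process lines carry the pid in the second column
      - thread lines look like: T 0x<address> ...
    """
    commands = []
    in_target = False
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "T" and fields[1].startswith("0x"):
            # Thread line: keep it only if it sits under the process we want.
            if in_target:
                commands.append(fields[1] + "::findstack")
        else:
            # Header or process line: check whether it starts our process.
            in_target = fields[1] == str(pid)
    return commands


# Made-up sample in the assumed layout; real output has more columns.
SAMPLE = """\
S    PID   PPID NAME
R    100      1 first
        T  0xffffff014ac00e00
R    200      1 second
        T  0xffffff014ac01000
        T  0xffffff014ac02000
"""

if __name__ == "__main__":
    for command in findstack_commands(SAMPLE, 200):
        print(command)
```

Each printed line can then be pasted at the mdb prompt.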
From there, what you do really depends on what the output is. From what I've seen, unkillable processes tend to be stuck waiting on a condition variable (cv_wait), or at least the odds of that are decent. For a while, Solaris 10 had some issues with locking inside of /proc that caused this, but it's been a couple of years since I've seen it, so it looks like those issues have been resolved.
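If you have saved the stack output to a file and want a quick check for that condition-variable case, something like the following Python sketch could do it. It simply searches the text for function names of the form cv_...wait (cv_wait, cv_wait_sig, cv_timedwait and so on). The function name and the sample stack below are invented for illustration, including the addresses and offsets, and the sketch makes no assumption about the exact line layout beyond the function names appearing as whole words.

```python
import re

# Matches kernel condition-variable wait routines such as cv_wait,
# cv_wait_sig or cv_timedwait when they appear as whole words.
CV_WAIT_PATTERN = re.compile(r"\bcv_\w*wait\w*\b")


def cv_wait_frames(stack_text):
    """Return the cv_*wait* function names found in a saved stack trace."""
    return CV_WAIT_PATTERN.findall(stack_text)


# Invented sample; real stack output will differ.
SAMPLE_STACK = """\
ffffff0007a0b9d0 swtch+0x141()
ffffff0007a0ba00 cv_wait+0x61()
ffffff0007a0ba40 some_kernel_function+0x2c()
"""

if __name__ == "__main__":
    frames = cv_wait_frames(SAMPLE_STACK)
    if frames:
        print("waiting on a condition variable via:", ", ".join(frames))
    else:
        print("no cv_wait-style frame found")
```

A hit only tells you the thread is sleeping on a condition variable; the frames below it are what tell you which subsystem it is waiting in.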
Most recently, I saw one that looks like it might be some sort of loop inside the ufs code on Nevada, though I will have to wait to see what those more experienced in this stuff are able to determine.