Monday, December 7, 2009

Houston Opensolaris User Group Meeting

Tomorrow (Dec 8th) will be the 20th anniversary of what is now called the Houston Opensolaris Users Group. Most people still call it by the older name 'HUGS' (which stood for 'Houston User Group for Sun', since it made a nice acronym).

Normally we try to have presentations, however December is our annual party, so if anything is presented, it will be very informal. We'll have food, drinks, and prizes!

If you're in the Houston area, please stop by. The meeting starts around 6pm (though don't worry if you can't make it until later -- we're usually there until 7:30-8pm, sometimes longer). We'll be at 3 Greenway Plaza, 15th floor (when you walk out of the elevator, there should be a big sign pointing at the door to go in -- unfortunately we can't leave it propped open or the alarm goes off). Park in the visitor parking for Greenway Plaza -- after 7pm you don't have to pay to exit (just keep your ticket).

Monday, May 25, 2009

Doing a small bit of diagnosis with mdb on process hangs

If you've ever had a process hang in Opensolaris and wanted to find out a bit more, here are a few quick steps you can try. They are by no means exhaustive, but they're a reasonable start if you just want to learn a little bit more:
  1. Run pstack pid on the process to see what it is doing.  Possibly try it a few times to see if it's always calling the same function.  This will print out the user call stack (most recently called function at the top).  If it looks like a user function, it probably suggests some sort of coding bug.
  2. If it looks like it's stuck inside a syscall, you probably want to get the call stack within the kernel.
  3. As root, run mdb -k in another window.
  4. In mdb, type '::ps -t'.  This will list all the processes (lines starting with an 'R').  Under each process will be at least one line that looks like this:
    T  0xffffff014ac00e00 
  5. For each of those 'T' lines under the process in question, take the 2nd value and run 'val::findstack'.   I.e. with the above example, you'd type '0xffffff014ac00e00::findstack'
  6. Type ctrl-d to exit from mdb.
From there, what you do really depends on what the output is. From what I've seen, unkillable processes tend to be stuck waiting on a condition variable (cv_wait) -- or at least it's decent odds. For a while, Solaris 10 had some issues with locking inside of /proc that caused this, though it's been a couple of years since I've seen it, so it looks like those issues have been resolved.

Most recently, I saw one that looks like it might be some sort of loop inside the ufs code on Nevada, though I will have to wait to see what those more experienced in this stuff are able to determine.
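
If there are a lot of threads, steps 4 and 5 above get tedious to do by hand. Here is a rough sketch of one way to script them. The function name thread_findstack_cmds is made up for this example, and it assumes the PID is the second column on the process lines of the '::ps -t' output, so check that against what your system actually prints before relying on it. It also uses awk's -v option, so on Solaris you may need nawk or /usr/xpg4/bin/awk instead of the default awk.

```shell
#!/bin/sh
# Sketch only: turns `::ps -t` output into findstack commands (steps 4-5).

# Reads `::ps -t` output on stdin and prints one `addr::findstack` command
# for every thread ('T' line) listed under the process with the given PID.
# Assumption: on process lines the PID is the second column.
thread_findstack_cmds() {
    awk -v pid="$1" '
        # Any line that is not a thread line starts a new process entry
        # (or is the header); remember whether it is the PID we want.
        $1 != "T" { in_proc = ($2 == pid) }
        # Thread lines under the wanted process: emit a findstack command.
        $1 == "T" && in_proc { print $2 "::findstack" }
    '
}

# Possible usage on a live system, as root (1234 is a placeholder PID):
#   echo "::ps -t" | mdb -k | thread_findstack_cmds 1234 | mdb -k
```

The second mdb -k in the usage line just reads the generated commands from standard input, the same as if you had typed them at the mdb prompt.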

Tuesday, May 19, 2009

Opensolaris: Now with more color (ls)

An actual blog entry!

PSARC 2009/228 was just integrated. This adds a number of GNU compatibility options to the Solaris ls (/bin/ls), the most popular being colorized output. Prior to this, the only way to get them was to use GNU ls. Unfortunately, there are a number of issues that preclude GNU ls from being a replacement for Solaris ls:
  1. GNU ls does not support Solaris specific filesystem features such as NFS4 ACLs (used with ZFS) and extended attributes.
  2. GNU ls is not POSIX or SUS compliant.
  3. The output format of GNU ls has changed between releases, and while the output of ls is not an interface, it seems undesirable to change the output format for no underlying technical reason.
  4. The upstream community is not interested in maintaining patches for either 1 or 2, citing a desire to only support 'standard' features.
This left the option of maintaining a fork, or adding the support to Solaris ls. I opted for the latter. The actual work was fairly quick: a few hours scattered across a week or so. A bit more work was required to write up all the documentation. One of the nice things about Opensolaris is that the processes for working on the core OS require that attention is paid to detail. As annoying as that might seem at times (and even then it's not bad), once you go through it, it becomes more obvious how key it is to maintaining high code quality.