Monday, December 28, 2009

Testing PostgreSQL 8.5-alpha3 with peg

PostgreSQL 8.5-alpha3 was announced last week. The biggest single feature introduced in it is Hot Standby, which allows you to run queries against a server that's being used as a Warm Standby replica. Since you can make any number of such replicas from a single master database, this introduces a whole new way to scale up PostgreSQL server farms in situations where you can live with queries that won't necessarily have the very latest data in them, due to replication lag. For example, it's a great way to run large batch reports like daily business summaries against the standby, rather than beating the master server with that load. Just wait a little bit after the end of the day for the transactions to copy over, then kick the query off against the hot standby system.

Actually getting two servers up and running so you can test this feature can be a drag though, since the alpha releases aren't necessarily going to be packaged up for you to install easily. In order to test a new version of PostgreSQL built from source, there are a fair number of steps involved: check out the source, build it, create a database cluster, start the server, and then you can finally run a client. Each one of these has its own bits you likely need to customize, from the directory tree that keeps all these pieces (source, binaries, database) organized to the options passed at source configuration time.

After a few years of building increasingly complicated scripts to handle pieces of this job, recently I assembled them Voltron-style into one giant script that takes care of everything: peg, short for "PostgreSQL environment generator". The basic idea is that you give peg a directory to store everything in, point it toward the source you want to use, and it takes care of all the dirty work.

The best way to show how peg can help speed up development and testing is to see how easy it makes testing the latest PostgreSQL 8.5 alpha3:
$ git clone git://
$ sudo ln -s /home/gsmith/peg/peg /usr/local/bin/peg
$ mkdir -p pgwork/repo/tgz
$ pushd pgwork/repo/tgz/
$ wget
$ popd
$ peg init alpha3
Using discovered PGWORK=/home/gsmith/pgwork
Using discovered PGVCS=tgz
Using tgz repo source /home/gsmith/pgwork/repo/tgz/postgresql-8.5alpha3.tar.gz
Extracting repo to /home/gsmith/pgwork/src/alpha3
tar: Read 6656 bytes from -
Extracted tgz root directory is named postgresql-8.5alpha3
No checkout step needed for PGVCS=tgz
$ . peg build
$ psql
psql (8.5alpha3)
Type "help" for help.

gsmith=# \q
That's 9 lines of commands to go from blank development system to working psql client, and half of that was installing peg. The above downloads and installs peg (in /usr/local/bin; you may want to use a private binary location instead), grabs the alpha3 source (from the US, not necessarily best for you), and builds a server using the standard options for testing--including debugging and assertion checks. If you want to do performance tests instead, you can do that with peg by setting the PGDEBUG variable to something else. That's covered in the documentation.

Warning: if you try to do this on a RedHat/CentOS system, you will probably discover it won't work. Building PostgreSQL 8.5 from an early source code snapshot now requires a newer version of the Flex tool than is available in RHEL5 and its derivatives. I've got some notes on Upgrading Flex from source RPM to compile PostgreSQL from CVS you can read. There were official RPM packages of alpha2 released that bypass this problem; you may be able to get an alpha3 set from there in the near future too.

My goal was to make learning how to use peg pay for itself in time savings the first time you use it for testing a PostgreSQL development snapshot. The script still has some rough edges, but I've been using it regularly for over a month now with minimal functional issues. The documentation is almost complete, even showing examples of how to use peg to create two PostgreSQL installs on the same host--which allows testing hot standby even if you've got only one system.

peg is hosted on GitHub. I've heavily noted the things that need improvement with TODO tags in the source and documentation, and patches are welcome. Hope it helps you out, and that you'll be joining the 8.5-alpha3 testers.

Thursday, November 5, 2009

Ergonomic keyboards: Kinesis vs. Microsoft

I used to be pretty hardcore as far as my keyboard choices go. I have a stack of vintage IBM and Lexmark Model M keyboards, and can grade them like a wine connoisseur ("these '96 models just don't have the bounce I expect from even a good vintage '93 or '94", even though they're all far superior to the brand new Unicomp models still on the market). But like many computer users, I sometimes suffer from pain in my hands and arms from excessive typing. A while ago I decided I had to give up the Model M, due to both the excessive typing force it encourages and my increasing discomfort with the standard keyboard layout. A succession of keyboards aimed at ergonomic use has followed. I've been through enough configurations of those now to feel comfortable passing on some shopping recommendations, gathered as I considered what I wanted for a second keyboard after settling on a primary one.

Obligatory warning note: please be very careful here! If you're in enough pain from typing to want or need an alternate keyboard, you could have a problem much more serious than just a keyboard change will fix. My own experiments to improve what was diagnosed as classic "tennis elbow" were carefully supported by monitoring during doctor and physical therapy visits, to make sure there wasn't really a larger issue and that I wasn't making things worse in the process. You don't want to end up like the poor author of Your wrists hurt, you must be a programmer, who was left unable to type altogether after poorly executed care here.

Shopping recommendations

There's a lot to chew on below. Here's the condensed version, which answers the most common questions I and others seem to have about the more mainstream Kinesis products:
  • The Kinesis Freestyle with VIP kit is my recommended middle of the road pick. It has some parts you'll want even if you decide you want the somewhat more difficult to deal with (and much more expensive) Ascent "multi-tent" kit. This gets you most of the benefits possible here, using the most common ergonomic keyboard positioning, while still offering some flexibility and upgrade paths. The combination is $129 as I write this at the retailer I bought mine from and would recommend, The Human Solution.
  • If you think a more vertical setup might be a requirement for you one day, you can add that later on the Freestyle. But you really should see how you do with just the VIP setup first, because the Ascent alternative is both really expensive and has its own potential issues. The most I'd recommend you consider spending right from the start is the extra $36 to get the version of the keyboard with the wider 20" separation, because you're going to want that if you go vertical one day (but really only in that situation, so don't consider that important either).
  • If you're OK without so much flexibility or really need PS/2 support, consider the Kinesis Maxim keyboard. It's about the same price, is easier to move around due to its more integrated design, and it supports the position you're likely to settle on with the Freestyle anyway.
  • Want to experiment with a more ergonomic keyboard design to see if it helps you, but without committing so much money at first? Microsoft's Natural Ergonomic Keyboard 4000 is a decent cheap (under $50 if you shop around a bit) option here, albeit without a good feel for presses of individual keys.
Now we head backward toward how I came to these conclusions.

Keyboard ergonomic issues

In order to put all this into some context, first we need to talk about what's wrong with the traditional keyboard. Keys that require less force to activate are always good, but I have those on my laptop and it's not really comfortable to type on it, so that's clearly not enough. The main thing I've come to appreciate is that our forearms aren't designed to be horizontal for long periods of time. Try this experiment: let your arms fall loose to your sides. Note the position your hands are in relative to your arms? Your thumb is forward, your hands lined up straight with your forearms. Now hold your arms in front of you. Your arms feel more comfortable when your thumbs are pointing toward the ceiling, right? That's a natural position your body is comfortable with.

Now note how your hands and arms move when you use a standard keyboard. There's a 90 degree twist from your natural position. Your hands might be bent backwards some too; that's not good either, as it tenses the muscles in your forearms (which then pull on things all the way up to the elbow). And unless your arms are much more flexible than mine, you probably are more comfortable with them further apart than when you're using a standard keyboard, where you have to bend your elbow further inwards than is really ideal in order to get both hands on the home row. Kinesis has a good description of the various problem areas here in their awkward postures document.

On a particularly painful day where it hurt just to rotate my hands into the typing position, I noted that all it took was returning to thumbs-upward and separating them to make that feel better. The first question this raised for me was whether I could type like that.

Kinesis Ascent

You sure can; the Kinesis Freestyle with Ascent accessory (also called the "Multi-tent" kit) lets you split the keyboard in half and push it straight up if you want. The product catalog on their site isn't all that great; I found the shopping experience at reseller The Human Solution easier to navigate and order from (cheaper than direct from Kinesis, too).

The Ascent is a really expensive upgrade for the keyboard. I found it valuable for mapping out what my options were, because you can set up just about any angle/position combination with it. Ultimately I wouldn't recommend it for most people though. For me at least, the really sharp angles (near the 90 degree vertical position shown in the product picture) were hard to type on for a couple of reasons. While I touch type, I learned just how much I look at the keys for punctuation when in this position, because looking at them there is hard to do.

The biggest problem with this accessory is that while well constructed, it's still just sheet metal. There was too much flex in the design for me to really be happy with it, particularly in its vertical position. I had to tense more muscles than I expected to type that way, because I had to be so careful not to press too hard toward the center. It feels like you might collapse it inward, even though that's actually hard to do, because it does give a little in that direction. And angles short of vertical didn't turn out to be very useful to me either. Once I made the angle greater than around 20 degrees, I couldn't get a comfortable position until I reached >70, at which point I was back to being worried about collapse toward the middle again. At small angles, it was much sturdier, but there was still just a bit more play than I like. What I wanted instead was to adjust into the right position, then make it really solid in that spot. That's the approach more explicitly taken by some of the other competitors here, like Goldtouch; there's a good blog entry discussing their products, and I ruled out Goldtouch because I wanted the option of being able to separate the halves (even though it turns out I don't really need that right now).

One part of the Ascent design aimed to improve stability is a metal connector plate that attaches to the two keyboard halves. I found this to be a bit sketchy in that the halves aren't locked into place as firmly as I'd like (it's just a couple of bolts). And it also has the problem that it enforces lining up the two halves to be straight. The stock Freestyle keyboard comes with a removable "pivot tether" connecting the two halves of the keyboard. That allows splaying the two halves of the keyboard apart from one another at an angle, which I found works better for me than trying to keep them straighter. You certainly can splay with the Ascent by just not using the connector plate, but now you're dealing with two completely disjoint sections. The biggest issue I found with that is repeatability: without a way to lock into the position I want to use, the two halves tended to drift toward sloppy and bad positions without me noticing. Moving the keyboard around is a nightmare too without the connector plate.

Microsoft Ergonomic Keyboards

It's hard to ask the day job for a keyboard combination that approaches $400 with all the fixins when I wasn't even completely sold on it myself. The last two years I've mainly worked at home, but sometimes visited my far off company office for a week or two. In parallel with investigating options for my home keyboard, where most of my work happens, I also wanted a cheaper solution for my desk at the office.

After two long trips typing on every keyboard setup available to play with at both Fry's and Micro Center, the cheapest option that I liked at all here was the Microsoft Natural Ergonomic Keyboard 4000. These are regularly available in the sub-$50 range even at retail (I've grabbed one on sale for $40 at both Fry's and Best Buy), and Microsoft has some other models available you might consider as well. There's a good commentary praising the keyboard in the demanding Emacs context in a review that I found informative.

This keyboard gets a couple of things right. Note that there is some separation between the two halves of the keyboard, and that it slopes vertically downward from the center toward the left and right edges. These are the two things I found most useful about the Kinesis keyboard. What's really interesting is that the angles the Microsoft keyboard fixes those at are extremely close to what I settled into using even in the much more adjustable Kinesis+Ascent combination. The Microsoft keyboard also gets one detail perfect that Kinesis doesn't handle on their Freestyle design: it slopes downward from the nicely integrated wrist rests to the function keys. Now, it turns out I don't like this as much as I thought. The most comfortable keyboard position I've found puts the keyboard in my lap (I'm using a Gamer's Desk w/Max Mousepad). The integrated front/back slope of the Ergonomic 4000 starts out too high for that to really work for me. I think you really need to be up higher and have the keyboard start below your lap, perhaps with an office chair and a keyboard tray, for a downward sloping keyboard to be optimally placed.

Ultimately, I never got comfortable with this Microsoft keyboard. My main issue is that the big keys on the bottom, mainly Alt and Space, have awful key action on them. The space bar clunks and doesn't feel right, and I constantly missed key chords using Alt in them because I didn't press the giant but sloppy key down fully. Having to press that hard is something I can't take in a keyboard, particularly for multiple key actions. The source for the positive review I linked to above apparently isn't bothered by this because he hits alt with his palm, which I just can't get used to (and isn't compatible with my goal of not being too different from what I do on a laptop). I'm left thinking Microsoft's Natural Ergonomic Keyboard 4000 has the right layout and position, but the cheap keys it uses are disappointing. But if you don't want to spend serious bucks on a keyboard, or want to try out more ergonomic positioning but aren't committed to it just yet, it's cheap enough that you might find it worth trying out. The keyboard is common enough that you might easily find a store that carries one you can try out too.

Exotic Options

Kinesis does make keyboards that aim for better positioning in the front to back dimension too, where sloping downward away from your body is acknowledged to work out well in many situations. Their Advantage Pro is a very nice keyboard I was able to try out via a coworker who loves his. I ultimately rejected that choice because it's just too different from the standard keyboard to be comfortable for me, and since I do have to balance my time with a healthy dose of laptop keyboarding I couldn't see myself ever really getting used to it. If that's not important to you and you don't mind some retraining time, it's an expensive but quite nice product. It does get the downward curve bits better than the Freestyle model I settled on, and I think it has the potential to be more ergonomic in the end if you can accept those trade-offs.

Even the Advantage Pro seems quite familiar compared to the really difficult to get used to DataHand, which I was also able to borrow to evaluate for a bit at one point. The DataHand I can only see making sense if you're so bad off that you really can't handle holding your hands/arms in a standard position at all, or can't press down/move around anymore and it's easier for you to only move your fingers a little bit (you "click" a set of tiny switches in each direction around your finger to type). The retraining time and the difference from other keyboards are really substantial on a DataHand. That's really not a cheap option either, if you can even buy one.

Revisiting the Kinesis Freestyle

What I came to realize here is that even with all the flexibility available with the Kinesis Freestyle with Ascent accessory, in practice what worked out well is basically the same positioning the newer Microsoft ergonomic keyboards provide. Somewhere between 10 and 20 degrees of slope downward from the center is enough to get rid of the worst of the rotation issues a standard horizontal keyboard introduces, and a moderate split between the two halves is probably all you need unless you're going full-on vertical (or have an enormous chest and tiny arms like a T-rex). Kinesis knows this perfectly well too; their Maxim keyboard provides a very similar configuration to the one I settled into, ready to go as an integrated unit.

There are two viable ways to convert a Kinesis Freestyle keyboard to this sort of position. I already purchased their Freestyle VIP kit, which is the cheapest way to get a set of the wrist rests to go with the Ascent kit too. After giving up on the Ascent, I used the other portion of the VIP kit, the risers providing either a 10 or 15 degree lift, and found those to be plenty stable for typing on. While the range of adjustment here is limited, I think Kinesis and Microsoft have correctly nailed that the common case is going to want something in that range anyway. A Kinesis Freestyle with the pivot tether installed for good splay, risers at 10 degrees, and the wrist rests is quite similar to the only position that's offered on the Microsoft Ergonomic keyboard and Kinesis's own Maxim, and unless you really have exceptional needs that require moving toward full vertical I think this is the best option available. Since I find myself more comfortable with the 15 degree lift position, I decided not to get the even more stable Kinesis Freestyle Incline platform (fixed at 10 degrees).

The only thing I wish was more adjustable on the Kinesis Freestyle is the front/back slope, which is fixed at a moderate upward slope. As I mentioned before it's hard to get a downward one to work well without a keyboard tray anyway, so that's not that critical to me, and I wasn't happy with the models on the market with an integrated slope that way.

That's where I'm at with keyboards now: Freestyle with VIP kit. Works great, wish it was a bit easier to move around though. I am concerned that I'm going to knock it off my desk one day and break the whole middle tent connector in particular. Some of the other Kinesis options here (the Maxim and the Freestyle with Incline kit) look more sturdy in that respect; if that's important to you, they're worth considering. I like the extra flexibility of the Freestyle, but as I've mentioned repeatedly I don't think it's really necessary for most people; just nice to have.

P.S. Yes, I've done the same level of research and tests on ergonomic mice too; will cover those next time I get some time to write on this subject.

Running cron on Ubuntu

Automating regular admin tasks with cron is a great way to handle all sorts of chores. Every day systems around the globe e-mail me cron reports showing their backups were successful and a report of how many bad guys tried to break in (by the obvious front door of ssh at least). I run an Ubuntu desktop at home, and I'd like to automate tasks on it with cron as well. Here's a quick guide to the Ubuntu-specific bits required to get cron going, presuming you're already familiar with it on other systems.

The Ubuntu wiki has a Cron How-To. When I started my night, the description in the "Enable User Level Cron" section was wrong. It suggested that the default setup didn't allow regular users to add jobs with "crontab -e" and have them execute, and that you first had to set up cron.allow for them. This may be true on some other system, but it's not the Debian or Ubuntu default. If you run into this restriction, you'll know it--running crontab will throw the error right in your face. So if that doesn't happen, but your jobs aren't running, the allow/deny section isn't your issue. I updated that section to match the description of the behavior you'll see with "man crontab".

I also updated the how-to with the troubleshooting bits I either needed this time or have run into in the past. I'm saving the parts I added here in case somebody decides they're not worthy and deletes them:

Troubleshooting cron problems
  • Edits to a user's crontab and jobs that are run on their behalf are all logged by default to /var/log/syslog and that's the first place to check if things are not running as you expect.
  • When adding a new entry to a blank crontab, forgetting to add a newline at the end is a common source for the job not running. If the last line in the crontab does not end with a newline, no errors will be reported at edit or runtime, but that line will never run. See man crontab for more information. This has already been suggested as a bug.
  • If a user was not allowed to execute jobs when their crontab was last edited, just adding them to the allow list won't do anything. The user needs to re-edit their crontab after being added to cron.allow before their jobs will run.
  • Entries in the system-wide crontab (/etc/crontab and the files under /etc/cron.d) must specify the user name as a parameter after the date/time parameters. Accidentally including the user name that way in a user-specific crontab (including root's own, as edited with crontab -e) will result in trying to run the user's name as a command, rather than what was expected.
  • Entries in cron may not run with the same environment, in particular the PATH, as you expect them to. Try using full paths to files and programs if they're not being located as you expect.
  • The "%" character is used as newline delimiter in cron commands. If you need to pass that character into a script, you need to escape it as "\%".
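A few of the points in the list above can be checked mechanically. What follows is only a minimal sketch: it writes a sample per-user crontab to a scratch file and tests whether its last line ends with a newline. The ends_with_newline helper, the PATH value, and the nightly-report script path are all made up for illustration; none of them come from the how-to.

```shell
# Return success only if the file is non-empty and its last byte is a newline.
ends_with_newline() {
    # Command substitution strips a trailing newline, so an empty result
    # from tail means the final byte was a newline.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Write a sample user crontab to a scratch file. The quoted 'EOF' keeps the
# backslashes intact; the script path is a hypothetical example.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# cron runs jobs with a minimal environment, so set PATH explicitly
PATH=/usr/local/bin:/usr/bin:/bin
# Five time fields, then the command; no user name in a per-user crontab.
# A bare % would be read as a newline, so each one is escaped as \%.
30 2 * * * /home/gsmith/bin/nightly-report --date "$(date +\%Y-\%m-\%d)"
EOF

if ends_with_newline "$sample"; then
    echo "ok: last line ends with a newline"
else
    echo "warning: last line has no trailing newline and will not run"
fi
# To install it you would run: crontab "$sample"  (this replaces your crontab)
```

The same check can be pointed at an existing crontab by saving the output of "crontab -l" to a file first.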
Mail output

The behavior I expect from a UNIX system is that if I run a job in a crontab, and that job writes something to standard output, that output will be e-mailed to me. You're supposed to get that on Ubuntu, via the Exim4 Mail Transport Agent; see Setting Up Your System To Use E-Mail for details. If that's what you've got, you may get mail that cron sends just fine, to the root-ish user set up by the default installer.

But, if like me you installed something that pulled in the postfix package as a prerequisite (but didn't bother configuring it at the time), you'll discover cron output doesn't get mailed to you. Instead you'll see something like this in /var/log/syslog:
Nov  5 20:33:01 gsmith-desktop /USR/SBIN/CRON[26173]: (root) CMD (/home/gsmith/test)
Nov 5 20:33:01 gsmith-desktop postfix/sendmail[26176]: fatal: open /etc/postfix/ No such file or directory
Nov 5 20:33:01 gsmith-desktop /USR/SBIN/CRON[26166]: (root) MAIL (mailed 5 bytes of output but got status 0x004b )
This tells us that even though postfix, the mail server, is installed, because it hasn't been configured it isn't doing anything. That's kind of poor behavior just because I was too busy to fool with it at the time that prompt flashed by; what I'd expect is that the default would provide local delivery only until I wanted something better. Here's a simple guide to minimally install and configure postfix:
sudo apt-get install postfix
sudo dpkg-reconfigure postfix
You can answer its questions like this to set up what cron needs to deliver to your system:
  • Choose "local only"
  • Enter your user name for "Root and postmaster mail recipient"
  • Use the defaults for the rest of the questions
Now you should see that mail is being delivered in the log files when jobs execute. In order to read the mail, you might want the standard UNIX 'mail' utility, which also isn't installed by default; fix that like this:
sudo apt-get install mailutils
And that might also help random UNIX programs you install that expect 'mail' is available to send you things. Alternative text-based mail programs include alpine and mutt. GUI mail programs like thunderbird and evolution could also be used, although those are pretty heavyweight just to read the kind of local mail cron generates.
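To confirm from the logs whether cron's mail is getting through, the kind of syslog lines shown earlier can be filtered out of the noise. This is a rough sketch only: the cron_mail_lines name is made up, the patterns are modeled on the three log lines quoted above rather than on any documented log format, and reading /var/log/syslog may need elevated permissions on your system.

```shell
# Print cron mail-delivery results and postfix sendmail failures from a
# syslog-style file (defaults to /var/log/syslog, the file used above).
cron_mail_lines() {
    grep -E 'CRON\[[0-9]+\]: \([^)]*\) MAIL |postfix/sendmail\[[0-9]+\]: fatal' \
        "${1:-/var/log/syslog}"
}

# Example: show the most recent matches.
#   cron_mail_lines | tail -n 5
```

If nothing matches after a job that prints output has run, the job itself may not be running, which points back to the troubleshooting list rather than to mail delivery.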

If you want your server to deliver mail to the outside world instead, that's a whole nother level of complexity altogether. I suggest getting local delivery going first, to confirm that cron is working normally, before launching into that adventure. Then you can deal with the fun that is the Postfix setup documentation. Good luck with that.

Tuesday, November 3, 2009

PostgreSQL at the LISA conference in Baltimore

This week the Usenix LISA Conference is running in downtown Baltimore. There will be a PostgreSQL booth in the exhibition area from noon-7pm on Wednesday and from 10am-2pm on Thursday. Robert Treat is the lead elephant for this show, and is too busy with booth setup to have time to write fluff pieces like this one. I'm co-hosting, and we have some other volunteers you might see too. We're basically the least attractive booth babes ever, but if you're looking to talk about open-source databases and in the neighborhood, we're more informative than the local dolphin alternatives.

This week is also my first working for 2ndQuadrant, as the latest addition to their global PostgreSQL and replication consulting staff. If you're in the US, were interested in 2ndQuadrant's services or array of training classes, but figured that would be too hard to coordinate with their UK or Italian offices, that's something I can help out with now.

Tuesday, October 13, 2009

Triple partitioning and Dual Booting with Mac OS

A few months ago I bought a used Intel MacBook I'm now switching over to using as my primary personal laptop. I'm still using Linux as my preferred OS elsewhere though, so I need to deal with dual-booting on its hard drive (and no, a virtualized Linux install will not be fast enough). I also got a new backup hard drive, and wanted to partition that to support three OSes. This is the saga of how that all went.

Starting with a blank hard drive, getting the OS to dual boot was pretty easy. The easy route to get started is to use the Mac OS X Disk Utility for the initial partitioning. You pick the number of partitions, it starts them all equally sized, but you can tweak the vertical lines between them to adjust that. It's not a bad UI, and it will create the type of EFI partition needed to boot from OS X properly. Once that works, you just need to install rEFIt to allow booting from the other partition. I used the standard packaged installer, followed instructions for Getting into the rEFIt menu, did "Start Partitioning Tool" to sync partition tables (a lot more about this below), then popped an Ubuntu disk in and installed using the "Boot Linux CD" option at the rEFIt menu. The How-To Install Ubuntu 8.10 on a White MacBook instructions were basically spot on to make 9.04 work too, and if you partition right from the beginning you avoid the whole Boot Camp step and its potential perils.

The main usability problem I ran into is that the touchpad kept "clicking" when I typed, particularly keys near it like "." when typing an IP address. I followed the instructions for Disable Touchpad Temporarily When Typing and that made the problem go away. The wireless driver in the 2.6.28 kernel included with Ubuntu 9.04 was still a bit immature on this hardware when I connected to the 802.11n network here. To improve that, I grabbed the PPA kernel, which fixed the worst of it; Gentoo Linux on Apple MacBook Pro Core2Duo looks like a good guide to which kernels tend to be better on Mac hardware in general. The wireless is still a bit unstable when doing large transfers; it just stops transferring sometimes. Annoying, but not a showstopper in most cases; I just plug into the wired network for things like system updates. I'm much more annoyed by not having a right mouse button, much less a third one to open links in Firefox as new tabs only when I want to, like the Thinkpad I was using has.

The tough part came when I tried to get my new external backup drive working (my old one died in the middle of all this). Here I wanted a partition with FAT32 (compatible with any Windows install and for backing up my month old but already broken Playstation 3), one for Mac OS using its native HFS+ (for certain Time Machine compatibility), and one for Linux using ext3. This turned out to be harder than getting the boot drive working, mainly because rEFIt didn't do the hard part for me.

The background here is that Windows and Linux systems have all been using this awful partitioning scheme for years that uses a Master Boot Record (MBR) to hold the partition information. This has all sorts of limitations, including only holding 4 partition entries. To get more, you have to create an "extended partition" entry that holds the rest of them. Apple rejected this for their Intel Macs, and instead adopted an Intel scheme, the GUID Partition Table (GPT), that happens to work better with how Apple's hardware uses EFI to boot instead of the standard PC BIOS.

This means that to really get a disk that's properly partitioned for Mac OS X, you need to put a GPT partition table on it. But then other types of systems won't necessarily be able to read it, because they will look for an MBR one instead. It's possible to create a backwards compatibility MBR partition table from a GPT one (but not the other way), with the main restriction being that EFI will want a boot partition and the MBR can't have extended partitions. This means you can only get 3 old-school MBR partitions on the disk. That's how many I needed in my case, but things would have been more complicated had I tried to triple-boot my install disk. Then I'd have needed to worry about putting the Linux swap partition into the GPT space because it wouldn't have fit into the MBR section.

I hoped to do the whole thing on my Linux system and then just present the result to the Mac, but that turned out to be hard. The GRUB GPT HOWTO covers what you're supposed to do. I knew I was in trouble when step two didn't work, because the UI for "mkpart" within parted has changed since that was written; here's what worked to get started by creating a GPT partition table:
$ sudo parted /dev/sdb
GNU Parted 1.8.8
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mklabel
Warning: The existing disk label on /dev/sdb will be destroyed and all data on this disk will be lost. Do you want to
Yes/No? yes
New disk label type? [msdos]? gpt
(parted) mkpart non-fs 0 2
(parted) quit
Information: You may need to update /etc/fstab.
Following the whole procedure would be quite messy though, and I did not have a lot of faith that the result would also be compatible with OS X's requirements here. Most of those are outlined on the rEFIt "Myths" page, but there's a lot to absorb there.

I started over by wiping out what I did above at the start of the disk, where the partition table lives ("dd if=/dev/zero of=/dev/sdb" and wait a bit before stopping it). Then I used the OS X Disk Utility again from the Macbook to create the 3 partitions I needed. Since this doesn't create Linux partitions, I created both of the non-HFS+ ones as fat32. Then I connected the drive back to the Linux system to convert one of them to ext3. This didn't work out so hot:
$ sudo parted /dev/sdb
GNU Parted 1.8.8
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ST932042 1AS (scsi)
Disk /dev/sdb: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End    Size   File system  Name                  Flags
 1      20.5kB  210MB  210MB  fat32        EFI System Partition  boot
 2      211MB   107GB  107GB  fat32        UNTITLED 1
 3      107GB   214GB  107GB  fat32        UNTITLED 2
 4      214GB   320GB  106GB  hfs+         Untitled 3

(parted) rm 3
(parted) mkpart
Partition name? []? bckext3
File system type? [ext2]? ext3
Start? 107GB
End? 214GB
(parted) print
Model: ST932042 1AS (scsi)
Disk /dev/sdb: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End    Size   File system  Name                  Flags
 1      20.5kB  210MB  210MB  fat32        EFI System Partition  boot
 2      211MB   107GB  107GB  fat32        UNTITLED 1
 3      107GB   214GB  107GB  fat32        bckext3
 4      214GB   320GB  106GB  hfs+         Untitled 3

(parted) quit
Information: You may need to update /etc/fstab.
That changed the label...but not the type? I tried a few other approaches here in hopes they would work better. Tried deleting then exiting before creating again. Tried using "ext2" (the default) as the type. The partition was still fat32.

My reading made it clear that using parted from the command line is really not a well tested procedure anymore. The GUI version, gparted, also knows how to operate on GPT partition tables (even if it's not obvious how to create them rather than MBR ones), and that is the primary UI for this tool now. This worked; if I changed the type of the partition using gparted to ext3 and had it format it, the result was what I wanted:
$ sudo parted /dev/sdb
GNU Parted 1.8.8
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ST932042 1AS (scsi)
Disk /dev/sdb: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End    Size   File system  Name                  Flags
 1      20.5kB  210MB  210MB  fat32        EFI System Partition  boot
 2      211MB   107GB  107GB  fat32        UNTITLED 1
 3      107GB   214GB  107GB  ext3         bckext3
 4      214GB   320GB  106GB  hfs+         Untitled 3

(parted) quit
Linux will mount all three partitions now (with the HFS+ one as read-only), and OS X will mount FAT32 and HFS+ as expected. I've heard so many bad things about the OS X ext2 driver that I decided not to install it; I can always use the FAT32 volume to transfer things between the two OSes if I have to.

But we're not done yet, because the regular MBR on this system is junk:
$ sudo sfdisk -l /dev/sdb

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util sfdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdb: 38913 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device    Boot   Start     End   #cyls     #blocks  Id  System
/dev/sdb1           0+  38913-  38914-  312571223+  ee  GPT
          start: (c,h,s) expected (0,0,2) found (0,0,1)
/dev/sdb2           0       -       0           0   0  Empty
/dev/sdb3           0       -       0           0   0  Empty
/dev/sdb4           0       -       0           0   0  Empty
The MBR shows one big GPT partition, not one for each actual partition. This won't mount in Windows or on other MBR-only systems (like my PS3, speaking of junk). OS X doesn't care about that detail when it creates the partition table in the first place. That's one of the things Boot Camp fixes, but if you've already partitioned the drive manually it's too late to use it. When I did the dual-boot install, rEFIt fixed this for me (even though I didn't even understand what it did at the time) when I ran its "Start Partitioning Tool" menu option. If you want to make a proper MBR from an existing GPT yourself on a non-boot volume, you need to run the gptsync utility it calls for you by hand.

gptsync is available for Ubuntu. Here's what I did to grab it and let it fix the problem for me:
$ sudo apt-get install gptsync
$ sudo gptsync /dev/sdb

Current GPT partition table:
 #      Start LBA      End LBA  Type
 1             40       409639  EFI System (FAT)
 2         411648    208789503  Basic Data
 3      208789504    417171455  Basic Data
 4      417171456    624880263  Mac OS X HFS+

Current MBR partition table:
 # A    Start LBA      End LBA  Type
 1              1    625142447  ee  EFI Protective

Status: MBR table must be updated.

Proposed new MBR partition table:
 # A    Start LBA      End LBA  Type
 1              1       409639  ee  EFI Protective
 2 *       411648    208789503  0c  FAT32 (LBA)
 3      208789504    417171455  83  Linux
 4      417171456    624880263  af  Mac OS X HFS+

May I update the MBR as printed above? [y/N] y

Writing new MBR...
MBR updated successfully!
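
The proposed table in that output follows a simple pattern, which can be sketched in a few lines of Python. This is only my reading of the output shown here, not gptsync's actual code: the first MBR slot becomes an EFI protective entry running from LBA 1 to the end of the EFI System Partition, and up to three following GPT partitions are copied across with the same start and end. The function name and tuple layout are hypothetical, and the MBR type byte is supplied by the caller because how gptsync picks it is not covered here.

```python
EFI_PROTECTIVE = 0xEE
MAX_MBR_ENTRIES = 4


def propose_hybrid_mbr(gpt_parts):
    """Sketch a hybrid MBR from a list of GPT partitions.

    gpt_parts is a list of (start_lba, end_lba, mbr_type) tuples in disk
    order, with the EFI System Partition first. Returns a list of
    (start_lba, end_lba, mbr_type) tuples for the MBR.
    """
    if not gpt_parts:
        raise ValueError("no GPT partitions given")
    esp_end = gpt_parts[0][1]
    # Slot 1 covers the GPT header plus the EFI System Partition.
    table = [(1, esp_end, EFI_PROTECTIVE)]
    # No extended partitions are allowed, so only three more entries fit;
    # anything beyond that stays visible through the GPT only.
    table.extend(gpt_parts[1:MAX_MBR_ENTRIES])
    return table
```

Feeding it the four partitions from my drive gives back the same start and end values gptsync proposed.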
Afterwards, the GPT looks fine, and now MBR-based utilities understand it too; the good ones even know they shouldn't manipulate it directly:
$ sudo parted -l

Model: ST932042 1AS (scsi)
Disk /dev/sdb: 320GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
1 20.5kB 210MB 210MB fat32 EFI System Partition boot
2 211MB 107GB 107GB fat32 UNTITLED 1
3 107GB 214GB 107GB ext3 bckext3
4 214GB 320GB 106GB hfs+ Untitled 3

$ sudo sfdisk -l /dev/sdb

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util sfdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdb: 38913 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

Device    Boot   Start     End   #cyls     #blocks  Id  System
/dev/sdb1           0+     25-     26-     204819+  ee  GPT
          start: (c,h,s) expected (0,0,2) found (1023,254,63)
          end: (c,h,s) expected (25,127,14) found (1023,254,63)
/dev/sdb2  *       25+  12996-  12971-   104188928   c  W95 FAT32 (LBA)
          start: (c,h,s) expected (25,159,7) found (1023,254,63)
/dev/sdb3       12996+  25967-  12972-   104190976  83  Linux
/dev/sdb4       25967+  38896-  12930-   103854404  af  Unknown
I probably don't need the 210MB set aside for the "EFI System Partition" here, but am glad I got it. The backup drive I bought is the same model as the one I put into the Macbook (standard operating procedure for me--I don't like to have a laptop that I can't recover from a hard drive failure using parts I already own). If the main drive fails, knowing I can throw the backup into the Mac and have a decent shot of using it without having to repartition and lose everything first is worth that bit of wasted space. I expect that I should be able to swap drives, run the OS X installer, and hit the ground running if something went bad. If I'm lucky I won't ever have to find out if that's true or not.

I'm not really a lucky guy, so expect a report on that one day too.

One final loose end: what if you don't have a computer running Ubuntu around, and want to get this sort of combined GPT and MBR partition setup using just OS X? The regular rEFIt installer doesn't seem to address this: the binary needed only gets installed into the boot area rather than somewhere you can run it from, and it only runs against the boot volume. You could probably build from source to get a copy instead. There is an old copy of gptsync available as part of a brief Multibooting tutorial that covers some of the same material I have here. There's a more up to date version of the utility with a simple installer available at enhanced gptsync tool that worked fine for my friend who tested it. If you run that installer, gptsync is then available as a terminal command. Use "df" to figure out what the names of your devices are, and give it the whole-disk name without the partition number. This will probably work if you're using an external drive:
gptsync /dev/disk1
Once I used gptsync to make a good MBR, the backup drive talked to the PS3 and a Windows box without issues, while still working fine under Linux and OS X. That should cover the main things you need to know for the most common of the uncommon partitioning requirements I expect people to have here.

Getting started with rsync, for the paranoid

When a computer tool has the potential to be dangerous, my paranoia manifests itself by making sure I understand what the tool is doing in detail before I use it. rsync is a very powerful tool you can use to clone directory trees. It's also possible to wipe out your local files with it, and figuring out exactly what it does is quite complicated. It doesn't help that the rsync manual page is a monster.

The basic tutorials I find in Google all seem a bit off, so let me start with why I wrote this. You don't need to start an rsync server to use it, you really don't need or even want to start by setting up insecure keys, and the tutorials that just focus on the basics leave me not sure what I just did. Quick and dirty guide to rsync is the closest to what I'm going to do here, but it lacks the theory and distrust I find essential to keeping myself out of trouble.

Let's start with local rsync, which is how you should get familiar with the tool. One useful mental model here is to think of rsync as a more powerful cp and scp rolled into one initially, then focus on how it differs. The canonical simplest rsync example looks like this:
$ rsync -av source destination

What does this actually do though? To understand that, you first need to unravel the options presented. This takes a while, because they're nested two levels deep! Here's a summary:
-v, --verbose    increase verbosity
-a, --archive    archive mode; equals -rlptgoD (no -H,-A,-X)
-r, --recursive  recurse into directories

-l, --links      copy symlinks as symlinks
-D               same as --devices --specials
    --devices    preserve device files (super-user only)
    --specials   preserve special files

-t, --times      preserve times
-p, --perms      preserve permissions
-g, --group      preserve group
-o, --owner      preserve owner (super-user only)

I've broken these out into similar groups here. Verbose you're going to want on in most cases, outside of automated operations like inside of cron. The first thing to be aware of with this simple recipe is that turning on archive mode means you're going to get recursive directory traversal. The "-l -D" behavior you probably want in most cases, to properly handle special files and symbolic links. You'll almost always want to preserve the times involved too. But whether you want to preserve the user and group information really depends on the situation. If you're copying to a remote system, this might not make any sense at all, which means you can't just use "-a" and will need to decompose the operations here to include all of the remaining ones. In many cases where remote transfer is involved, you'll also want to use "-z" to compress too.
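
Here's a small Python sketch of that decomposition, for when you want everything "-a" does except owner and group. The helper names are made up for this example; the option letters are the ones from the summary above, plus "-v" and "-z".

```python
ARCHIVE_FLAGS = "rlptgoD"  # what -a expands to, per the option summary


def archive_without(drop):
    """Return a short-option string equal to -a minus the letters in drop."""
    kept = "".join(flag for flag in ARCHIVE_FLAGS if flag not in drop)
    return "-" + kept


def build_rsync_command(source, destination, drop="", compress=False):
    """Build an rsync argument list without running anything."""
    flags = archive_without(drop) + "v"
    if compress:
        flags += "z"  # -z: compress data during the transfer
    return ["rsync", flags, source, destination]
```

For example, build_rsync_command("src/", "host:dest/", drop="go", compress=True) produces the arguments for "rsync -rlptDvz src/ host:dest/". The host and paths there are placeholders.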

How does rsync make its decisions?

What are the problem spots to be concerned about here, the ones that can eat your data if you're not careful? In order to talk about that, you really need to understand how rsync makes its decisions by default, and its other major modes. Here are the relevant bits from the man page that describe how it decides what files should be transferred; you have to collect the beginning and the details related to a couple of options to figure out the major modes it might run in:
Rsync finds files that need to be transferred using a “quick check” algorithm (by default) that looks for files that have changed in size or in last-modified time. Any changes in the other preserved attributes (as requested by options) are made on the destination file directly when the quick check indicates that the file’s data does not need to be updated.

-I, --ignore-times: Normally rsync will skip any files that are already the same size and have the same modification timestamp. This option turns off this “quick check” behavior, causing all files to be updated.

--size-only: This modifies rsync’s “quick check” algorithm for finding files that need to be transferred, changing it from the default of transferring files with either a changed size or a changed last-modified time to just looking for files that have changed in size. This is useful when starting to use rsync after using another mirroring system which may not preserve timestamps exactly.

-c, --checksum: This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a “quick check” that (by default) checks if each file’s size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit MD4 checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly... Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer “Does this file need to be updated?” check.

From this we can assemble the method used on each source file to determine whether to transfer it or not. Once the decision to transfer has been made, the rest of the tests related to that decision are redundant.
  1. Is --ignore-times on? If so, decide to transfer the file.
  2. Do the sizes match? If not, decide to transfer.
  3. Default mode with --size-only off: compare the modification times on the file. If they differ, decide to transfer.
  4. Checksum mode: compute remote and local checksums. If they don't match, decide to transfer.
  5. Transfer the file if we decided to above, computing a checksum along the way.
  6. Confirm the transfer checksum matches against the original.
  7. Update any attributes we're supposed to manage whether or not the file was transferred.
Understanding rsync's workflow and decision making process is essential if you want to reach the point where you can safely use the really dangerous options like "--delete".
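
The "should this file be transferred?" part of that list can be written down as a small Python function. This is a sketch of my reading of the man page text quoted above, not rsync's real implementation, and rsync may resolve combinations of these options differently. It treats any difference in modification time, not only a newer source, as a reason to transfer, which is how the quoted "quick check" description reads. FileInfo and needs_transfer are names invented for this example.

```python
from collections import namedtuple

# Hypothetical record for what each side knows about one file.
FileInfo = namedtuple("FileInfo", "size mtime checksum")


def needs_transfer(src, dst, ignore_times=False, size_only=False,
                   checksum=False):
    """Decide whether src should be sent, given the receiver's copy dst.

    dst is None when the file does not exist on the receiver.
    """
    if dst is None:
        return True                 # nothing there yet
    if ignore_times:
        return True                 # quick check turned off entirely
    if src.size != dst.size:
        return True                 # a size change always counts
    if checksum:
        return src.checksum != dst.checksum
    if size_only:
        return False                # sizes match and that's all we look at
    return src.mtime != dst.mtime   # default quick check
```

Playing with the flags on a couple of made-up FileInfo values is a cheap way to convince yourself what each mode will and won't notice before pointing the real tool at your data.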

Common problem spots

One thing to be concerned about even in simple cases is that if you made a copy of something without preserving the times in the past, the copy will have a later timestamp than the original. This can turn ugly if you're trying to get the local additions to a system back to the original again, as all the copies will look like later ones and you'll transfer way more data than you'd expect. If you know you've just added files on a remote system and don't want to touch the ones that are already there, you can use this option:
  --ignore-existing skip updating files that exist on receiver

This will also keep you from making many classes of horrible errors by not allowing it to overwrite files, so turning it on can be extremely helpful when learning rsync in the first place.

If you're not sure what files have changed but always want to prefer the version on the source node, you can save on network bandwidth here by using the checksum option. It can take a while to scan all of the files involved to compute the checksums, but you'll only transfer the ones that changed, even if the modification times match. Another useful option to know about here is --modify-window, which allows you to add some slack into the timestamp comparison, for example if the system clocks involved are a bit low resolution or out of sync.

Using rsync to compare copies

The sophistication of the options here means that you can get rsync to answer questions like "what files have really changed between these two copies?" without actually doing anything. You just need to use one or both of these options:
-n, --dry-run         perform a trial run with no changes made
-i, --itemize-changes output a change-summary for all updates

When learning how to use rsync in the first place, this should be your standard approach anyway: do a dry run with itemized changes, confirm it's doing what you expected, and then fire it off. You'll learn how the whole thing works that way soon enough. Note that if using checksum mode, those will get computed twice this way, but if your files are big enough that this matters you probably should be really paranoid about messing them up too. An rsync dry run with checksums turned on is a great way to get a high level "diff" between two directory trees, either locally or remotely, without touching either one of them.

Other useful parameters to turn on when getting started with rsync are "--stats" and "--progress".
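
If you want to script the "dry run first" habit, something like the following Python sketch is one way to do it. The function names are mine, and it assumes rsync is on the PATH; the options are the ones discussed above.

```python
import subprocess


def dry_run_command(source, destination, checksum=False, extra=()):
    """Build the argument list for an itemized rsync dry run."""
    cmd = ["rsync", "-a", "--dry-run", "--itemize-changes"]
    if checksum:
        cmd.append("--checksum")   # compare contents, not just size/time
    cmd.extend(extra)              # e.g. ("--stats",)
    cmd.extend([source, destination])
    return cmd


def preview(source, destination, checksum=False):
    """Run the dry run and return rsync's output lines; nothing is copied."""
    result = subprocess.run(
        dry_run_command(source, destination, checksum=checksum),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Only once the lines that preview() returns look like what you expected would you run the same command again without "--dry-run".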

Remote links

Next up are some notes on how the remote links work. If you put a ":" in the name, rsync defaults to using ssh for remote links; again you can think of this as being like scp. Since no admin in their right mind sets up an rsync server nowadays, this is the standard way you're going to want to operate. If you're not using the default ssh port (22), you need to specify it like this:
$ rsync --rsh='ssh -p 12345' source destination

You can abbreviate this to "-e", but I find it makes more sense and is easier to remember using the long version here. You're specifying how it should reach the remote shell here and that's reflected in the long option, the short one just got a random character that wasn't already used.
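
When building that command from a script, the only wrinkle is that the whole "ssh -p PORT" string has to reach rsync as a single argument. A short Python sketch, with a made-up function name and placeholder host and paths:

```python
import shlex


def rsync_over_ssh(source, destination, port=22, flags="-av"):
    """Build an rsync argument list that reaches the remote side over ssh."""
    cmd = ["rsync", flags]
    if port != 22:
        # The remote shell command is passed to rsync as one argument.
        remote_shell = " ".join(shlex.quote(part)
                                for part in ("ssh", "-p", str(port)))
        cmd.append("--rsh=" + remote_shell)
    cmd.extend([source, destination])
    return cmd
```

Passing the list straight to subprocess avoids a second layer of shell quoting around the --rsh value.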


That covers the basic rsync internals I wanted to know before I used the tool, and that usually get skipped over. The other tricky bit you should know is how directory handling changes based on whether there's a trailing slash on paths; that's covered elsewhere quite well so I'm not going to get into it here.

You should know enough now to use rsync and really understand what it's going to do, as well as how to be paranoid about using it. Don't overwrite things unless you know it's safe, always use a dry run for a new candidate rsync command, and break down the options you use to the subset you need if the big options collections like "-a" do more than that.

Where to go from here? In order of increasing knowledge requirements I'd suggest these three links:
  1. rsync Tips & Tricks: This gives some more detail about some of the options you should know about I skimped on, and covers a lot of odd situations too.
  2. Backups using rsync: Great description of how many of the more obscure parameters actually work. This will suggest what underdocumented parameters like the deletion ones actually do, and suggest how you could use some of them.
  3. Easy Automated Snapshot-Style Backups with Linux and Rsync: The gold mine guide of advanced techniques here. Once past the basics, it's easy to justify studying this for as long as it takes to understand how the whole thing works, as you'll learn a ton about how powerful rsync can be along the way.

Using doxypy for Python code documentation

Last time I wrote a long discussion about Python module documentation that led me toward using doxypy feeding into doxygen to produce my docs. Since I don't expect Python programmers in particular to be familiar with doxygen, a simple tutorial for how to get started doing that seemed appropriate. I had to document this all for myself anyway.

Running on Ubuntu, here's what I did to get the basics installed (less interesting bits clipped here):
$ sudo apt-get install doxygen
$ cd $HOME
$ wget
$ tar xvfz doxypy-0.4.1.tar.gz
$ sudo python install
running install
running build
running build_scripts
running install_scripts
creating /usr/local/local
creating /usr/local/local/bin
copying build/scripts-2.6/ -> /usr/local/local/bin
changing mode of /usr/local/local/bin/ to 755
running install_egg_info
Creating /usr/local/local/lib/python2.6/dist-packages/
Writing /usr/local/local/lib/python2.6/dist-packages/doxypy-0.4.1.egg-info

That last part is clearly wrong. The code that ships with doxypy is putting "/usr/local" where "/usr" should go, which results in everything going into "/usr/local/local". That needs to get fixed at some point (update: as of doxypy-0.4.2.tar.gz 2009-10-14, this bug is gone). For now I was content to work around it by moving things where they were supposed to go and cleaning up the mess:
$ sudo mv /usr/local/local/bin/ /usr/local/bin
$ sudo mv /usr/local/local/lib/python2.6/dist-packages/doxypy-0.4.1.egg-info /usr/local/lib/python2.6/dist-packages/
$ sudo rmdir /usr/local/local/bin
$ sudo rmdir /usr/local/local/lib/python2.6/dist-packages/
$ sudo rmdir /usr/local/local/lib/python2.6/
$ sudo rmdir /usr/local/local/lib/
$ sudo rmdir /usr/local/local
And, yes, I am so paranoid about running "rm -rf" anywhere that I deleted the directories one at a time instead of letting recursive rm loose on /usr/local/local. You laugh, but I've watched a badly written shell script that wasn't careful with rm wipe out a terabyte of data.
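
For the similarly paranoid, the same "only remove it if it's empty" behavior can be scripted. This Python sketch walks a tree bottom-up and relies on os.rmdir refusing to remove a directory that still has anything in it, so no file ever gets deleted. The function name is made up for this example.

```python
import os


def remove_empty_tree(root):
    """Remove root and its subdirectories, but only the ones that are empty.

    Returns the list of directories that were actually removed.
    """
    removed = []
    # topdown=False visits children before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        try:
            os.rmdir(dirpath)       # fails unless the directory is empty
            removed.append(dirpath)
        except OSError:
            pass                    # something is still in there; leave it
    return removed
```

Anything left behind afterwards is a directory that still holds a file, which is exactly what you'd want to look at by hand.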

Now we need a sample project to work on. Here's a tired old example I've updated with a first guess at markup that works here:
$ cd <my project>
#!/usr/bin/env python
"""@package fib
Compute the first ten numbers in the Fibonacci sequence
"""

def fib(n):
    """
    Return a Fibonacci number

    @param n Number in the sequence to return
    @retval The nth Fibonacci number
    """
    if n > 1:
        return fib(n-1) + fib(n-2)
    if n == 0 or n == 1:
        return 1

if __name__ == '__main__':
    # range(10) covers n = 0..9, which is ten numbers
    for i in range(10):
        print(fib(i))

Now we ask doxygen to create a template configuration file for us, edit it to add some lines to the end, and run it:
$ doxygen -g
$ $EDITOR Doxyfile

(add these lines to the end and comment them
out where they appear earlier)

# Customizations for
INPUT_FILTER = "python /usr/local/bin/"
INPUT = ""

$ doxygen

The basics of how to get Doxygen going are documented in its starting guide. For a non-trivial program, you'll probably want to make INPUT more general and expand FILE_PATTERNS (which doesn't even do anything the way INPUT is set up here). As hinted above, I'd suggest commenting out all of the lines in the file where the parameters we're touching above originally appear, along with adding a block like this to the end of it with your local changes. That's easier to manage than touching all of the values where they show up in the template.

Now fire up a web browser and take a look at what comes out in the html/ directory. You have to drill down into "Namespaces" or "Files" to find things in this simple example.

Function Documentation
def fib::fib ( n )

Return a Fibonacci number.

n Number in the sequence to return

Return values:
The nth Fibonacci number
There's a few more things I need to do here before I'll be happy enough with this to use it everywhere:
While there's plenty left to learn and some loose ends, so far I'm happy enough with this simple proof of concept to keep going in this direction.

Monday, October 12, 2009

Watching a hard drive die

One thing I get asked all the time is how to distinguish between a hard drive that is physically going bad and one that is just not working right from a software perspective. This week I had a drive fail mysteriously and saved the session where I figured out what went wrong to show what I do. It's easy enough to find people suggesting "monitor 'x'" for your drive, where 'x' varies a bit depending on who you ask. Writing scripts to do that sort of thing is easier if you've seen how a bad one acts, which (unless you're as lucky as me) you can't just see on demand easily. This is part one to a short series I'm going to run here about hard drive selection, which will ultimately lead to the popular "SATA or SAS for my database server?" question. To really appreciate the answer to that question, you need to start at the bottom first, with how errors play out on your computer.

Our horror story begins on a dark and stormy night (seriously!). I'm trying to retire my previous home server, an old Windows box, and migrate the remainder of its large data files (music, video, the usual) to the new Ubuntu server I live on most of the time now. I fire up the 380GB copy on the Windows system a few hours before going to bed, using cygwin's "cp -arv" so I won't get "are you sure?" confirmations stopping things. I expect it will be finished by morning. I check up on it later, and the copy isn't even running anymore. Curious what happened, I run "du -skh /cygdrive/u/destination" to figure out how much it copied before dying. In the middle of that, the drive starts making odd noises, and the whole system reboots without warning. This reminds me why I'm trying to get everything out of Windows 2000.

At this point, what I want to do is look at the SMART data for the drive. The first hurdle is that I can't see that when the disk is connected via USB. A typical USB (and Firewire) enclosure bridge chipset doesn't pass through requests for SMART data to the underlying drive. So when things go wrong, you're operating blind. Luckily, this enclosure also has an eSATA connector, so I can connect it directly to the Linux PC to see what's going on. That connection method won't have the usual external drive limitations. If that weren't available, I'd have to pull the drive out of its enclosure and connect directly to a Linux system (or another OS with the tools I use) to figure out what's going on.

(If you don't normally run Linux, you can install smartmontools on Windows and other operating systems. Just don't expect any sort of GUI. Another option is to boot a Linux live CD; I like Ubuntu's for general purpose Linux work, but often instead use the SystemRescueCd for diagnosing and sometimes repairing PC systems that are acting funny.)

Plugged the drive into my desktop Linux system, "tail /var/log/messages" to figure out what device it gets assigned (/dev/sdg), and now I'm ready to start. First I grab the drive's error log to see if the glitch was at the drive hardware level or not:
$ sudo smartctl -l error /dev/sdg
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Error 2 occurred at disk power-on lifetime: 153 hours (6 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
-- -- -- -- -- -- --
84 51 00 00 00 00 a0

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 a0 08 00:00:17.300 IDENTIFY DEVICE

Error 1 occurred at disk power-on lifetime: 154 hours (6 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
-- -- -- -- -- -- --
40 51 01 01 00 00 a0 Error: UNC 1 sectors at LBA = 0x00000001 = 1

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 03 01 01 00 00 a0 ff 11:17:17.000 READ DMA EXT
25 03 01 01 00 00 a0 ff 11:17:17.000 READ DMA EXT
25 03 30 5e 00 d4 48 04 11:17:05.800 READ DMA EXT
25 03 40 4f 00 d4 40 00 11:16:56.600 READ DMA EXT
35 03 08 4f 00 9c 40 00 11:16:56.600 WRITE DMA EXT

Getting DMA read/write errors can be caused by driver or motherboard issues, but failing to identify the device isn't good. Is the drive still healthy? By a rough test, sure:
$ sudo smartctl -H /dev/sdg
SMART overall-health self-assessment test result: PASSED

This is kind of deceptive though, as we'll see here. The next thing we want to know is how old the drive is right now and how many reallocated sectors there are. Those are the usual first warning sign that a drive is losing small amounts of data. We can grab just about everything from the drive like this (bits clipped from the output in all these examples to focus on the relevant parts):
$ sudo smartctl -a /dev/sdg
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x000b   092   092   016    Pre-fail  Always   -    2621443
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline  -    0
  3 Spin_Up_Time            0x0007   111   111   024    Pre-fail  Always   -    600 (Average 660)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always   -    74
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always   -    7
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always   -    0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline  -    0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always   -    154
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always   -    0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -    69
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always   -    77
193 Load_Cycle_Count        0x0012   100   100   050    Old_age   Always   -    77
194 Temperature_Celsius     0x0002   114   114   000    Old_age   Always   -    48 (Lifetime Min/Max 18/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always   -    12
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always   -    4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline  -    0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always   -    4

Make sure to scroll this to the right; the last column is the most important. UDMA_CRC_Error_Count matches the errors we're still seeing individually. But the real smoking gun here, and in many other cases you'll see if you watch enough drives die, is Reallocated_Sector_Ct (7) and its brother Reallocated_Event_Count (12). Ignore all the value/worst/thresh nonsense; that data is normalized by this weird method that doesn't make any sense to me. The "raw_value" is what you want. On a healthy drive, there will be zero reallocated sectors. Generally, once you see even a single one, the drive is on its way out. This is always attribute #5. I like to monitor #194 (temperature) too, because that's a good way to detect when a system fan has died. The drive overheating can be a secondary monitor for that very bad condition. You can even detect server room cooling failures that way; it's fun watching a stack of servers all kick out temperature warnings at the same time after the circuit the AC is on blows.
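
Since writing a monitoring script is the eventual goal, here's a rough Python sketch that pulls the raw values out of attribute lines like the ones above and flags the two reallocation counters. It assumes the ten-column layout shown in this output; the function names are made up, and the temperature limit is an arbitrary placeholder for illustration, not a recommendation.

```python
import re

# id, name, flag, value, worst, thresh, type, updated, when_failed, raw
ATTR_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute id -> (name, raw value) for every attribute line."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            attrs[int(match.group(1))] = (match.group(2),
                                          int(match.group(10)))
    return attrs


def warning_signs(attrs, max_temp_c=55):
    """List the attributes worth worrying about (max_temp_c is made up)."""
    problems = []
    for attr_id in (5, 196):   # reallocated sectors and events
        if attr_id in attrs and attrs[attr_id][1] > 0:
            problems.append("%s = %d" % attrs[attr_id])
    if 194 in attrs and attrs[194][1] > max_temp_c:
        problems.append("%s = %d" % attrs[194])
    return problems
```

In a real script the text would come from running "smartctl -a" on the device, which needs root.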

The other thing to note here is Power_On_Hours. Here the raw value (154 hours) confirms that the recent errors in the logs did just happen. This is a backup drive I only power on to copy files to and from it, and it's disappointing that it's died with so little life. Why that happened and how to prevent it is another topic.

Next thing to do is to run a short self-test, wait a little bit, and check the results. This drive is loud enough that I can hear when the test is running, and it doesn't take long:
$ sudo smartctl -t short /dev/sdg
$ sudo smartctl -l selftest /dev/sdg
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure        60%              153  717666530

Here's the thing to realize: the drive reports "healthy" to the simple test. But it isn't; there are reallocated sectors, and even the simplest of self-checks will find them. Enterprise RAID controllers can be configured to do a variety of "scrubbing" activities when the drives aren't being used heavily, and this is why they do that: early errors can get caught by the drive long before you'll notice them any other way. Nowadays drives will reallocate marginal sectors without even reporting an error to you, so unless you look at this data yourself you'll never know when the drive has started to go bad.

At this point I ran a second short test, then an extended one; here's the log afterwards:
$ sudo smartctl -t short /dev/sdg
$ sudo smartctl -t long /dev/sdg
$ sudo smartctl -l selftest /dev/sdg

Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        90%              157  2454905
# 2  Short offline       Completed: read failure        40%              156  711374978
# 3  Short offline       Completed: read failure        60%              153  717666530

The extended test found an error even earlier on the disk. It seems pretty clear this drive is failing quickly. At this point, there's little hope for it beyond saving any data you can (which I did before even launching into this investigation) and moving toward running the manufacturer's diagnostic software. What I'd expect here is that the drive *should* get marked for RMA if it's already in this condition. It's possible the software will "fix" the drive instead. That's a later topic here.

In short, there are a few conclusions you can reach yourself here, and since this is a quite typical failure I can assure you these work:

  • Try not to connect external drives to a Windows server if you can avoid it, as they're pretty error prone and Windows isn't great at recovering from this sort of error. This fact is one reason I get so many of these requests to help distinguish true hardware errors from Windows problems.

  • Drives that are set up such that they can't check themselves via SMART are much more likely to fail quietly. Had this one been connected directly rather than via USB, I could have picked up this problem when the reallocated sector count was lower and decreased my risk of data loss.

  • If you're taking an offline drive out of storage to use it again, running a short SMART check and looking at the logs afterwards is a good preventative measure. Some drives even support a "conveyance" self-test aimed at checking quality after shipping. This one didn't, so I went right from the short test to the long one.

  • When monitoring regularly with smartmontools, you must monitor reallocated sectors yourself; the health check will not include them in its calculations. They are an excellent early warning system for errors that will eventually lead to data loss.

  • Regularly running the short SMART check doesn't introduce much disk load, and it is extremely good at finding errors early too. I highly recommend scheduling a periodic SMART self-test on your server outside of peak hours if you can, if you're not using some other form of data consistency scrubbing.

  • Running a SMART self-test when you first put a drive into service provides a baseline showing good performance before things go wrong. I didn't do that in this case and I wish I had that data for comparison. It's helpful for general drive burn-in too.
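
The reallocated-sector monitoring recommended in the list above can be scripted. What follows is a minimal sketch, assuming the usual ten-column attribute table that "smartctl -A" prints. The function names, the list of watched attributes, and the hand-written sample text are all illustrative assumptions rather than anything shipped with smartmontools.

```python
import subprocess

# Attributes whose raw value should stay at zero on a healthy drive.
# (Which names to watch is an assumption based on the discussion above.)
WATCHED = ("Reallocated_Sector_Ct", "Reallocated_Event_Count")

def parse_raw_values(smartctl_output):
    """Map attribute name -> integer raw value from 'smartctl -A' text."""
    raw = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw[fields[1]] = int(fields[9])
            except ValueError:
                pass  # raw value isn't a plain integer; skip it
    return raw

def reallocation_warnings(raw):
    """Return warning strings for watched attributes with nonzero raw values."""
    return ["%s raw value is %d" % (name, raw[name])
            for name in WATCHED if raw.get(name, 0) > 0]

def check_drive(device):
    """Run smartctl on a device (needs root and smartmontools installed)."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True).stdout
    return reallocation_warnings(parse_raw_values(out))

# Hand-written sample in the smartctl attribute layout, for illustration only
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       7
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       154
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (0 20 0 0)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    for warning in reallocation_warnings(parse_raw_values(SAMPLE)):
        print(warning)
```

Keeping the parsing separate from the smartctl call means the logic can be tried against saved output without root access; a cron job could call check_drive() and mail any warnings it returns.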

That's the basics of what an error looks like when you catch it before the drive stops responding altogether. I've found this is quite often the case. Anecdotally, 75% of the drive failures I've seen in the last seven years (3 of the 4 failures since I started doing this) showed up like this before the drive stopped responding altogether. Some of the larger drive studies floating around recently suggest it's not quite that accurate as an early warning for most, but it's certainly much better than not checking for errors at all.

The fact that this drive died somewhat mysteriously, in a way that still let it pass its SMART health check, has some interesting implications for its suitability in a RAID configuration. That's where I'm heading with this eventually.

Whatever OS you're running, you should try to get smartmontools (or a slicker application that does the same thing) running and set up to e-mail you when it hits an error. That regime has saved my data on multiple occasions.

Wednesday, October 7, 2009

Writing monitoring threads in Python

A common idiom in programs I write is the monitoring thread. If I have a program doing something interesting, I often want to watch consumption of some resource in the background (memory, CPU, or app internals) while it runs. Rather than burdening the main event loop with those details, I like to fire off a process/thread to handle that job. When the main program is done with its main execution, it asks the thread to end, then grabs a report. If you write a reusable monitoring library like this, you can then add a monitoring thread for whatever you want to watch within a program with a couple of lines of code.

Threading is pretty easy in Python, and the Event class is an easy way to handle sending the "main program is exiting, give me a report" message to the monitoring thread. When I sat down to code such a thing, I found myself with a couple of questions about exactly how Python threads die. Some samples:
  • Once a thread's run loop has exited, can you still execute reporting methods against it?
  • If you are sending the exit message to the thread via a regular class method, can that method safely call the inherited thread.join and then report the results itself only after the run() loop has processed everything?
Here's a program that shows the basic outline of a Python monitoring thread implementation, with the only thing it monitors right now being how many times it ran:
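#!/usr/bin/env python

from threading import Thread
from threading import Event
from time import sleep

class thread_test(Thread):

    def __init__(self, nap_time):
        Thread.__init__(self)
        self.exit_event = Event()
        self.nap_time = nap_time
        self.times_ran = 0
        # The thread starts itself as soon as it is created
        self.start()

    def exit(self, wait_for_exit=False):
        print("Thread asked to exit, messaging run")
        self.exit_event.set()
        if wait_for_exit:
            print("Thread exit about to wait for run to finish")
            self.join()
        return self.report()

    def run(self):
        while not self.exit_event.is_set():
            self.times_ran += 1
            print("Thread running iteration %d" % self.times_ran)
            sleep(self.nap_time)
        print("Thread run received exit event")

    def report(self):
        if self.is_alive():
            return "Status: I'm still alive"
        return "Status: I'm dead after running %d times" % self.times_ran

def tester(wait=False):
    print("Starting test; wait for exit: %s" % wait)
    t = thread_test(1)
    # Main wakes on the half second so it never races the thread's 1s naps
    sleep(2.5)
    print(t.report())  # Still alive here
    sleep(2)
    print("Main about to ask thread to exit")
    e = t.exit(wait)
    print("Exit call report: %s" % e)
    sleep(2)
    print(t.report())  # Thread is certainly done by now

if __name__ == '__main__':
    tester(False)
    print("")
    tester(True)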
Calling the thread's "join" method from the method that asks it to end is optional here, so we can see both behaviors. Here's what the output looks like:
Starting test; wait for exit: False
Thread running iteration 1
Thread running iteration 2
Thread running iteration 3
Status: I'm still alive
Thread running iteration 4
Thread running iteration 5
Main about to ask thread to exit
Thread asked to exit, messaging run
Exit call report: Status: I'm still alive
Thread run received exit event
Status: I'm dead after running 5 times

Starting test; wait for exit: True
Thread running iteration 1
Thread running iteration 2
Thread running iteration 3
Status: I'm still alive
Thread running iteration 4
Thread running iteration 5
Main about to ask thread to exit
Thread asked to exit, messaging run
Thread exit about to wait for run to finish
Thread run received exit event
Exit call report: Status: I'm dead after running 5 times
Status: I'm dead after running 5 times
That confirms things work as I'd hoped. That is usually the case in Python (and why I prefer it to Perl, which I can't seem to get good at predicting). I wanted to see it operate to make sure my mental model matches what actually happens though.

  1. If you've stashed some state information into a thread, you can still grab it and run other thread methods after the thread's run() loop has exited.
  2. You can call a thread's join method from a method that messages the run() loop and have it block until the run() loop has exited. This means the method that stops things can be set up to return only complete output directly to the caller requesting the exit.
With that established, I'll leave you with the shell of a monitoring class that includes a small unit test showing how to use it. Same basic program, but without all the speculative coding and print logging in the way, so it's easy for you to copy and run with to build your own monitoring routines. The idea is that you create one of these, it immediately starts, and it keeps doing whatever you want in the background until you ask it to stop, at which point it returns its results (and you can always grab them later too).
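#!/usr/bin/env python

from threading import Thread
from threading import Event
from time import sleep

class monitor(Thread):

    def __init__(self, interval):
        Thread.__init__(self)
        self.exit_event = Event()
        self.interval = interval
        self.times_ran = 0
        self.start()

    def exit(self):
        # Ask run() to stop, wait until it has, then hand back the report
        self.exit_event.set()
        self.join()
        return self.report()

    def run(self):
        while not self.exit_event.is_set():
            # Replace this with whatever you want to monitor
            sleep(self.interval)
            self.times_ran += 1

    def report(self):
        if self.is_alive():
            return "Still running, report not ready yet"
        return "Dead after running %d times" % self.times_ran

def self_test():
    print("Starting monitor thread")
    t = monitor(1)
    print("Sleeping...")
    sleep(3.5)
    e = t.exit()
    print("Exit call report: %s" % e)

if __name__ == '__main__':
    self_test()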
The main thing you might want to improve on here for non-trivial monitoring implementations is that the interval will vary based on how long the monitoring task takes. If you're doing some intensive processing that takes a variable amount of time at each interval, you might want to adjust the sleep time to aim for a regular target time, rather than just sleeping the same amount every time.
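
One possible way to make that adjustment is sketched below. It is not part of the original monitor class: the steady_monitor name, the optional task callable, and the choice to restart the schedule after an overrun are all assumptions made for illustration. The idea is to keep a running target time and nap only for whatever is left of each interval once the work is done.

```python
from threading import Thread, Event
from time import monotonic, sleep

class steady_monitor(Thread):
    """Hypothetical variant of the monitor class that aims for a fixed period."""

    def __init__(self, interval, task=None):
        Thread.__init__(self)
        self.exit_event = Event()
        self.interval = interval
        self.task = task          # optional callable doing the real monitoring
        self.times_ran = 0
        self.start()

    def exit(self):
        self.exit_event.set()
        self.join()
        return self.times_ran

    def run(self):
        next_time = monotonic()
        while not self.exit_event.is_set():
            if self.task is not None:
                self.task()
            self.times_ran += 1
            # Schedule against the original timeline, not "now", so time
            # spent in task() is subtracted from the nap rather than added
            next_time += self.interval
            delay = next_time - monotonic()
            if delay > 0:
                # wait() naps like sleep() but returns early on an exit request
                self.exit_event.wait(delay)
            else:
                # The task overran its slot; restart the schedule from now
                next_time = monotonic()

if __name__ == "__main__":
    m = steady_monitor(0.1, task=lambda: sleep(0.05))
    sleep(1.05)
    print("Ran %d times" % m.exit())
```

Waiting on the Event instead of calling sleep() has a side benefit: an exit request interrupts the nap immediately, so exit() does not have to wait out the remainder of an interval.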

(All the above code is made available under the CC0 "No Rights Reserved" license and can be incorporated in your own work without attribution)

Tuesday, October 6, 2009

Formatting source code and other text for blogger

The biggest nemesis of this blog is that I regularly include everything from source code to log files in here, which really do not fit well into Blogger without some help. Today I got fed up with this enough to look for better ways than what I had been doing.

My HTML skills are still mired in cutting-edge 1995 design; I lost touch somewhere around CSS. So my earlier blog entries used this bit of HTML to insert text I didn't want the Blogger formatting to touch, as the quickest hack I found that worked:

<div style="padding: 4px; overflow: auto; width: 400px; height: 100px; font-size: 12px; text-align: left;"><pre>
Some text goes here
</pre></div>

That sample shows how text formatted this way looks, except with only the inner scroll bar; getting it posted turned quite self-referential.

Two things were painful about this. The first is that I had to include this boilerplate formatting stuff every time, which required lots of cut and paste. The second is that I had to manually adjust the height every time, and the heights didn't match between the preview and the actual post. I think I did that on purpose at one point, so that I could display a long bit of source code without having to show the whole thing. In general, this is a bad idea though, and you instead want to use "width: 100%" and leave out the height altogether.

What are the other options? Well, you could turn that formatting into a proper "pre" style entry, which cuts down on the work considerably and is much easier to update across the whole blog. Then you just wrap things with the pre/code combo and you're off. There's an example of this at Blogger Source Code Formatter that even includes a GreaseMonkey script to help automate wrapping the text with what you need. Another example of adjusting the template this way is at How to show HTML/java codes in blogger.

You probably want to save a copy of everything before you tinker and track your changes; the instructions at Can I edit the HTML of my blog's layout? cover this. I put my template into a software version control tool so I can track changes I make and merge them into future templates; I'm kind of paranoid though, so don't presume you have to do that. I settled on the "Simple II" theme from Jason Sutter as being the one most amenable as a base for a programming oriented blog, as it provides the most horizontal space for writing wide lines. I'd suggest considering a switch to that one before you customize your template, then tweak from there.

The main problem left to consider here, particularly when pasting source code, is that you need to escape HTML characters. I found two examples of "web services" that do that for you, including producing a useful header, and both are at least minimally useful. I like the UI and output of Format My Source Code For Blogging better than Source Code Formatter for Blogger, but both are completely usable, and the latter includes the notion that you might want to limit the height on long samples. I think in most cases you'd want to combine using one of them with the approach of saving the style information into your template advocated by the GreaseMonkey-based site, just using the code and its wrapper from these tools in a typical case rather than using a one-off style every time. If you do that, you can just wrap things in simple pre/code entries and possibly use something as simple as Quick Escape just to fix the worst things to be concerned about.
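
If you would rather not depend on a web service for the escaping step, it can be done locally. Here is a minimal sketch using html.escape from the Python 3 standard library; the blogger_code_block name is made up for this example, and it assumes the styling for pre/code already lives in your template so the wrapper can be bare tags.

```python
import html

def blogger_code_block(text):
    """Escape HTML-special characters and wrap the result in pre/code tags."""
    # quote=False leaves quotation marks alone; they only need escaping
    # inside attribute values, not in element content
    escaped = html.escape(text, quote=False)
    return "<pre><code>" + escaped + "</code></pre>"

if __name__ == "__main__":
    sample = 'if a < b && c > d:\n    print("<b>bold</b>")'
    print(blogger_code_block(sample))
```

The ampersand, less-than, and greater-than characters are the ones that break a post when pasted raw, and those are exactly what this replaces.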

Here's what I got from the simpler tool I mentioned first:
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; color: #000000; background-color: #eee;font-size: 12px;border: 1px dashed #999999;line-height: 14px;padding: 5px; overflow: auto; width: 100%"><code>
Some text goes here
</code></pre>

That's a bit more reasonable to work with, looks better (I favor simple over fancy but like something to make the code stand apart), and it's easy to dump into my template for easy use (and changes) in the future.

After considering all the samples available, here's the config I ended up dumping into my own Blogger HTML template, after switching themes. This goes right before "]]></b:skin>" in the template:
pre {
font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace;
color: #000000;
line-height: 100%;
overflow: auto;
width: 100%;
padding: 5px;
border: 1px solid #999999;
}

code {
color: #000000;
}
That's a bit better to my eye; the dashes looked bad. Code is easier to follow too.

Now, what if you want real syntax highlighting for source code? Here the industrial strength solution is SyntaxHighlighter. There's a decent intro to using that approach at Getting code formatting with syntax highlighting to work on blogger. The one part I'm not comfortable with there is linking directly to the style sheets and JavaScript code in the trunk of the SyntaxHighlighter repo. That's asking for your page to break when the destination moves (which has already happened) or someone checks a bad change into trunk. And that's not even considering the security nightmare if someone hostile takes over that location (less likely when it was on Google Code; I'm not quite as confident in the ability of its current host to avoid hijacking). You really should try to find a place you have better control over to host known stable copies of that code instead.

I may publish a more polished version of what I end up settling on at some point; I wanted to document what I found initially before I forgot the details.

Tuesday, September 29, 2009

Module API documentation in Python

Sometimes I fondly reminisce about the days when all of the code I worked on was in one programming language. Nowadays, it's a mix of C (mainly related to the PostgreSQL code base), Java (my employer's middleware and a lot of my personal code), and Python (systems programming, general utilities, and QA test code). Python is the most recent of those to be added to the mix, and it's proven to have its own unique code documentation challenges, some of which have clarified how to deal with the other languages in the process.

First I should label my expectations here. I'm not a big fan of dynamic typing to begin with, and I'd at least like to document what type each parameter expects in all of the code I intend to be reusable, even if those restrictions aren't enforced at compile time. Both C and Java require specifying types for every parameter, and Java includes its Javadoc mechanism for labeling the parameters with their intended purpose and function. That's all I really want: feed in a bit of source code that includes some markup for what all the parameters mean, along with general text commentary; get HTML/PDF output that documents the API presented by that code.

One thing I've found very disappointing about Python is that its development community seems to actively reject the idea of good parameter documentation directly in the source code. The closest thing I've seen is the PEP for Function Annotations, which are so barebones I wouldn't consider them a help even if they were more mainstream (they're not yet). All we really get for in-code documentation are the Docstring Conventions and pydoc, which don't provide any standard way to label parameters in a way more complicated browsing or analysis tools can utilize.
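
For reference, here is roughly what the annotation syntax from that PEP looks like. It needs Python 3, and the scale function is a made-up example. The sketch shows why annotations alone are thin as documentation: each one is an arbitrary expression stored on the function object, and the language neither checks it nor gives it any meaning.

```python
def scale(value: float, factor: "multiplier, usually an int" = 2) -> float:
    """Multiply value by factor."""
    return value * factor

# Annotations are stored as-is in a dictionary on the function object
print(scale.__annotations__)

# Nothing enforces them: a string is accepted despite the float annotation
print(scale("ab"))
```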

The first tool I considered for this purpose is Epydoc. It understands Javadoc formatted docstrings and ReST, which are two standards I already write documentation in. It includes its own somewhat odd variable docstring syntax, which I didn't find very useful. A similar tool that knows much more about subclassing is pydoctor, whose introduction mentions a bunch of other projects in this area that neither I nor its authors were impressed by.

Another Python specific tool here is pythondoc. My first problem with that project is that it seems kind of dead. Ultimately, my bigger concern is that I'd like to use Python docstrings as much as possible, just with additional markup inside them. pythondoc seems to prefer # formatted comments, which aren't really acceptable here.

I keep circling back to Javadoc markup as the only reasonable one here. Ultimately, if I'm using Javadoc format, with nothing Python specific, I have to ask myself why I should adopt a one-off tool such as Epydoc, if instead I can get one that supports the other languages I use and provides a wider feature set. To see the perils of that approach, check out the train wreck answer to the FAQ how to print Javadoc to PDF. What a disaster. To work around that Javadoc limitation, I'd already started moving toward using Doxygen, which I know works great on the C code I browse most via the PostgreSQL code base. (Arguing the merits of doxygen vs. javadoc just in a Java context is a popular topic; see Javadoc or Doxygen? and Doxygen Versus Javadoc for two examples)

A quick check of the full Comparison of documentation generators page didn't give other tools that looked like they would help here. At this point I started to settle on a tentative approach that would unify my work with one tool to use: doxygen + Javadoc formatted parameters in a docstring I could live with. One problem: if you use the Python standard docstring approach, doxygen's Python support won't allow any special commands in there. That's pretty much useless.

Luckily I'm not the first person to make that leap: doxypy is a filter that takes regular Python code with the usual docstring format in, producing an intermediate file in the format doxygen wants to work with. But where are the examples of how it works to get people started?

Luckily, like all good software authors, the doxypy developers eat their own dogfood: the filter itself is a Python program documented so that doxypy can process it. Here's a simple example of a method from inside it:

def makeTransition(self, input):
    """ Makes a transition based on the given input.

    @param input input to parse by the FSM
    """

In this case FSM means "finite-state machine" and not my deity of choice.

Something this simple was all I was looking for, and the only open point here is that Javadoc format presumes one can divine the type from the declaration; that's not so clear here.
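
One way to cover that open point is to adopt a convention of your own for noting the type inside the description text. The sketch below is purely hypothetical: the parenthesized type after the parameter name is a made-up convention, not doxypy or doxygen syntax, and the documented_params helper only demonstrates that such tags stay easy to pull back out of a docstring.

```python
import re

# Matches lines such as "@param input (str) input to parse by the FSM";
# the parenthesized type is optional and is this example's own convention
PARAM_RE = re.compile(r"@param\s+(\w+)\s+(?:\((\w+)\)\s+)?(.*)")

def documented_params(func):
    """Return (name, type or None, description) for each @param tag."""
    doc = func.__doc__ or ""
    matches = (PARAM_RE.search(line) for line in doc.splitlines())
    return [m.groups() for m in matches if m]

def makeTransition(self, input):
    """ Makes a transition based on the given input.

    @param input (str) input to parse by the FSM
    """

if __name__ == "__main__":
    for name, type_name, description in documented_params(makeTransition):
        print("%s [%s]: %s" % (name, type_name, description))
```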

Wednesday, September 16, 2009

Following symlinks in Python

Today's Python trivia question: you have the path of a symbolic link. How do you get the full destination that link points to? If your answer is "use os.readlink", well it's not quite that easy. I'm not alone in finding the docs here confusing when they say: "the result may be either an absolute or relative pathname" and then only tell you how to interpret the result if it's relative. This guy wonders the same thing I did, which is how to know whether the returned value is a relative or absolute path?

I found a clue as to the way to handle both cases in the PathModule code, which is that you use os.path.isabs on the result to figure out what you got back. That module is a lot of baggage to pull in if you just want to correct this one issue, though, so here's a simpler function that knows how to handle both cases:

import os

def readlinkabs(l):
    """
    Return an absolute path for the destination
    of a symlink
    """
    assert (os.path.islink(l))
    p = os.readlink(l)
    if os.path.isabs(p):
        return p
    # A relative target is relative to the directory holding the link
    return os.path.join(os.path.dirname(l), p)

I hope my search engine cred bubbles me up so someone else trying to look this up like I did doesn't have to bother reinventing this particular tiny wheel.
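
A possible refinement of the function above, offered as a sketch rather than a replacement: when the link path passed in is itself relative, or the link target contains ".." components, the joined result is neither absolute nor tidy. Passing it through os.path.abspath handles both. The readlink_normalized name and the temporary directory layout in the demo are made up for this example.

```python
import os
import tempfile

def readlink_normalized(link):
    """Hypothetical variant of readlinkabs that also tidies the result."""
    target = os.readlink(link)
    if not os.path.isabs(target):
        # Relative targets are relative to the directory holding the link
        target = os.path.join(os.path.dirname(link), target)
    # abspath anchors a relative link path at the current directory and
    # collapses any ".." components left over from the join
    return os.path.abspath(target)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        open(os.path.join(top, "data.txt"), "w").close()
        link = os.path.join(top, "sub", "rel_link")
        os.symlink(os.path.join("..", "data.txt"), link)
        print(readlink_normalized(link))
```

Note that collapsing ".." textually can give the wrong answer if one of the directories in the path is itself a symlink. If you want every link along the way resolved, os.path.realpath does that, at the cost of no longer telling you where this one link points.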