I should have stopped there.
For kicks, I decided to try the latest kernel at the time, 2.6.24.3, to see if it was any faster. The big change in recent kernels I was curious about was the introduction of the controversial Completely Fair Scheduler (CFS). Because that was merged into the kernel quite a bit faster than I find comfortable, I was rather suspicious of its performance, and I launched into a full set of tests using pgbench.
These went very badly, with an extreme drop in performance compared with earlier versions. By the time I quantified that and reported to PostgreSQL land, I was told Nick Piggin had already reported the problem and a fix was due for 2.6.25, which literally came out in the middle of the night the day was I coming up with a repeatable test case for 2.6.24.
2.6.25 gave the worst pgbench results yet. I did confirm that it's not in the server though; if I run the pgbench client on a remote system, the results scale fine to the level I expect.
It took me some time to organize all my test results and automate my test cases well enough that I thought they could be replicated. This week I submitted a kernel regression report about this problem. Mike Galbraith was easily able to reproduce it, and I've already gotten and tested one patch to improve things.
Still a ways back from 2.6.22 though, and with 2.6.26 being on -rc3 I have my doubts this will be fully wrapped up before that goes live. So here's my suggestions for those of you using PostgreSQL on recent Linux versions who care about performance:
- Kernel 2.6.18 has been a fairly solid kernel for me under RHEL5 and the matching CentOS.
- Kernel 2.6.19 was the first to merge the libata drivers and I avoid it because of that.
- Kernel 2.6.20 in the form of Ubuntu 7.04 Feisty Fawn seems to have stabilized after the libata merge. I can't comment on the kernel.org distribution.
- Kernel 2.6.22 seems solid on everything I throw at it, this is the latest Linux kernel I recommend using for PostgreSQL without serious caveats. The first 2.6.22 came out in July of last year, so it's almost a year old now, and it's currently at 2.6.22-19. Ubuntu's 7.10 Gutsy Gibbon includes 2.6.22.
- Kernels 2.6.23 and 2.6.24, the first to use CFS, have known serious server-side PostgreSQL performance issues. The main distributions I'm aware of that are likely impacted here are Fedora Core 8 and Ubuntu 8.04. I expect Fedora to be running bleeding edge kernels with regressions (you could argue the existing of Fedora has pushed kernel testing toward the users rather than being done as much by the developers). I'm really disappointed Ubuntu adopted such a kernel given the rawness of CFS. Already ranted about this a bunch on my last blog post. I think this is just a timing problem for them. Had they adopted 2.6.22 instead, they'd be facing 5 years of LTS with a kernel using a model abandoned by the mainstream development and therefore have nowhere to turn when issues popped up. The right answer would be not to do a LTS right now, but I digress.
- Kernel 2.6.25 seems to have resolved the worst of the server-side issues. This is what Fedora Core 9 is running. But results from pgbench should be considered suspect, particularly under high client loads, and without that working I can't really prove to myself that 2.6.25 is fine in all the situations I like to test.
- Kernel 2.6.26 may get some fixes in to improve pgbench, depends on how things go. The patch I've already gotten closed much of the performance gap with only a few lines of code changed. You can follow my thread on lkml to see how things are going.
The really important lesson here I drill into people whenever I talk about performance and benchmarking issues is that you've got to measure baseline expectations on any system you put together, continue to check periodically to make sure things haven't eroded, and use that as guidance for any new system. Here I compared new hardware about to go into service against my older, trusted "production" unit to confirm it was faster, discovering both the disk and the pgbench issues. It's really handy to know how to quantify how fast your reference benchmark is on a system from a CPU/memory/disk perspective when you get to where you're looking to upgrade it. Not only will that help guide what should be changed, but it will let you know if the upgrade really worked or not when you're done.