Sunday, November 30, 2008

Automating initial postgresql.conf tuning

About two years ago I returned to full-time work on PostgreSQL performance tuning for a contract job, after having done some earlier work in that area for a personal project. One of the things that struck me as odd was that no progress had been made in the intervening years toward providing an automatic tool to help with that goal. Since many of the existing guides only covered older versions of PostgreSQL (the 8.1 release changed what you can adjust usefully quite a bit), even the few template suggestions for various machine sizes floating around the web were largely obsolete--not to mention that the modern hardware was a lot more capable. When I tried writing an automatic tool myself, I started to realize just how much of a harder problem it was than it seemed.

One stumbling block is that you need a fair amount of information about the parameters in the postgresql.conf file, and some of that information wasn't all available any way except to read the associated source code. There's enough information embedded only there that it was impractical to expect a potential tool writing author to extract and maintain their own copy of the settings database. For example, I had to submit a patch myself earlier this year to make it possible to easily discover what value a parameter you don't set will get. Now you can look at the pg_settings view and that shows up as "boot_val".

A second problem is that coming up with a robust model for what to set everything to was harder than it seems at first. Earlier this year I worked with some other community members on codifying some of that into tuning guidelines, and there was a somewhat arcane tuning spreadsheet from Josh Berkus floating around. After showing I was close to an initial solution to this long outstanding TODO item, Josh came to my aid with a simplified model that was easy to implement as code.

There have been more than one attempt to write this kind of tool that fizzled after some initial work. The other classic trap people have fallen into here is worrying about the user interface before getting all the boring parts done. It takes a fair amount of code to read a postgresql.conf file usefully, learn the rules for what can you do with it (again, some of which are only documented in the database source code), and publish an updated file. As an example there, while it's possible to just maintain a list of which parameters are specified using "memory units" and what unit they default to (some are multiples of a single kB, others in 8kB blocks), what you really should do is look at the "unit" column that comes out of pg_settings instead to figure that out. And to do that, you need to save a copy of pg_settings into a file (since at the point this tool is being run the server isn't up yet) and load it into memory such that the related code can rely on it. Then it's trivial for the tool to distinguish between a value like "max_connections" (which is a simple integer that accepts no additional units) and "work_mem", which is specified in 1kB units but you can use 'MB' and 'GB' suffixes for. Oh, and don't forget the upper limit for that particular setting differs depending on whether you're on a 32 or 64 bit platform. What fun.

One of the guidelines I try to live by is "don't complain about anything you're not willing to help fix yourself", and I've done my share of complaining on this topic. This past holiday weekend, in between rounds of turkey overdose I reached a new milestone here: my post to the pg hackers mailing list yesterday includes a Python implementation that solves all the traditional problems besetting tuning tools authors in the context of the upcoming PostgreSQL 8.4. You feed it an existing postgresql.conf file and tell it what kind of database application you expect to build (data warehouse, web app, etc.). It tries to figure out how much RAM you have, then produces an updated postgresql.conf file for you with updates to 10 critical parameters.

Since the default configuration file for PostgreSQL is aimed at working with the conservative parameters many operating systems ship with, that makes those defaults only appropriate for a system having approximately 128MB of total RAM. If you have more than that and haven't considering performance tuning improvements for your PostgreSQL database, this tool, the model expressed in its source code, or a future version of it (once 8.4 is done I plan a version backward compatible with versions 8.1-8.3) may eventually be useful to you.

And if you're one of the frustrated potential tool authors who was stopped by one of the problems I mentioned here, consider this code release as an initial resource you can use. Even if it's not your preferred language, Python is pretty easy to read. I think it's easier to start by using this program as a spec to reference and emulate than it is to dig into the source code and resources I did to write it.

8 comments:

Denish Patel said...

I could not able to download the pgtune file from http://archives.postgresql.org/pgsql-hackers/2008-11/msg00919.php

would you please send it to me by email: denish.j.patel@hotmail.com ?

Thanks in advance,
Denish

Uqbar said...

... or better fix that link for everyone?

Greg Smith said...

That's not the link that leads to the article--it's the very last one there, "post to the pg hackers mailing list".

Denish Patel said...

Yeah, It is navigating to the mailing list thread. But the problem is when i try to download the file (pgtune-v3.tar.gz)from the thread it is downloading in binary format. Would you please upload the file in this blog entry? please. thanks in advance.

Greg Smith said...

It is a binary file; it's an archive of a couple of files, compressed. Will be no easier to download if I put it here. If you're on something like Windows, you'll need a program such as Winzip to read the file. You'll also need to run Python manually from the directory the files all extract to; I'm still working on instructions for that.

This is still rather rough for regular users to be experimenting with, that's what the official software release is for. I was announcing the early availability more for developers who might be interested in the project at this point but who aren't on the PostgreSQL hackers list themselves (it's pretty high traffic for anyone who doesn't directly work on the PostgreSQL code).

mla said...

Just a vote of support on this.

I'm building a new db server and starting googling around for the latest recommendations.

This is a great help. If you need any testers, let me know.

Rohit said...

Firstly this script is a great idea for people who don't have much experience with optimising postgres.

I tried the pgtune script on osx. Got a few errors.

1) "from ctypes.wintypes import *" causes exception.

2) totalMem() method returns None

3)I think options.totalMemory returns a str. So totalMemory / kb fails.

Hope this helps.

alci said...

Just came accross you tool, and I wanted to give you a big thank you... you made my day !
Thanks a lot.