- Why you should always run your own hardware benchmarks on every piece of hardware you can
- Examples of the simplest benchmarks I've found to be accurate (one is sketched just after this list)
- How to organize your tests and your vendor interactions to support performance measurement as a purchasing requirement
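To give a feel for what I mean by simple benchmarks, here's the style of quick disk throughput check I have in mind. This is just a sketch rather than anything lifted from the talk itself: the path, block size, and file size are placeholders, and the file needs to be comfortably larger than RAM so the OS cache doesn't inflate the read numbers.

```sh
# Rough sequential write, then read, throughput check using dd.
# bs/count here are placeholders; size the file at well over RAM
# (twice RAM is a common rule of thumb) so caching doesn't skew results.
dd if=/dev/zero of=/path/to/testfile bs=8k count=2000000 conv=fdatasync
dd if=/path/to/testfile of=/dev/null bs=8k
```

GNU dd prints a throughput figure when each run finishes, so there's nothing to parse beyond reading two numbers off the screen.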
Also available on my web page now is a presentation I did last month at PG East 2009. Titled "Using and Abusing pgbench", that talk also has 3 things it tries to convey:
- How do pgbench and its internal scripting language work? (Most people aren't even aware there is such a scripting language available; a small example appears after this list)
- What should you do in order to get good results from the built-in pgbench tests?
- How can you use pgbench as a test harness for writing your own tests?
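To give a flavor of that scripting language, here's a small sketch of a custom select-only script and one way to run it. The client and transaction counts are placeholders, the database name at the end is whatever you initialized with pgbench -i, and the main table is named accounts in older pgbench versions but pgbench_accounts in newer ones.

```sh
# A minimal custom script using pgbench's \set and \setrandom meta-commands.
# 100000 rows per unit of scale matches how pgbench -i populates the table.
cat > selects.sql <<'EOF'
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
SELECT abalance FROM accounts WHERE aid = :aid;
EOF

# Run with 8 clients, 10000 transactions each.  Pass -s to match the scale
# the database was created with, since custom scripts don't get the scale
# detected for them the way the built-in tests do.
pgbench -f selects.sql -s 10 -c 8 -t 10000 pgbench
```

pgbench substitutes :aid on every transaction and reports the usual TPS numbers at the end, which is what makes the -f option workable as a general test harness.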
As part of putting that presentation together, I did more work on a toolchain I've been using for a couple of years now (since I was working on 8.3 development) that I've named pgbench-tools. The current 0.4 release posted to my home page is the first to benefit from having some users, which has gotten me an enormous amount of feedback toward making the program bug-free and more usable. Thanks in particular to Robert Treat and Jignesh Shah for their contributions. I think it's finally mature enough that it might be useful for others who want to automate running large numbers of pgbench tests too.
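This isn't the actual pgbench-tools interface, just a sketch of the kind of sweep it automates: run a test at several client counts, pull the throughput number out of each run, and save it somewhere a later step can summarize and graph.

```sh
# Illustrative only: loop the built-in select-only test over client counts
# and collect throughput into a CSV.  pgbench-tools handles this plus
# multiple scales, result storage, and graphing.
DB=pgbench
TRANS=10000
for clients in 1 2 4 8 16 32; do
    tps=$(pgbench -S -c $clients -t $TRANS $DB | awk '/^tps.*including/ {print $3}')
    echo "$clients,$tps" >> tps-by-clients.csv
done
```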
Documentation is still minimal, but I have written some (and what's there is accurate, both of which put me ahead of a lot of open-source projects I guess). There is an intro README in the tar file, and the presentation tries to give some examples of usage too. When I get more time I'll be putting the source code into the PostgreSQL git repository (the repo is already there, I just haven't pushed to it yet), where it will be easier for other people to work with and on. There's a growing need in the PG community for regression testing of performance results, and at the yearly PGCon Developer Meeting I volunteered to see if an improved version of this pgbench-tools package might be useful in that role. I hope the ideas in my presentations and the suggested practice demonstrated by these tools turn out to be helpful to others.
The approach taken in pgbench-tools, where you parse the results from pgbench, save them to a database, and then graph the lot of them using SQL to summarize as needed, is only partially mine. I stole the first rev of the graphing code and several other ideas from the work Mark Wong and others did on the dbt2 program (here's an intro to using dbt2). Now that I've got something useful for my purposes and am free from conferences for a while, I'm hoping to spend some time investigating how to integrate the unique things I'm doing with some of the tools he's already written. The biggest thing the dbt tests have that I haven't provided for pgbench yet is a framework for measuring I/O and similar statistics during the test run. Given that the PostgreSQL development process already has a heavy requirement on Perl, I really should fall into line and adopt that myself too--despite my strong personal preference for Python in this role.
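To make that results-in-a-database idea concrete, here is a deliberately simplified sketch; the table and column names are just for illustration and don't match the actual schema that ships in pgbench-tools.

```sql
-- Illustrative only: not the schema pgbench-tools actually creates.
CREATE TABLE test_results (
    test_set text,      -- label for a group of related runs
    scale    integer,   -- pgbench scale factor used
    clients  integer,   -- number of concurrent clients
    tps      numeric    -- throughput parsed from pgbench output
);

-- Summarize a set of runs into something ready to plot
SELECT clients, round(avg(tps)) AS avg_tps
FROM test_results
WHERE test_set = 'select-only sweep'
GROUP BY clients
ORDER BY clients;
```

Once the numbers are sitting in a table like this, getting graphs out is just a matter of feeding summary queries like that one to whatever plotting tool you prefer.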