Wednesday, July 22, 2009

Upgrading Flex from source RPM to compile PostgreSQL from CVS

This week I'm working on reviewing a patch that's part of the current PostgreSQL CommitFest, the periodic points where all outstanding patches are looked at and committed if ready. The patch I'm looking at requires some performance testing, and all my servers I'd do that on are running CentOS 5, the popular RedHat Enterprise clone. There's a fun surprise waiting for anyone else who tries this: as of last week, you can't build PostgreSQL from the development CVS or GIT repositories anymore on that platform without updating the Flex package from the default one.

The problem started when I ran "make" to build everything and saw this:


make[3]: Entering directory `/home/gsmith/pgproject/pgsql.tpgbench/src/backend/bootstrap'
***
ERROR: `flex' is missing on your system. It is needed to create the
file `bootscanner.c'. You can either get flex from a GNU mirror site
or download an official distribution of PostgreSQL, which contains
pre-packaged flex output.
***


I knew I had flex on my system so this was kind of confusing. Usually there's more detail about dependency failures in the config.log file, and sure enough it had the details:


configure:6793: WARNING:
*** The installed version of Flex, /usr/bin/lex, is too old to use with PostgreSQL.
*** Flex version 2.5.31 or later is required, but this is /usr/bin/lex version 2.5.4.
configure:6807: result: no
configure:6815: WARNING:
*** Without Flex you will not be able to build PostgreSQL from CVS nor
*** change any of the scanner definition files. You can obtain Flex from
*** a GNU mirror site. (If you are using the official distribution of
*** PostgreSQL then you do not need to worry about this because the Flex
*** output is pre-generated.)


A quick check shows this change was made to allow some more powerful scanning capabilities in the PostgreSQL language parser, followed by inserting that warning.

Now, while it's possible to just build quickly from source and install directly over top of the existing flex in this case, I don't like to do that on production systems (or even development ones). It's not a good idea to mix the RPM packages on the system with stuff installed that way. It's better if you can create a new flex RPM package based on newer source code and upgrade to that one, then everything stays managed consistently with RPM.

You can locate SRPMs with a newer version of flex, the one released with Fedora 9, from the following locations:


Those were the first I found with a new enough version number that they should work. You could easily try to substitute the flex SRPM that comes with Fedora 10 or Fedora 11 instead, haven't tested that here myself yet to say how that goes; probably fine.

I used the kernel.org link, downloaded and installed like this:

$ wget ftp://mirrors.kernel.org/fedora/releases/9/Fedora/source/SRPMS/flex-2.5.35-1.fc9.src.rpm
$ sudo rpm -i flex-2.5.35-1.fc9.src.rpm

This dumps the source into /usr/src/redhat:

$ cd /usr/src/redhat/
$ ls SOURCES SPECS
SOURCES:
flex-2.5.35.tar.bz2

SPECS:
flex.spec

You can then build like this:

$ cd /usr/src/redhat/SPECS/
$ sudo rpmbuild -bb flex.spec

That produces the new RPM we want:

$ ls -l /usr/src/redhat/RPMS/x86_64/
-rw-r--r-- 1 root root 322619 Jul 22 17:14 flex-2.5.35-1.x86_64.rpm
-rw-r--r-- 1 root root 283675 Jul 22 17:14 flex-debuginfo-2.5.35-1.x86_64.rpm

Now install it:

$ cd /usr/src/redhat/RPMS/x86_64/
$ ls
flex-2.5.35-1.x86_64.rpm flex-debuginfo-2.5.35-1.x86_64.rpm
$ rpm -qa flex
flex-2.5.4a-41.fc6
$ sudo rpm -Uvh flex-2.5.35-1.x86_64.rpm
Preparing... ########################################### [100%]
1:flex ########################################### [100%]
$ rpm -qa flex
flex-2.5.35-1

And then cleanup:

$ cd /usr/src/redhat/SPECS/
$ sudo rpmbuild --clean --rmsource flex.spec
Executing(--clean): /bin/sh -e /var/tmp/rpm-tmp.99970
+ umask 022
+ cd /usr/src/redhat/BUILD
+ rm -rf flex-2.5.35
+ exit 0

After that, I had to run "configure" again to pick up the change, ran "make again", and now my build against CVS checkout of the future PostgreSQL 8.5 in progress compiles without any problems.

Note that normally, when you substitute a package like this you need to be careful you keep up with security patches to it because you're not going to get them automatically anymore. Since flex is a pretty low-level tool used only for developing software, I'm not too concerned about the security implication of my running a custom version here.

Monday, July 6, 2009

Python logging TypeError messages

I've been writing small Python programs for about two years now. There are a few things that slipped my notice until really recently though, and having good application logging instead of using "print" is on that list. Lost some time today figuring out that "global" goes inside functions where that global is used, rather than as a modifier when creating the variable; Understanding 'global' in Python cleared that up for me. While trying to sort that out, I started converting my program's mess of print statements into something that used the logging facility instead. There's an unexpected and quite messy surprise waiting there, a simple to make mistake that the logging facility completely botches handling in a friendly way.

Let's say we start with a simple program that logs the value of a variable at a point:


#!/usr/bin/env python
x=1
print "x =",x


We decide to switch that over to use logging instead, so that we can adjust whether that bit of detail is logged or not better. You might do that (wrongly!) like this:


#!/usr/bin/env python
import logging
x=1
logging.warning("x =",x)


I'd guess this is a pretty common amateur mistake. If you pass more than one argument into a logging statement, the module presumes the first one is a format string and the rest fill in parameters in that string. This gives you the following cryptic error when it uses the "%" facility to format that string with the variable given:


Traceback (most recent call last):
File "/usr/lib/python2.6/logging/__init__.py", line 760, in emit
msg = self.format(record)
File "/usr/lib/python2.6/logging/__init__.py", line 644, in format
return fmt.format(record)
File "/usr/lib/python2.6/logging/__init__.py", line 432, in format
record.message = record.getMessage()
File "/usr/lib/python2.6/logging/__init__.py", line 302, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting


OK, so that's my bad. But notice what's missing from there? There's not a hint of an idea where in my code the error is at! All of the error messages refer to things within the logging __init__.py module. What to do?

This topic came up recently on the Python list, with a couple of messages about how to hack up the logging module to present more information. That works, but isn't particularly elegant. Buried in there was a clever way around this problem though: override the error handler. Here's what that looks like patched into our bad code example:


#!/usr/bin/env python
import logging

def handleError(self, record):
  raise
logging.Handler.handleError = handleError

x=1
logging.warning("x =",x)


Now when this is run, the error traceback you get starts by showing you the bad line in your code:


Traceback (most recent call last):
File "./logtest.py", line 9, in
logging.warning("x =",x)
File "/usr/lib/python2.6/logging/__init__.py", line 1441, in warning
root.warning(*((msg,)+args), **kwargs)
...


Note that just inserting an error handler like this isn't what you want to deploy, because an error in logging should introduce an application error. It's really more for making sure you get all your log messages correct in the first place. Make sure you read and following through understanding the comments about how handleError works later in the message thread before leaving this error dumping code in your app permanently.

I broke this down into a trivial case for demonstration, in my actual problem I had 600 lines of code filled with logging calls and only one of them was bad. Patching the error handler took me right to the bad one, and it was obvious how to fix it once I was staring at it.

Here's a correct snippet of code below, including some initialization bits I borrowed from the excellent Manage concurrent threads tutorial that makes the log messages more appropriate for threads. The reason I needed logging and global variables here were to cope with some program-wide thread locking issues with a single shared resource, and something like this is what's actually going into my app now that the log messages are all formatted properly and I could punt the error handler hack out of the program. But you can bet I'll remember that for the next time I run into the cryptic logging TypeError.

#!/usr/bin/env python
import logging
logging.basicConfig(
  level=logging.DEBUG,
  format='(%(threadName)-10s) %(message)s')
x=1
logging.warning("x = %s",x)


Note that there are all sorts of additional format strings like "%(threadname)" available, including timestamps and line numbers, and making changes to the base format lets you adjust that for the whole program. If you find yourself starting to put function names into your log messages or similarly repetitive work, you should consider using those substitution formatters instead.