Tuesday, September 29, 2009

Module API documentation in Python

Sometimes I fondly reminisce about the days when all of the code I worked on was in one programming language. Nowadays, it's a mix of C (mainly related to the PostgreSQL code base), Java (my employer's middleware and a lot of my personal code), and Python (systems programming, general utilities, and QA test code). Python is the most recent of those to be added to the mix, and it's proven to have its own unique code documentation challenges, some of which have clarified how to deal with the other languages in the process.

First I should label my expectations here. I'm not a big fan of dynamic typing to begin with, and in any code I intend to be reusable I'd at least like to document what type each parameter is expected to be, even if those restrictions aren't enforced at compile time. Both C and Java require specifying types for every parameter, and Java includes its Javadoc mechanism for labeling the parameters with their intended purpose and function. That's all I really want: feed in a bit of source code that includes some markup for what all the parameters mean, along with general text commentary; get HTML/PDF output that documents the API presented by that code.

One thing I've found very disappointing about Python is that its development community seems to actively reject the idea of good parameter documentation directly in the source code. The closest thing I've seen is the PEP for Function Annotations, which are so barebones I wouldn't consider them a help even if they were more mainstream (they're not yet). All we really get for in-code documentation are the Docstring Conventions and pydoc, which don't provide any standard way to label parameters in a form that more sophisticated browsing or analysis tools can use.
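To make the "barebones" complaint concrete, here is a minimal sketch of what function annotations look like. It assumes Python 3, and the function name and parameters are made up for illustration. The annotations are stored on the function object, but nothing in the language checks them or says what they mean.

```python
# Sketch of PEP 3107 function annotations (Python 3 only).
# The function and its parameters are hypothetical examples.
def scale(value: float, factor: "multiplier applied to value" = 2.0) -> float:
    """Return value multiplied by factor."""
    return value * factor

# Annotations are just data attached to the function.
print(scale.__annotations__)

# Nothing enforces them: a string is accepted where float was annotated.
print(scale("ab", 2))
```

An annotation can be a type or an arbitrary expression such as a string, which is part of why a documentation tool cannot rely on them having any particular meaning.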

The first tool I considered for this purpose is Epydoc. It understands Javadoc-formatted docstrings and ReST, two standards I already write documentation in. It also includes its own somewhat odd variable docstring syntax, which I didn't find very useful. A similar tool that knows much more about subclassing is pydoctor, whose introduction mentions a bunch of other projects in this area that neither I nor its authors were impressed by.
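For reference, this is roughly what parameter markup looks like in Epydoc's own epytext field syntax, which pairs each `@param` with a `@type`. Treat it as a sketch: the function is a made-up example, and the exact rendering depends on the Epydoc version and the docformat in use.

```python
# Hypothetical function documented with epytext-style fields.
def clamp(value, low, high):
    """
    Restrict value to the range [low, high].

    @param value: number to restrict
    @type value: float
    @param low: smallest allowed result
    @type low: float
    @param high: largest allowed result
    @type high: float
    @return: value, or the nearest bound if value is out of range
    @rtype: float
    """
    return max(low, min(high, value))
```

The fields live inside an ordinary docstring, so pydoc and `help()` still show them as plain text.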

Another Python specific tool here is pythondoc. My first problem with that project is that it seems kind of dead. Ultimately, my bigger concern is that I'd like to use Python docstrings as much as possible, just with additional markup inside them. pythondoc seems to prefer # formatted comments, which aren't really acceptable here.

I keep circling back to Javadoc markup as the only reasonable one here. Ultimately, if I'm using Javadoc format, with nothing Python specific, I have to ask myself why I should adopt a one-off tool such as Epydoc, if instead I can get one that supports the other languages I use and provides a wider feature set. To see the perils of that approach, check out the train wreck answer to the FAQ how to print Javadoc to PDF. What a disaster. To work around that Javadoc limitation, I'd already started moving toward using Doxygen, which I know works great on the C code I browse most via the PostgreSQL code base. (Arguing the merits of doxygen vs. javadoc just in a Java context is a popular topic; see Javadoc or Doxygen? and Doxygen Versus Javadoc for two examples)

A quick check of the full Comparison of documentation generators page didn't turn up other tools that looked like they would help here. At this point I started to settle on a tentative approach that would unify my work around one tool: doxygen, plus Javadoc formatted parameters in a docstring, was something I could live with. One problem: if you use the Python standard docstring approach, doxygen's Python support won't allow any special commands in there. That's pretty much useless.

Luckily I'm not the first person to make that leap: doxypy is a filter that takes regular Python code with the usual docstring format in, producing an intermediate file in the format doxygen wants to work with. But where are the examples of how it works to get people started?
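To give a feel for what such a filter has to do, here is a toy sketch of the idea. It is not doxypy's actual implementation, just a simplified stand-in. It assumes single-line `def` statements and triple-double-quoted docstrings, and the function name `docstring_to_doxygen` is made up. It moves each docstring into a `##` comment block above the function, which is the comment form doxygen accepts special commands in for Python.

```python
import re

def docstring_to_doxygen(source):
    """Move each function's docstring into a '##' comment block above its def."""
    lines = source.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        match = re.match(r"^(\s*)def\s", line)
        has_doc = i + 1 < len(lines) and lines[i + 1].strip().startswith('"""')
        if not (match and has_doc):
            out.append(line)          # anything else passes through untouched
            i += 1
            continue
        indent = match.group(1)
        doc = []
        j = i + 1
        text = lines[j].strip()[3:]   # drop the opening quotes
        # Collect docstring lines until the closing quotes (or end of input).
        while '"""' not in text and j + 1 < len(lines):
            doc.append(text.strip())
            j += 1
            text = lines[j]
        doc.append(text.split('"""')[0].strip())
        while doc and not doc[-1]:    # trim trailing blank lines
            doc.pop()
        for n, d in enumerate(doc):
            prefix = "## " if n == 0 else "#  "
            out.append((indent + prefix + d).rstrip())
        out.append(line)              # the def itself, now without a docstring
        i = j + 1
    return "\n".join(out)

sample = '''def makeTransition(self, input):
    """ Makes a transition based on the given input.

    @param input input to parse by the FSM
    """
    pass
'''
converted = docstring_to_doxygen(sample)
print(converted)
```

A real filter also has to cope with classes, module docstrings, multi-line signatures, and single-quoted docstrings, none of which this sketch handles.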

Luckily, like all good software authors, the doxypy developers eat their own dogfood, and the filter itself is a Python program documented so that doxypy can process it. Here's a simple example of a method call from inside it:

    def makeTransition(self, input):
        """ Makes a transition based on the given input.

        @param input input to parse by the FSM
        """

In this case FSM means "finite-state machine" and not my deity of choice.

Something this simple was all I was looking for, and the only open point here is that Javadoc format presumes one can divine the type from the declaration; that's not so clear here.
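As a rough illustration of why a fixed tag format matters to tooling, the sketch below pulls `@param` entries back out of a docstring with the standard `inspect` and `re` modules. The helper name `param_docs` is made up, and the regular expression assumes the simple one-line `@param name description` form shown above.

```python
import inspect
import re

def param_docs(func):
    """Return {parameter name: description} from Javadoc-style @param tags."""
    # inspect.getdoc() strips the docstring's indentation for us.
    doc = inspect.getdoc(func) or ""
    found = re.findall(r"^@param[ \t]+(\w+)[ \t]+(.*)$", doc, flags=re.MULTILINE)
    return dict(found)

def makeTransition(self, input):
    """ Makes a transition based on the given input.

    @param input input to parse by the FSM
    """

print(param_docs(makeTransition))
```

Nothing in the tag carries a type, so any type information would have to be written into the description text by convention.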

Wednesday, September 16, 2009

Following symlinks in Python

Today's Python trivia question: you have the path of a symbolic link. How do you get the full destination that link points to? If your answer is "use os.readlink", well, it's not quite that easy. I'm not alone in finding the docs here confusing when they say "the result may be either an absolute or relative pathname" and then only tell you how to interpret the result if it's relative. This guy wonders the same thing I did: how do you know whether the returned value is a relative or an absolute path?

I found a clue as to the way to handle both cases in the PathModule code, which is that you use os.path.isabs on the result to figure out what you got back. That module is a lot of baggage to pull in if you just want to correct this one issue, though, so here's a simpler function that knows how to handle both cases:

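    import os

    def readlinkabs(l):
        """
        Return an absolute path for the destination
        of a symlink
        """
        assert (os.path.islink(l))
        p = os.readlink(l)
        # An absolute destination can be returned as is
        if os.path.isabs(p):
            return p
        # A relative destination is relative to the link's own directory
        return os.path.join(os.path.dirname(l), p)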
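Here is a small usage sketch that exercises both cases. It assumes Python 3 and a platform where creating symlinks is permitted. The temporary directory and the file names are made up for the demonstration.

```python
import os
import tempfile

def readlinkabs(l):
    """Return an absolute path for the destination of a symlink."""
    assert os.path.islink(l)
    p = os.readlink(l)
    if os.path.isabs(p):
        return p
    return os.path.join(os.path.dirname(l), p)

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target.txt")
    open(target, "w").close()

    rel_link = os.path.join(d, "rel_link")
    abs_link = os.path.join(d, "abs_link")
    os.symlink("target.txt", rel_link)   # link with a relative destination
    os.symlink(target, abs_link)         # link with an absolute destination

    raw_rel = os.readlink(rel_link)      # what os.readlink alone gives back
    resolved_rel = readlinkabs(rel_link)
    resolved_abs = readlinkabs(abs_link)

    print(raw_rel)
    print(resolved_rel == resolved_abs == target)
```

Two caveats: the joined result is not normalized, so a destination such as `../x` keeps its `..` component unless it is passed through `os.path.normpath`, and the link path passed in should itself be absolute if an absolute result is needed. If the goal is to follow a whole chain of links rather than read one, `os.path.realpath` does that instead.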

I hope my search engine cred bubbles me up so someone else trying to look this up like I did doesn't have to bother reinventing this particular tiny wheel.