Wednesday, October 7, 2009

Writing monitoring threads in Python

A common idiom in programs I write is the monitoring thread. When a program is doing something interesting, I often want to watch consumption of some resource in the background (memory, CPU, or app internals) while it runs. Rather than burdening the main event loop with those details, I like to fire off a separate process or thread to handle that job. When the main program has finished its work, it asks the thread to end, then grabs a report. If you write a reusable monitoring library like this, you can add a monitoring thread for whatever you want to watch within a program with a couple of lines of code.

Threading is pretty easy in Python, and the Event class is an easy way to handle sending the "main program is exiting, give me a report" message to the monitoring thread. When I sat down to code such a thing, I found myself with a couple of questions about exactly how Python threads die. Some samples:
  • Once a thread's run loop has exited, can you still execute reporting methods against it?
  • If you are sending the exit message to the thread via a regular class method, can that method safely call the inherited thread.join and then report the results itself only after the run() loop has processed everything?
Here's a program that shows the basic outline of a Python monitoring thread implementation, with the only thing it monitors right now being how many times it ran:
#!/usr/bin/env python

from threading import Thread
from threading import Event
from time import sleep

class thread_test(Thread):

    def __init__(self, nap_time):
        Thread.__init__(self)
        self.exit_event = Event()
        self.nap_time = nap_time
        self.times_ran = 0
        # The thread starts itself as soon as it is created
        self.start()

    def exit(self, wait_for_exit=False):
        print("Thread asked to exit, messaging run")
        self.exit_event.set()
        if wait_for_exit:
            print("Thread exit about to wait for run to finish")
            self.join()
        return self.report()

    def run(self):
        # Loop until exit() sets the event
        while not self.exit_event.is_set():
            self.times_ran += 1
            print("Thread running iteration", self.times_ran)
            sleep(self.nap_time)
        print("Thread run received exit event")

    def report(self):
        if self.is_alive():
            return "Status: I'm still alive"
        else:
            return "Status: I'm dead after running %d times" % self.times_ran

def tester(wait=False):
    print("Starting test; wait for exit:", wait)
    t = thread_test(1)
    sleep(3)
    print(t.report())  # Still alive here
    sleep(2)
    print("Main about to ask thread to exit")
    e = t.exit(wait)
    print("Exit call report:", e)
    sleep(2)
    print(t.report())  # Thread is certainly done by now

if __name__ == '__main__':
    tester(False)
    print()
    tester(True)
Calling the thread's "join" method from the method that asks it to end is optional here, so we can see both behaviors. Here's what the output looks like:
Starting test; wait for exit: False
Thread running iteration 1
Thread running iteration 2
Thread running iteration 3
Status: I'm still alive
Thread running iteration 4
Thread running iteration 5
Main about to ask thread to exit
Thread asked to exit, messaging run
Exit call report: Status: I'm still alive
Thread run received exit event
Status: I'm dead after running 5 times

Starting test; wait for exit: True
Thread running iteration 1
Thread running iteration 2
Thread running iteration 3
Status: I'm still alive
Thread running iteration 4
Thread running iteration 5
Main about to ask thread to exit
Thread asked to exit, messaging run
Thread exit about to wait for run to finish
Thread run received exit event
Exit call report: Status: I'm dead after running 5 times
Status: I'm dead after running 5 times
That confirms things work as I'd hoped. That is usually the case in Python (and why I prefer it to Perl, whose behavior I can't seem to get good at predicting). Still, I wanted to see it operate to make sure my mental model matched what actually happens.

Conclusions:
  1. If you've stashed some state information into a thread, you can still grab it and run other thread methods after the thread's run() loop has exited.
  2. You can call a thread's join method from a method that messages the run() loop, and it will block until the run() loop has exited. This means the method that stops things can be set up to return only complete output directly to the caller requesting the exit.
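Both conclusions can also be checked with assertions rather than by reading print output. What follows is only a minimal sketch, not part of the test program above: the class name counting_thread and its stop method are made up for illustration. It also swaps sleep for Event.wait with a timeout, which is my assumption about what you'd want here, so that the join returns as soon as the event is set instead of after a full nap.

```python
from threading import Thread, Event
from time import sleep

class counting_thread(Thread):
    """Hypothetical stand-in for thread_test; it only counts loop iterations."""

    def __init__(self, nap_time):
        Thread.__init__(self)
        self.exit_event = Event()
        self.nap_time = nap_time
        self.times_ran = 0
        self.start()

    def stop(self):
        # Message the run() loop, then block until it has really exited
        self.exit_event.set()
        self.join()
        return self.times_ran

    def run(self):
        while not self.exit_event.is_set():
            self.times_ran += 1
            # Assumption: wait() instead of sleep(), so that setting the
            # event wakes the thread right away
            self.exit_event.wait(self.nap_time)

t = counting_thread(0.05)
sleep(0.2)
count_at_exit = t.stop()

# Conclusion 2: stop() returned only after run() had finished
assert not t.is_alive()
# Conclusion 1: state stashed in the thread is still readable afterwards
assert t.times_ran == count_at_exit
```

Since the thread is already dead when stop() returns, the count can no longer change, which is why the value it returned and the attribute read afterwards agree.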
With that established, I'll leave you with the shell of a monitoring class that includes a small unit test showing how to use it. Same basic program, but without all the speculative coding and print logging in the way, so it's easy for you to copy and run with to build your own monitoring routines. The idea is that you create one of these, it immediately starts, and it keeps doing whatever you want in the background until you ask it to stop, at which point it returns its results (and you can always grab them later too).
#!/usr/bin/env python

from threading import Thread
from threading import Event
from time import sleep

class monitor(Thread):

    def __init__(self, interval):
        Thread.__init__(self)
        self.exit_event = Event()
        self.interval = interval
        self.times_ran = 0
        self.start()

    def exit(self):
        # Signal run() to stop, wait for it to finish, then report
        self.exit_event.set()
        self.join()
        return self.report()

    def run(self):
        # Replace the counter with whatever you want to monitor
        while not self.exit_event.is_set():
            self.times_ran += 1
            sleep(self.interval)

    def report(self):
        if self.is_alive():
            return "Still running, report not ready yet"
        else:
            return "Dead after running %d times" % self.times_ran

    @staticmethod
    def self_test():
        print("Starting monitor thread")
        t = monitor(1)
        print("Sleeping...")
        sleep(3)
        e = t.exit()
        print("Exit call report:", e)

if __name__ == '__main__':
    monitor.self_test()
The main thing you might want to improve on for non-trivial monitoring implementations is that the effective interval will vary based on how long the monitoring task takes, since the loop sleeps for the full interval after every run. If you're doing some intensive processing that takes a variable amount of time at each interval, you might want to adjust the sleep time to aim for a regular target time, rather than sleeping the same amount every time.
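Here is one way that adjustment could look. Treat it as a sketch under a few assumptions rather than a finished design: the class name steady_monitor and the task argument are hypothetical, the schedule is tracked with time.monotonic, and the wait uses Event.wait with a timeout instead of sleep so an exit request isn't delayed by a full interval.

```python
from threading import Thread, Event
from time import monotonic, sleep

class steady_monitor(Thread):
    """Hypothetical variant of monitor that aims for a fixed sampling period."""

    def __init__(self, interval, task):
        Thread.__init__(self)
        self.exit_event = Event()
        self.interval = interval
        self.task = task  # assumed: any callable that does the monitoring work
        self.times_ran = 0
        self.start()

    def exit(self):
        self.exit_event.set()
        self.join()
        return self.times_ran

    def run(self):
        next_run = monotonic()
        while not self.exit_event.is_set():
            self.task()
            self.times_ran += 1
            # Schedule relative to the previous deadline, not to "now",
            # so time spent in task() does not push later runs back
            next_run += self.interval
            remaining = next_run - monotonic()
            if remaining > 0:
                self.exit_event.wait(remaining)
            else:
                # Task overran its slot: restart the schedule from now
                next_run = monotonic()

# The lambda stands in for monitoring work that takes about 30 ms per run
m = steady_monitor(0.1, lambda: sleep(0.03))
sleep(0.55)
runs = m.exit()
```

With the fixed sleep from the shell above, each cycle here would take roughly 0.13 seconds; with the deadline approach the runs start about 0.1 seconds apart as long as the task fits inside the interval. When the task overruns, this version simply starts over from the current time instead of trying to catch up, which is one choice among several.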

(All the above code is made available under the CC0 "No Rights Reserved" license and can be incorporated in your own work without attribution)
