Sunday, June 10, 2012

Killing Deadlocked Starman Processes

Every once in a while I catch a few Starman workers on our live server stuck in a deadlock. From googling around, it seems this might be related to the following issues (I couldn't reproduce it in Chrome incognito mode, and I don't have IE9 at hand to test with):

https://github.com/miyagawa/Plack/issues/208
https://github.com/miyagawa/Plack/issues/191

I've resorted to killing the deadlocked processes, since the Starman master will happily spawn new workers afterwards. To automate this I wrote a Perl script and scheduled it to run via cron every 15 minutes. It checks for deadlocked Starman worker processes by inspecting their CPU runtime and workload, and kills any worker that has been running for quite a while with a high-ish CPU load.
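To make the idea concrete, here is a rough sketch of that check in Python. It is not the Perl script linked below, just an illustration of the same approach. The thresholds (15 minutes elapsed, 50% CPU), the function names, and the assumption that workers show up in `ps` with "starman worker" in their command line are all mine; tune them for your own setup.

```python
import os
import signal
import subprocess

# Assumed thresholds: the post only says "quite a while" and "high-ish".
MIN_ELAPSED_SECONDS = 15 * 60
MIN_CPU_PERCENT = 50.0


def parse_etime(etime):
    """Convert the ps etime format [[dd-]hh:]mm:ss into seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stuck_workers(ps_output,
                       min_elapsed=MIN_ELAPSED_SECONDS,
                       min_cpu=MIN_CPU_PERCENT):
    """Return PIDs of worker processes that look stuck.

    Expects lines of "pid etime pcpu args" without a header row.
    """
    stuck = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, etime, pcpu, args = fields
        # Assumption: workers are identifiable by their process title.
        if "starman worker" not in args:
            continue
        if parse_etime(etime) >= min_elapsed and float(pcpu) >= min_cpu:
            stuck.append(int(pid))
    return stuck


def kill_stuck_workers():
    """Run ps, then send SIGKILL to every worker that looks stuck."""
    result = subprocess.run(
        ["ps", "-eo", "pid=,etime=,pcpu=,args="],
        capture_output=True, text=True, check=True,
    )
    for pid in find_stuck_workers(result.stdout):
        os.kill(pid, signal.SIGKILL)
```

Calling `kill_stuck_workers()` from a small wrapper run by cron (for example with a `*/15 * * * *` schedule) would mimic the setup described above. Keep in mind that the `pcpu` value reported by `ps` is typically CPU time divided by elapsed time over the life of the process, not an instantaneous reading, so a long-lived but healthy worker is unlikely to cross the threshold.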

Here's the script: https://gist.github.com/2908369
Update: The issue dealt with in this commit is the more likely cause of the blocked Starman processes.