The following is cross-posted from my personal blog. In case you’re subscribed to both and think you’re seeing duplicates, you’re not.
So a while ago I set up buildbot for Adium. Briefly, buildbot provides continuous integration (i.e., building the source tree after every checkin) and runs our unit tests automatically. Cool stuff, and hats off to the buildbot team. Things seemed to be running fine for a while. Recently, however, we’ve started to get some odd errors on the machine we use for running builds, a Mac mini named smew.
Subversion began failing its DNS lookups. I could only reproduce the problem when buildbot was running svn; if I logged in, I could run the exact same commands myself without trouble. Even more curiously, telling buildbot to run nslookup svn.adiumx.com worked completely fine.
I “solved” this by having the buildbot master (on a Linux machine) do the lookup and then tell the client to check out svn://<ip here>. If the IP of the subversion server changes, we just need to do a clean build and it’ll pick up the change. It’s not a great solution, but definitely workable.
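For the curious, the workaround boils down to something like the sketch below. This is not our actual buildbot configuration, just a minimal illustration of the idea: the function name svn_url_for, the resolve parameter, and the adium/trunk path are all made up for the example.

```python
import socket


def svn_url_for(host, path="", resolve=socket.gethostbyname):
    """Build an svn:// URL that uses the host's IP address instead of its name.

    `resolve` is a hypothetical hook (defaulting to the standard library's
    socket.gethostbyname) so the lookup can be swapped out or faked.
    """
    # The lookup happens wherever this function runs (the master, in our
    # case), so the build slave never has to resolve the name itself.
    ip = resolve(host)
    return "svn://%s/%s" % (ip, path.lstrip("/"))


if __name__ == "__main__":
    # "adium/trunk" is a placeholder path, not necessarily the real layout.
    print(svn_url_for("svn.adiumx.com", "adium/trunk"))
```

The master would compute this URL once and hand it to the checkout step, which is why a change in the server's IP only gets picked up on a clean build.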
I don’t recall whether this worked briefly or not at all, because our automated tests then began failing like so:
/Developer/Tools/RunUnitTests:298: note: Started tests for
architectures 'ppc i386'
/Developer/Tools/RunUnitTests:301: note: Running tests for
architecture 'ppc'
Wed Feb 13 02:07:40 smew.adiumx.com otest[41048] <Error>:
kCGErrorRangeCheck :
On-demand launch of the Window Server is allowed for root user only.
Wed Feb 13 02:07:40 smew.adiumx.com otest[41048]
<Error>: kCGErrorRangeCheck :
Set a breakpoint at CGErrorBreakpoint() to catch errors
as they are returned
2008-02-13 02:07 otest[41048] (CarbonCore.framework)
FSEventStreamStart: ERROR:
FSEvents_connect() => (ipc/send) invalid destination port (268435459)
FAILED TO GET ASN FROM CORESERVICES so aborting.
/Developer/Tools/RunUnitTests: line 301: 41048 Abort trap
arch -arch "${TEST_ARCH}" "${TEST_RIG}" "${TEST_BUNDLE_PATH}"
/Developer/Tools/RunUnitTests:314: error:
Test rig '/Developer/Tools/otest' exited abnormally
with code 134 (it may have crashed).
** BUILD FAILED **
This is particularly strange because, again, I can run these tests manually and get proper results. Same user that buildbot is running as (and that user is logged in to the machine and has a window server connection), same checkout, same everything, near as I can tell.
This may be superstition, and it is probably just a coincidence, but sometimes these tests do run, and it seems that things work OK while I am logged in to the machine via ssh. After I log out, things go screwy again. The problem seems specific to that particular machine: I had the buildbot slave running on a Mac mini while I was at Mozilla and it worked just fine.
I’ve run a permissions repair. It fixed some things, but still no dice. Buildbot is using the python installed by Leopard. The machine is fully updated, and none of the updates fixed the problems (not even the Leopard graphics update). This machine is located at a colo somewhere inaccessible (Mars?), so while an archive and install would normally be my next step, I don’t have easy access to the machine.
I’ve done everything I can think of that I can do easily. Help me, blogosphere. You’re my only hope.