[Babase] Lots of darcs processes on papio?

Karl O. Pinc kop at meme.com
Tue Apr 27 17:41:31 EDT 2010


On 04/27/2010 03:14:16 PM, Ryan Hardy wrote:
> Hi all,
> 
> I just got an alert from our monitoring system that SSH was
> unavailable on papio.  Upon checking it out, I found that it was not
> down but that the system load was at 69 or so.  There are a ton of
> darcs processes being run by the apache server that seem to be the
> culprits.

Yah.  I discovered this months and months ago.

The problem is that the software is ancient.  Hence the darcs web
interface does not do the http-foo necessary to indicate that
the archive pages have not changed.  Hence the web spiders
hit the box hard.
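
For illustration only, here is a minimal sketch of the kind of HTTP
handling I mean, assuming the missing piece is conditional GET
(Last-Modified / If-Modified-Since answered with 304 Not Modified).
This is not darcs code; the function name conditional_status and its
arguments are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Pick a status for a GET of a page last changed at last_modified.

    last_modified: timezone-aware datetime of the page's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    Returns (status_code, response_headers).
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = None
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: send the full page
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)

    if since is not None and last_modified <= since:
        # Unchanged since the client's copy: no body, no page regeneration.
        return 304, headers
    return 200, headers


if __name__ == "__main__":
    changed = datetime(2010, 4, 27, 21, 41, 31, tzinfo=timezone.utc)
    print(conditional_status(changed, "Tue, 27 Apr 2010 21:41:31 GMT"))
    print(conditional_status(changed, None))
```

A spider that revalidates its copy would then get the cheap 304
answer instead of making the server rebuild the page.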

I tried a robots.txt that denies spider access to the darcs
archive but it seems that Microsoft's Bing, at least,
also detects old robots.txt copies in the darcs archive
that allow spidering.  Bing _should_ ignore all robots.txt
that are elsewhere than at document root but it does
not and appears to choose the most liberal policy it can
find.  I don't _think_ Google's got this problem, but I'm
not sure.  Google seems to be polite about its spidering
but Bing hits the box hard and so that's where I focused
my investigations.
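
As a rough sketch of what a well-behaved spider ought to conclude
from the document-root robots.txt alone, the snippet below uses
Python's standard urllib.robotparser.  The /darcs/ path, the
example.org host, and the "bingbot" agent string are assumptions made
for illustration, not the actual configuration on papio.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical document-root robots.txt that denies the darcs archive.
ROOT_ROBOTS = """\
User-agent: *
Disallow: /darcs/
"""


def allowed(robots_text, agent, url):
    """Return True if robots_text permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # A compliant spider consults only the copy at document root, so
    # any older robots.txt stored inside the archive is irrelevant.
    print(allowed(ROOT_ROBOTS, "bingbot", "http://example.org/darcs/repo/"))
    print(allowed(ROOT_ROBOTS, "bingbot", "http://example.org/index.html"))
```

Under these assumptions the first URL is refused and the second is
permitted, which is the behavior the robots.txt was meant to get.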

The easy answer is to turn off the darcs web interface.
My plan was to disable it, if it became a problem, and
leave it off until we upgrade the OS and get something newer.

Suggestions?


Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein



