[Babase] Lots of darcs processes on papio?
Karl O. Pinc
kop at meme.com
Tue Apr 27 17:41:31 EDT 2010
On 04/27/2010 03:14:16 PM, Ryan Hardy wrote:
> Hi all,
>
> I just got an alert from our monitoring system that SSH was
> unavailable on papio. Upon checking it out, I found that it was not
> down but that the system load was at 69 or so. There are a ton of
> darcs processes being run by the apache server that seem to be the
> culprits.
Yah. I discovered this months and months ago.
The problem is that the software is ancient. Hence the darcs web
interface does not do the http-foo necessary to indicate that
the archive pages have not changed. Hence the web spiders
hit the box hard.
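
For illustration only, here is a minimal Python sketch of the kind of "not changed" handling I mean. It assumes the missing piece is the usual Last-Modified / If-Modified-Since exchange that ends in a 304 response. The function name conditional_response is made up for the example and is not part of darcs or its web interface.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(if_modified_since, last_modified):
    """Return (status, headers) for a page last changed at `last_modified`.

    `if_modified_since` is the raw request header value, or None.
    `last_modified` must be a timezone-aware datetime.
    This is a hypothetical helper, not code from the darcs web interface.
    """
    # HTTP dates are in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable header: send the full page.
        if since is not None and since.tzinfo is not None:
            if last_modified <= since:
                # Unchanged: the spider can reuse the copy it already has.
                return 304, headers

    return 200, headers
```

A spider that sends back the Last-Modified value it saw earlier would get the cheap 304 instead of forcing the page to be regenerated.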
I tried a robots.txt that denies spider access to the darcs
archive but it seems that Microsoft's Bing, at least,
also detects old robots.txt copies in the darcs archive
that allow spidering. Bing _should_ ignore all robots.txt
files that are elsewhere than at document root but it does
not and appears to choose the most liberal policy it can
find. I don't _think_ Google's got this problem, but I'm
not sure. Google seems to be polite about its spidering
but Bing hits the box hard and so that's where I focused
my investigations.
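
As a rough sketch of what a well-behaved spider should conclude from the document-root robots.txt alone, the following uses Python's standard urllib.robotparser. The /darcs/ path, the example.org host, and the "bingbot" agent string are placeholders for the example, not the actual configuration on papio.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical document-root policy denying all spiders the darcs archive.
ROOT_ROBOTS = """\
User-agent: *
Disallow: /darcs/
"""


def allowed(robots_txt, user_agent, url):
    """Report whether `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Only the root policy is consulted; copies of robots.txt found
# deeper in the tree play no part in the decision.
print(allowed(ROOT_ROBOTS, "bingbot", "http://example.org/darcs/repo/"))
print(allowed(ROOT_ROBOTS, "bingbot", "http://example.org/index.html"))
```

Under that policy the first URL is refused and the second is permitted, which is the behavior I expected from Bing.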
The easy answer is to turn off the darcs web interface.
My plan was to disable the darcs web interface if it became
a problem until we upgrade the OS and get something newer.
Suggestions?
Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein