Ultimate htaccess blacklist (7 posts)

  1. Farms
    Member
    Posted 16 years ago #

    What do people think of this?

    http://perishablepress.com/press/2007/06/28/ultimate-htaccess-blacklist/

    Worth implementing?

  2. xiand0
    Blocked
    Posted 16 years ago #

    I use a .htaccess blacklist which is very similar to the "ultimate" blacklist.

    http://en.linuxreviews.org/HOWTO_stop_automated_spam-bots_using_.htaccess

    One thing I immediately noted regarding this "ultimate" list is that it includes robots like Archive.org (ia_archiver). This bot does have public benefit (and also respects robots.txt, so you don't really need to deny it by .htaccess).

    It also misses bots like libghttp (a GNOME library used mainly by spam software).

    A .htaccess blacklist IS a good idea, but cutting and pasting lists like this "ultimate" blacklist, or the one I use for that matter, isn't. For example, if someone posts a .zip, a .tar.bz2, or even a large .avi file on their blog, I'll most likely download it using Wget, which happens to be my favorite download manager(!). I can't if Wget is in a .htaccess blacklist .. or can I? Yes I can, because my wget is an alias for wget -U "Mozilla". And this is why such blacklists are worth very little altogether: you can simply configure the software (including browsers..) to supply whatever commonly used User-Agent string you want.
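
To make the spoofing point concrete, here is a minimal Python sketch of what a User-Agent blacklist boils down to. The two patterns are a hypothetical sample chosen for illustration, not the contents of either list linked above, and `is_blocked` is an invented helper name.

```python
import re

# Hypothetical sample of blacklist patterns; NOT the actual "ultimate" list.
BLACKLIST_PATTERNS = [r"^Wget", r"libghttp"]

# One combined, case-insensitive regex built from all patterns.
_BLACKLIST_RE = re.compile("|".join(BLACKLIST_PATTERNS), re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header matches any blacklist pattern."""
    return _BLACKLIST_RE.search(user_agent) is not None


if __name__ == "__main__":
    print(is_blocked("Wget/1.10.2"))  # a client that identifies itself honestly
    print(is_blocked("Mozilla"))      # the same tool run as: wget -U "Mozilla"
```

The check only ever sees the string the client chooses to send, so it can filter honest tools but not ones that lie about their identity.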

  3. Farms
    Member
    Posted 16 years ago #

    Thanx, good answer!

    I liked this quote from the article:

    ".htaccess can effectively block any spam-bot which admits to being one." :)

  4. quenting
    Member
    Posted 16 years ago #

    This means 50 conditions evaluated for every HTTP request received by the server. I'd rather use a bot trap, which won't bother regular users and will trap spambots that don't admit to being one.
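
A bot trap along these lines could be sketched in Python as follows. This is only an illustration under stated assumptions: the trap URL `/trap/` is hypothetical (in practice it would be linked invisibly from pages and disallowed in robots.txt), and the `BotTrap` class is an invented name, not part of any existing tool.

```python
# Hypothetical trap URL: hidden from human visitors and disallowed in
# robots.txt, so only misbehaving crawlers should ever request it.
TRAP_PATH = "/trap/"


class BotTrap:
    def __init__(self):
        # IP addresses that have requested the trap URL at least once.
        self.banned_ips = set()

    def allow(self, ip, path):
        """Return True if the request should be served, False if refused."""
        if ip in self.banned_ips:
            return False  # a single set lookup per request
        if path.startswith(TRAP_PATH):
            self.banned_ips.add(ip)  # the client walked into the trap
            return False
        return True


if __name__ == "__main__":
    trap = BotTrap()
    print(trap.allow("203.0.113.5", "/blog/"))  # before touching the trap
    print(trap.allow("203.0.113.5", "/trap/"))  # requests the hidden URL
    print(trap.allow("203.0.113.5", "/blog/"))  # after touching the trap
```

The decision here depends on behaviour rather than on the User-Agent string, so a client sending "Mozilla" is caught just the same once it follows the hidden link.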

  5. pkiff
    Member
    Posted 16 years ago #

    One thing I immediately noted regarding this "ultimate" list is that it includes robots like Archive.org (ia_archiver).

    Just a small correction. The Archive.org bot is not ia_archiver. ia_archiver is the Alexa Web bot. I don't think the "Ultimate Blacklist" blocks the Archive.org bot.

  6. drmiketemp
    Member
    Posted 16 years ago #

    ".htaccess can effectively block any spam-bot which admits to being one."

    That's the issue. The first thing I would do if I were writing a scraper would be to ignore htaccess.

  7. lunabyte
    Member
    Posted 16 years ago #

    Scrapers can't ignore htaccess; that's on the server side. Did you mean robots.txt, perhaps?
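
The difference can be illustrated with Python's standard `urllib.robotparser` module: robots.txt is something the crawler itself has to choose to consult. The robots.txt contents, the `ExampleBot` name, and the `polite_can_fetch` helper below are all hypothetical, made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler does: consult robots.txt before fetching."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(polite_can_fetch("ExampleBot", "/private/page.html"))
    print(polite_can_fetch("ExampleBot", "/blog/"))
```

A scraper that wants to ignore robots.txt simply never runs a check like this. Nothing comparable exists for .htaccess, because those rules are applied by the web server before any response is sent.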
