The MU forums have moved to WordPress.org

libcurl requests (15 posts)

  1. SteveAtty
    Member
    Posted 14 years ago

    I'm getting repeated libcurl requests for a couple of pages from the same IP addresses, over and over again, and each time they fetch only the page itself without requesting anything else. For example, 174.57.205.127 made 19 requests yesterday and 32 on Thursday.

    So I'm puzzled as to what is going on. I know some bots use libcurl to scrape, but this does seem a bit silly.

  2. andrea_r
    Moderator
    Posted 14 years ago

    I'd whois the IP then bring out the banhammer.

  3. SteveAtty
    Member
    Posted 14 years ago

    It's someone on Comcast.

    Actually, looking at my logs, libcurl seems to be up to some very odd things, including:

    99.20.134.68 - - [06/Oct/2009:11:15:34 +0100] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"
    99.20.134.68 - - [06/Oct/2009:11:15:37 +0100] "POST /wp-signup.php HTTP/1.1" 200 7441 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"

    In fact, I can't see anything really "normal" being done by libcurl; all it seems to do is hit some pages over and over again.

    I've got 47 distinct IPs doing it.
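Counting the distinct curl IPs is easy to script. The sketch below is one way to do it, assuming the server writes Apache's "combined" log format (as the two lines above appear to be). The regex and function names are illustrative only, not part of any tool mentioned in the thread.

```python
import re
from collections import Counter

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return the fields of one combined-format line as a dict, or None if it doesn't match."""
    m = LOG_RE.match(line.strip())
    return m.groupdict() if m else None


def curl_requests_per_ip(lines):
    """Count requests per client IP whose User-Agent starts with 'curl/'."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["agent"].startswith("curl/"):
            counts[rec["ip"]] += 1
    return counts


# The two log lines quoted above.
sample = [
    '99.20.134.68 - - [06/Oct/2009:11:15:34 +0100] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"',
    '99.20.134.68 - - [06/Oct/2009:11:15:37 +0100] "POST /wp-signup.php HTTP/1.1" 200 7441 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"',
]
counts = curl_requests_per_ip(sample)
print(len(counts), "distinct curl IPs;", counts.most_common())
```

Against a real log the same function can be fed an open file, e.g. `curl_requests_per_ip(open("access.log"))`, where `access.log` is a placeholder path; `len(...)` of the result gives the distinct-IP count.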

  4. SteveAtty
    Member
    Posted 14 years ago

    I dug around in my logs going back a few weeks and I've not had one single valid request from curl; it's associated with signup bots and other dodgy processes, including a company whose internet business is "searching the internet for brand abuse".

    So is there anything "wrong" with adding the following to .htaccess:

    RewriteCond %{HTTP_USER_AGENT}  ^curl/7.*
    RewriteRule $         curl.html        [L]

    with a curl.html page explaining, in a few bytes, that curl is blocked?
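To see what the proposed `RewriteCond` actually matches, here is a small sketch that applies the same pattern to logged user-agent strings. It uses Python's `re` as a stand-in for Apache's PCRE matching, which behaves the same for a pattern this simple. `BROAD_PATTERN` and the `curl/8.0.1` string are hypothetical, added only for comparison. Because the `.` is unescaped and followed by `*`, `^curl/7.*` is equivalent to "starts with `curl/7`", so other curl major versions would not be caught.

```python
import re

# The RewriteCond pattern from the .htaccess snippet, checked with Python's re
# as a stand-in for Apache's PCRE (equivalent for a pattern this simple).
RULE_PATTERN = re.compile(r"^curl/7.*")

# Hypothetical broader pattern: any curl version, not just 7.x.
BROAD_PATTERN = re.compile(r"^curl/")


def would_match(user_agent, pattern=RULE_PATTERN):
    """True if the condition would match this User-Agent (case-sensitive, like RewriteCond without [NC])."""
    return pattern.search(user_agent) is not None


agents = [
    "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3",
    "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
    "curl/8.0.1",  # hypothetical newer version, for comparison
]
for ua in agents:
    print(would_match(ua), would_match(ua, BROAD_PATTERN), ua)
```

A browser-style user agent is not matched by either pattern, so a client that sends one gets through this check regardless of how it behaves.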

    Further analysis of my logs (can't you tell I'm bored?) shows that each of these IPs polls a post (usually the same one) approximately every 30 minutes. So this isn't something like someone wrapping the content of my posts in another site; it's obviously some sort of bot, especially as the user-agent string is identical for all requests from all IPs.
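The roughly 30-minute polling interval can be checked by sorting one IP's request timestamps and looking at the gaps between them. A minimal sketch, assuming Apache's default timestamp format; the timestamps below are made up for illustration and are not taken from the thread's logs.

```python
from datetime import datetime
from statistics import median

APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 06/Oct/2009:11:15:34 +0100


def polling_gaps_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive requests after sorting by time."""
    times = sorted(datetime.strptime(t, APACHE_TIME) for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


# Made-up timestamps for one IP polling a single post.
stamps = [
    "10/Oct/2009:09:00:05 +0100",
    "10/Oct/2009:09:30:02 +0100",
    "10/Oct/2009:10:00:07 +0100",
    "10/Oct/2009:10:29:58 +0100",
]
gaps = polling_gaps_minutes(stamps)
print([round(g, 1) for g in gaps], "median:", round(median(gaps), 1))
```

A tight cluster of gaps around one value, as here, is what distinguishes a scheduled poller from a human reader.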

    It might be worth others checking to see if they're seeing the same sort of thing.

  5. tdjcbe
    Member
    Posted 14 years ago

    99.20.134.68 is a known bot for us and has gotten a four-hour block on a number of occasions. (We block for 4 hours, allow it again and see what happens.) Googling for it pulls up a couple of reports on it. (One of them is a list of new WPMU accounts on a Blogspot blog, which is really weird.)
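The "block for 4 hours, then allow it again" policy can be modelled as a block list whose entries expire. This is a hypothetical in-memory sketch to show the idea (`TemporaryBlockList` is an invented name); in practice the blocking would be done by the web server or firewall, not by Python.

```python
from datetime import datetime, timedelta

BLOCK_DURATION = timedelta(hours=4)  # the 4-hour window described above


class TemporaryBlockList:
    """Hypothetical in-memory block list where each entry expires after a fixed time."""

    def __init__(self, duration=BLOCK_DURATION):
        self.duration = duration
        self._expires = {}  # ip -> datetime when the block lifts

    def block(self, ip, now):
        """Block ip from `now` until `now + duration`."""
        self._expires[ip] = now + self.duration

    def is_blocked(self, ip, now):
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            # Block has lapsed: allow the IP again and see what it does.
            del self._expires[ip]
            return False
        return True


bl = TemporaryBlockList()
t0 = datetime(2009, 10, 6, 11, 15)
bl.block("99.20.134.68", t0)
print(bl.is_blocked("99.20.134.68", t0 + timedelta(hours=1)))  # True: still blocked
print(bl.is_blocked("99.20.134.68", t0 + timedelta(hours=4)))  # False: block lifted
```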

    174.57.205.127 is a new one for us, although I see some traffic from it. No reports, though.

    Comcast is known for bit-bucketing abuse complaints, so it's not worth the trouble of following up.

    Edit: I should have added this. libcurl is used by a number of Linux-based content scrapers looking either for new blogs to send comment spam to or for email addresses, more likely the former. The URL in question probably got dropped in there as a starting point, and the scraper is looking for URLs that you linked to or that got included in comments. Your 99.* address, though, looks like a splog creator.

  6. SteveAtty
    Member
    Posted 14 years ago

    All of them are simply pulling down my "posting to WordPress by email" post, which does sort of suggest something, doesn't it?

    I dropped my libcurl user-agent check in and it's now delivering 132 bytes instead of about 76,000, and since I put it in I've had about 60 curl requests.

    Right now I'm trying to work out if there is a valid reason for allowing curl to access my site. I get no curl hits on any other site I'm running, just the WPMU one.

  7. tdjcbe
    Member
    Posted 14 years ago

    I'd have to go digging, but I believe some platforms use curl to send trackback notifications. Granted, I believe both of these IP addresses are end users and not servers. (I don't remember exactly; I do too many lookups during the day.)

  8. SteveAtty
    Member
    Posted 14 years ago

    Well, all of the curl requests I've seen have been GETs pulling back the page itself, which I don't think is normal trackback behaviour, and, as I said, these are coming in so regularly that I can't believe they are anything but bots of some sort.

  9. tdjcbe
    Member
    Posted 14 years ago

    It might be. I've seen software that uses file_get_contents to grab an entire page just for a simple string of data. Heck, I write such software. I'm lazy. :)

    Gotta admit, though, I think it's a scraper or a spammer in this case as well. Both IPs given are end-user IP addresses, which normally wouldn't be running servers at home.

  10. SteveAtty
    Member
    Posted 14 years ago

    Well, I did some more digging: from time to time, all the IP addresses doing the GETs also do a POST to the signup page, and every IP address I've got is an end-user IP, not a server.

  11. SteveAtty
    Member
    Posted 14 years ago

    I've been watching this and all of the "rogues" eventually end up doing this:

    24.201.2.90 - - [26/Oct/2009:08:05:39 +0000] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3"
    24.201.2.90 - - [26/Oct/2009:08:05:40 +0000] "GET /wp-signup.php HTTP/1.1" 200 6571 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
    24.201.2.90 - - [26/Oct/2009:08:05:41 +0000] "POST /wp-signup.php HTTP/1.1" 200 7387 "-" "curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3"

    Notice how the user agent string changes ;-)

    So it's a bot. I've been cross-checking the IPs against Project Honey Pot, and just about every single one of them has been in there.
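The user-agent switch can be detected mechanically: group log entries by IP and flag any IP that POSTs to `/wp-signup.php` with a curl user agent but also shows up with some other user agent. This is a sketch under the same assumption of Apache's combined log format; `agent_switchers` is a hypothetical name, and the sample lines are the ones quoted in this thread.

```python
import re
from collections import defaultdict

# Apache combined format; only the fields needed here are captured.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def agent_switchers(lines, signup_path="/wp-signup.php"):
    """IPs that POST to the signup page as curl but also appear with another User-Agent."""
    agents_by_ip = defaultdict(set)
    curl_signup_posters = set()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        ip, agent = m["ip"], m["agent"]
        agents_by_ip[ip].add(agent)
        if m["method"] == "POST" and m["path"] == signup_path and agent.startswith("curl/"):
            curl_signup_posters.add(ip)
    return {
        ip for ip in curl_signup_posters
        if any(not a.startswith("curl/") for a in agents_by_ip[ip])
    }


log = [
    '24.201.2.90 - - [26/Oct/2009:08:05:39 +0000] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3"',
    '24.201.2.90 - - [26/Oct/2009:08:05:40 +0000] "GET /wp-signup.php HTTP/1.1" 200 6571 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"',
    '24.201.2.90 - - [26/Oct/2009:08:05:41 +0000] "POST /wp-signup.php HTTP/1.1" 200 7387 "-" "curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3"',
    '99.20.134.68 - - [06/Oct/2009:11:15:34 +0100] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"',
]
print(agent_switchers(log))
```

Only 24.201.2.90 is flagged here; 99.20.134.68 never used another user agent in these lines, so it would need a different rule.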

  12. andrea_r
    Moderator
    Posted 14 years ago

    Huh. That IS interesting.

    I wonder if the Honeypot guys would add the wp-signup stuff to their plugin, or whether it would work anyway.

  13. SteveAtty
    Member
    Posted 14 years ago

    I've been adding them to my .htaccess file and checking the log file. The minute you add them with a "deny from", they still do a few GETs, which of course return 403s, but they stop attempting POSTs: I have not seen a single POST with a 403 against it!

  14. error
    Member
    Posted 14 years ago

    This is just the sort of thing that Bad Behavior looks for and blocks.

  15. SteveAtty
    Member
    Posted 14 years ago

    Well. Time for an update.

    I've been keeping an eye on what has been going on, and I've now got a list of IP addresses that have only ever visited using curl and have tried to POST to wp-signup.
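The criteria for that list (only ever seen with a curl user agent, plus at least one POST to wp-signup) can be expressed directly, and the result rendered as the "deny from" lines used earlier in the thread. A sketch assuming the combined log format; `curl_only_signup_ips` and `deny_lines` are hypothetical names. The output uses Apache 2.2-style `Deny from`; on Apache 2.4 the equivalent would be `Require not ip` inside a `<RequireAll>` block.

```python
import re
from collections import defaultdict

# Apache combined format; only the fields needed here are captured.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def curl_only_signup_ips(lines, signup_path="/wp-signup.php"):
    """IPs whose every request used a curl User-Agent and that POSTed to the signup page."""
    agents_by_ip = defaultdict(set)
    signup_posters = set()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m:
            continue
        agents_by_ip[m["ip"]].add(m["agent"])
        if m["method"] == "POST" and m["path"] == signup_path:
            signup_posters.add(m["ip"])
    return sorted(
        ip for ip in signup_posters
        if all(a.startswith("curl/") for a in agents_by_ip[ip])
    )


def deny_lines(ips):
    """Render one Apache 2.2-style 'Deny from' line per IP for .htaccess."""
    return "\n".join(f"Deny from {ip}" for ip in ips)


log = [
    '24.201.2.90 - - [26/Oct/2009:08:05:39 +0000] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3"',
    '24.201.2.90 - - [26/Oct/2009:08:05:40 +0000] "GET /wp-signup.php HTTP/1.1" 200 6571 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"',
    '99.20.134.68 - - [06/Oct/2009:11:15:34 +0100] "POST /wp-signup.php HTTP/1.1" 200 5527 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"',
    '99.20.134.68 - - [06/Oct/2009:11:15:37 +0100] "POST /wp-signup.php HTTP/1.1" 200 7441 "-" "curl/7.18.2 (i386-pc-win32) libcurl/7.18.2 zlib/1.2.3"',
]
print(deny_lines(curl_only_signup_ips(log)))
```

24.201.2.90 is excluded because it also used a browser user agent; combining this list with the user-agent-switch check would cover both kinds of client.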

    Over the past 4 days I've blocked curl requests that would, given the pages they hit, have generated 250 MB of traffic. OK, it's not a lot, but.....

    I have NOT seen any valid visits with curl user agents. They've all been client PCs, and over 90% of them have gone on to try a POST to wp-signup.

    Could Bad Behavior trap it? It looks like, for WPMU, it has to be installed on a blog-by-blog basis. Also, I know that some plugins don't like running in subdirectory mode.

About this Topic

  • Started 14 years ago by SteveAtty
  • Latest reply from SteveAtty