
Problem with robots.txt (15 posts)

  1. Mat_
    Member
    Posted 16 years ago #

    Hi all!

    A simple question: how do I modify robots.txt?
    I tried the KBrobotsTXT plugin, and after some modifications it gives me this: http://musique.kune.fr/robots.txt

    User-agent: GoogleBot
    Disallow:

    GoogleBot is supposed to be allowed, but when I go to the Google tools and check my robots.txt, it tells me that my robots.txt content is
    User-agent: *
    Disallow: /

    I don't understand.

    I changed Options > Privacy and set it to allow all search engines.

    One last piece of information: at the top of the Google tools page, it says that my robots.txt was last downloaded 2 hours ago, before I added the plugin.

    Where is robots.txt created when we create a blog, please?
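
One way to see what each of the two rule sets above means for Googlebot is to feed them to a parser. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `googlebot_allowed` is made up for illustration, and Python's parser is not Google's own, so the result is only indicative.

```python
from urllib.robotparser import RobotFileParser


def googlebot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Googlebot fetch url.

    Hypothetical helper: it parses the text locally, no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


# What the plugin serves: an empty Disallow value blocks nothing.
PLUGIN_RULES = "User-agent: GoogleBot\nDisallow:\n"

# What the Google tools page reports: every path is blocked for every bot.
REPORTED_RULES = "User-agent: *\nDisallow: /\n"

URL = "http://musique.kune.fr/category/decouvertes/"

if __name__ == "__main__":
    print(googlebot_allowed(PLUGIN_RULES, URL))    # True
    print(googlebot_allowed(REPORTED_RULES, URL))  # False
```

If the second rule set is the copy Google still has cached, that alone would account for the blocked pages until the file is downloaded again.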

  2. Mat_
    Member
    Posted 16 years ago #

    One more thing: it seems that the robots.txt we see and the robots.txt that Google downloads aren't the same.
    When I test the address musique.kune.fr/robots.txt manually, there is no problem.
    I really don't understand...

  3. jackiedobson
    Member
    Posted 16 years ago #

    Why not just create a file, name it robots.txt, and upload it to the root of the install? One less plugin to worry about along with the included processing.

  4. andrea_r
    Moderator
    Posted 16 years ago #

    You only need one robots.txt file in the root of your install, and no plugins. Google will catch up to it eventually.

  5. Mat_
    Member
    Posted 16 years ago #

    Thanks for your replies.
    I have added an empty robots.txt at the root of my install.

    For now, I still have pages that Googlebot cannot see:

    http://musique.kune.fr/ URL restricted by robots.txt
    http://musique.kune.fr/category/coups-de-coeur/ URL restricted by robots.txt
    http://musique.kune.fr/category/decouvertes/ URL restricted by robots.txt

    Maybe I have to wait longer and let the crawler see all my pages.

    Any ideas? Thanks!

  6. jackiedobson
    Member
    Posted 16 years ago #

    I have added an empty robots.txt at the root of my install.

    You may have a conflict, then, between the empty file and the plugin trying to serve another one (i.e. two legs down a single pants leg). If Google has any problems whatsoever, it won't index and will throw up errors. We've noted that MSN and other search engines are a lot more forgiving. As noted above, why not just edit the empty robots.txt file that's already there?

  7. Mat_
    Member
    Posted 16 years ago #

    Sorry, I disabled the plugin before creating the empty robots.txt.

  8. Mat_
    Member
    Posted 16 years ago #

    Hmm, robots.txt seems to be OK for Google, but when I try to display it I get different things:
    http://musique.kune.fr/robots.txt
    gives me:

    (empty)

    http://musique.kune.fr/robots.txt/
    gives me:

    User-agent: *
    Disallow:

    It seems that the second one is the one used by Google. That would not be a problem if, in the Google webmaster tools admin, I didn't have this:

    http://musique.kune.fr/ URL restricted by robots.txt
    http://musique.kune.fr/category/coups-de-coeur/ URL restricted by robots.txt
    http://musique.kune.fr/category/decouvertes/ URL restricted by robots.txt
    http://musique.kune.fr/category/la-video-de-la-semaine/ URL restricted by robots.txt
    http://musique.kune.fr/category/selection-jamendo/ URL restricted by robots.txt
    http://musique.kune.fr/category/sur-le-net/ URL restricted by robots.txt
    http://musique.kune.fr/decouvertes/yael-naim/ URL restricted by robots.txt
    http://musique.kune.fr/les-artistes-se-presentent/vivement-hier-agaetisly/ URL restricted by robots.txt
    http://musique.kune.fr/podcast/ URL restricted by robots.txt
    http://musique.kune.fr/retro/joshua-redman/

    Why do I have different robots.txt files? And why does it tell me that some pages are not accessible to Googlebot?

    Thanks!

  9. andrea_r
    Moderator
    Posted 16 years ago #

    You'll have to use your server's file manager and go through all the folders and subfolders looking for robots.txt files in case there are some stray ones. Remove 'em ALL.

    If you want Google to find everything and not be restricted, you don't actually need a robots.txt file.

    And you will have to wait at least 24 hours for them to see your changes.

  10. Mat_
    Member
    Posted 16 years ago #

    Hi!

    andrea_r, I searched all folders and subfolders (with a grep command) and didn't find anything.

    I see that in wp-includes/rewrite.php, lines 739 and 740, there is:

    // robots.txt
    $robots_rewrite = array('robots.txt$' => $this->index . '?robots=1');

    I'm going to wait...
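
The pattern in that rewrite rule is an ordinary regular expression, so its behaviour can be tried outside WordPress. This is a rough sketch in Python, assuming the pattern is anchored at the start of the request path; whether WordPress strips a trailing slash before matching is not shown in this thread, so the raw path is tested as typed. The names `ROBOTS_RULE` and `rule_matches` are invented for the example.

```python
import re

# Pattern text copied from the rewrite.php line above.
# Prefixing "^" is an assumption about how the rule is applied.
ROBOTS_RULE = re.compile("^" + "robots.txt$")


def rule_matches(request_path: str) -> bool:
    """Return True if the request path matches the robots.txt rule."""
    return ROBOTS_RULE.search(request_path) is not None


if __name__ == "__main__":
    print(rule_matches("robots.txt"))   # True
    print(rule_matches("robots.txt/"))  # False: "$" requires the path to end at "txt"
    print(rule_matches("robotsXtxt"))   # True: the unescaped "." matches any character
```

A request that matches is routed to index.php with robots=1, which suggests that WordPress can generate a robots.txt itself without any file on disk.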

  11. Mat_
    Member
    Posted 16 years ago #

    I have modified my wp-includes/functions.php.
    Now, whether I look at robots.txt or robots.txt/, I get the same thing.
    I hope that Google will now see my pages!

  12. donncha
    Key Master
    Posted 16 years ago #

    Google won't look for robots.txt/ so don't worry about it. That URL was handled by WordPress. Look in wp-includes/rewrite.php, I think, for that bit of code. There are hooks in there for adding to the robots.txt.

  13. Mat_
    Member
    Posted 16 years ago #

    Yep. So now I don't have any robots.txt or robots.txt/, and I'm going to wait for Googlebot to visit my website.
    It seems that Google can now read the Allow: option, so if it still doesn't work tomorrow, I will try adding a robots.txt with

    User-agent: GoogleBot
    Allow: /folder-unreadable-by-googlebot

    I don't have anything to lose.

    ....

    After some research, I finally see that Googlebot's last visit to my pages was between 13/02/2008 and 20/02/2008.

    So I think this is normal...
    I just have to wait for its next visit!

    If anyone wants to see the results:

    http://spreadsheets.google.com/pub?key=p4eEhOj2fRH2jHgKRsS2qWw
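
For what it's worth, an Allow line on its own should not change anything, because a group with no Disallow already blocks nothing; Allow only matters as an exception to a Disallow. Here is a small sketch with Python's standard `urllib.robotparser`, under the assumption that it treats these rules the way Googlebot would. The helper `can_googlebot_fetch` and the `Disallow: /` line in the second rule set are additions for illustration only.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://musique.kune.fr"


def can_googlebot_fetch(rules: str, path: str) -> bool:
    """Parse robots.txt text locally and check one path for Googlebot."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("Googlebot", SITE + path)


# The rules proposed above: Allow with no Disallow anywhere.
ALLOW_ONLY = (
    "User-agent: GoogleBot\n"
    "Allow: /folder-unreadable-by-googlebot\n"
)

# Hypothetical variant: the same Allow as an exception to a blanket Disallow.
# Allow is listed first because Python's parser applies the first matching rule.
ALLOW_WITH_DISALLOW = ALLOW_ONLY + "Disallow: /\n"

if __name__ == "__main__":
    print(can_googlebot_fetch(ALLOW_ONLY, "/podcast/"))           # True
    print(can_googlebot_fetch(ALLOW_WITH_DISALLOW, "/podcast/"))  # False
    print(can_googlebot_fetch(
        ALLOW_WITH_DISALLOW, "/folder-unreadable-by-googlebot/page"))  # True
```

So the Allow-only file is harmless, but it would behave the same as having no robots.txt at all.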

  14. andrea_r
    Moderator
    Posted 16 years ago #

    You may have to put in a sitemap plugin to make sure the Google God sees all your pages. (Search the forums; it's talked about a lot.)

    Also, in my experience, it sometimes takes them as much as TWO WEEKS to get their act together and catch everything.

  15. Mat_
    Member
    Posted 16 years ago #

    Thanks andrea_r, it really seems that Googlebot will take longer than I expected to come back...
    I will wait!
