I'd like to block robots/spiders from indexing a subset of the existing blogs on my WPMU install. For example, I want to use some blogs for internal development/testing purposes and I don't really want their content to come up in a search result on Google. At the same time, I don't want to password protect them; that would be overkill considering that I still want certain external services (like FeedBurner) to be able to connect to those internal blogs.
What's the best way to do this if I want to block all robots/spiders? How about only specific ones?
Thanks,
Jacob
No, I haven't launched yet. Everything's still password-protected for now, and I haven't set up anything for robots at all, such as a Google sitemap.
But can't you just simply use the privacy option for blocking search engines?
I could, but that means setting the option on each internal blog, maybe even every blog, one... by... one...
I'm thinking there must be a better way, perhaps via regular expression-based rules in .htaccess files or robots.txt, but I haven't found it yet. That's why I came knocking :)
I meant, did you search Google for your answer? Setting up a robots.txt file is easy if you want to block them all: you just plunk it in your root folder.
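For reference, here is a rough sketch of what the "block them all" robots.txt looks like, next to one that blocks a single named robot, checked with Python's standard urllib.robotparser. The user-agent strings "Googlebot" and "FeedBurner" are only example names, and the helper is_allowed is made up for this illustration. Keep in mind that robots.txt is advisory: it only affects robots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body that asks every robot to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# robots.txt body that asks only one named robot to stay out.
# "Googlebot" is used here purely as an example user-agent name.
BLOCK_ONE = """\
User-agent: Googlebot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url="/"):
    """Hypothetical helper: would this user agent be allowed to fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, body in (("block all", BLOCK_ALL), ("block one", BLOCK_ONE)):
        for agent in ("Googlebot", "FeedBurner"):
            print(name, agent, is_allowed(body, agent))
```

Note that the block-all version also turns away any feed service that obeys robots.txt, so the single-robot form (repeated per robot) may fit better if some external services still need access.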
Setting it up so that only certain virtual subdomains aren't crawled is a lot harder, and from a quick search I guess it's done in .htaccess. In general, that's not an MU-specific question, though.
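The per-subdomain idea boils down to choosing the robots.txt body from the requested host name. The sketch below shows only that decision logic in Python; it is not .htaccess syntax and not WPMU code. The domain example.com, the dev/test naming convention, and the function robots_txt_for_host are all assumptions made up for illustration.

```python
import re

# Assumed naming convention: internal blogs live on subdomains that start
# with "dev" or "test" under the placeholder domain example.com.
INTERNAL_HOST = re.compile(r"^(dev|test)[-\w]*\.example\.com$", re.IGNORECASE)

BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"  # an empty Disallow permits everything


def robots_txt_for_host(host):
    """Return the robots.txt body to serve for the given Host header value."""
    hostname = host.split(":")[0]  # drop an optional port such as ":8080"
    if INTERNAL_HOST.match(hostname):
        return BLOCK_ALL
    return ALLOW_ALL


if __name__ == "__main__":
    for host in ("test1.example.com", "dev-jacob.example.com:8080", "blog.example.com"):
        print(host, "->", repr(robots_txt_for_host(host)))
```

The same pattern-on-host test is what a rewrite rule would express on the server side; the advantage over the per-blog privacy option is that new internal blogs are covered by the naming convention without touching each one.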
And the included privacy option is a box the user ticks on setup, unless you're manually setting up a bunch. Do you really have so many test blogs that it would be a pain to change that setting? I find my test blog doesn't come up in Google at all. (Then again, I haven't tried searching for it very specifically...)
Actually, I Google before I do anything these days :)
Initially, I guess I will just use the privacy option. However, I always try to take a long-term view, so I'll keep looking for some useful .htaccess rules.
Thanks, Andrea and KK