
WPMU Duplicate Content Bug and Workaround (7 posts)

  1. GregM
    Member
    Posted 16 years ago #

    Hi folks,

    I've posted over at the main WP forum regarding a duplicate content bug which has afflicted WP (and WPMU) since August 2006:

    http://wordpress.org/support/topic/129089

    I've offered a workaround for WP as well as WPMU which issues a 301 redirect when a URL requests a page number beyond the last page of a post (yes, a page past the end of the post). Without the workaround, WP and WPMU will return a potentially unlimited number of pages of duplicate content.

    Unfortunately, the workaround does not work if you use WP's built-in pagination capability (via 'nextpage'). Or, rather, it does work, but it makes it impossible to reach anything but the first page of a paginated post.
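
    To make that trade-off concrete, here is a minimal sketch in Python of what a page-blind redirect rule does. It is not the actual rule set from the linked post: the pattern, the function name `rewrite_level_redirect`, and the example paths are all hypothetical, and the real workaround runs as server rewrite rules rather than Python.

```python
import re

# Hypothetical pattern (an assumption, not the real rewrite rule): a permalink
# whose last segment contains a non-digit, followed by one extra numeric
# segment such as /99/.
TRAILING_PAGE = re.compile(
    r"^(?P<permalink>/(?:[^/]+/)*[^/]*[^/\d][^/]*)/(?P<page>\d+)/?$"
)


def rewrite_level_redirect(path):
    """Return (status, location) the way a page-blind rewrite rule would.

    The rule cannot see the post, so it cannot tell a valid page number
    from an invalid one: every trailing number is redirected.
    """
    match = TRAILING_PAGE.match(path)
    if match is None:
        # No trailing page number: serve the URL unchanged.
        return 200, path
    # Trailing page number: 301 to the bare permalink.
    return 301, match.group("permalink") + "/"


if __name__ == "__main__":
    for path in ("/blog/my-post/", "/blog/my-post/99/", "/blog/my-post/2/"):
        print(path, "->", rewrite_level_redirect(path))
```

    Under these assumptions the bogus `/99/` request and the legitimate `/2/` request are treated the same, so page 2 of a paginated post can never be reached.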

    Anyway, if anyone has any suggestions on improving the workaround -- or even for fixing the WP or WPMU core, that would be excellent!

    Thanks in advance,
    Greg

    p.s. Sorry to anyone who followed the original link -- the post initially appeared, and then disappeared. Maybe some oddity from trying to post at both forums back-to-back? Who knows...

  2. GregM
    Member
    Posted 16 years ago #

    Hey folks,

    Sorry, the link to my original post over in the WP forums is dead: I've tried to post 3 times, and 3 times the post has disappeared shortly after appearing. I can only conclude that one of the following two possibilities is the case:

    1. The forum is having more technical problems, related to the extensive 408 errors and "Couldn't Connect to DB" messages last week, or
    2. Some trigger-happy moderator thinks this issue shouldn't be discussed (even though it's been sitting in the public WP bug tracker for the last year).

    Who knows?

    Anyway, here's the link to my explanation of the problem and the partial workaround:

    http://whereelsetoputit.com/blog/wordpress-movable-type-duplicate-content/

    All the best,
    Greg

  3. drmiketemp
    Member
    Posted 16 years ago #

    Actually, you're probably being caught by the Akismet filter. It's a common issue over on the wp.org forums if you include any links within your post. It's one of the reasons why we've just spent the last month banning and removing Akismet from all of our clients' sites and going with other solutions.

    I note from your blog post that there was a fix that actually caused this issue. Have you opened up a trac ticket and pointed that out to staff? You don't mention whether you have. The only trac links I see are to the original issue and the fix that causes this issue.

  4. SteveAtty
    Member
    Posted 16 years ago #

    So what you are saying is that if I take a valid permalink and shove an extra trailing slash and some numbers on it then WordPress ignores it and gives me the content of the permalink?

    I assume you think it should return a 404?

  5. lunabyte
    Member
    Posted 16 years ago #

    Or redirect to the true permalink, sending back an error through the header.

  6. GregM
    Member
    Posted 16 years ago #

    Hi folks,

    drmike --> Yes, it might well be the filter. It's only supposed to trigger when there are 2 URLs, I think, but I'm guessing that because I formatted the URL as a link, plus the link again as the anchor text, that got it caught. With regard to the trac ticket, the problem with the fix was pointed out immediately by the person who submitted the original problem, but it was decided at that time not to fix it. (Have a look: you can see it in the discussion on the original bug.)

    SteveAtty --> Yes, a 404 seems like a good response to a request for something that shouldn't exist. Or...

    lunabyte --> Yes, something like a 301 redirect back to a valid permalink seems like a good alternative.

    Unfortunately (as pointed out in the original discussion of the bug, when it was decided not to fix it), by the time WP has discovered that the page doesn't exist, it is too late to return an error or a redirect, because headers have already been sent. That's why in the workaround I've suggested, the redirect is happening via mod_rewrite rules, before WP even sees what is coming.

    But unfortunately again, that also means you can't use paginated posts at all if you're going to use this fix -- because the mod_rewrite rules don't know what would be a valid page request either.

    That's the conundrum. I'd like to be able to use paginated posts in a new project I'm setting up. (Fortunately, none of my existing blogs going back to WP 1.2 have ever used them.) Unless the devs take a second look at fixing a bug they originally decided not to fix, or unless some other clever person can come up with a way to fix it, we're stuck with choosing between: 1) built-in page support and a significant duplicate content vulnerability, or 2) a fixed duplicate content problem but no paging support.
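
    For illustration only, the missing piece can be sketched in Python: a check that knows how many pages a post has, so it can serve valid pages and redirect the rest. This is not WP or WPMU core code and not the workaround from the linked post. The function names and example paths are hypothetical, the `<!--nextpage-->` marker is the one WP's built-in pagination uses, and treating an explicit page 1 as a duplicate of the permalink is an assumption of this sketch. It also presumes the check runs before any headers are sent, which is exactly what the discussion above says is hard to arrange.

```python
# Marker inserted by WP's built-in pagination to split a post into pages.
NEXTPAGE = "<!--nextpage-->"


def count_pages(post_content):
    """Number of pages in a post: one more than the number of markers."""
    return post_content.count(NEXTPAGE) + 1


def resolve_page_request(permalink, requested_page, post_content):
    """Return (status, location) for a request to permalink + page number.

    Hypothetical helper: pages 2..N of an N-page post are served as-is;
    anything else (page 0, an explicit page 1, or a page past the end)
    gets a 301 back to the bare permalink.
    """
    total_pages = count_pages(post_content)
    if 2 <= requested_page <= total_pages:
        return 200, f"{permalink}{requested_page}/"
    return 301, permalink


if __name__ == "__main__":
    content = "part one" + NEXTPAGE + "part two" + NEXTPAGE + "part three"
    for page in (2, 3, 4):
        print(page, "->", resolve_page_request("/blog/my-post/", page, content))
```

    The point of the sketch is only that the decision needs the post content: a rule that runs before WP sees the request has no access to the page count, which is why the two options above are mutually exclusive.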

    All the best,
    Greg

  7. drmiketemp
    Member
    Posted 16 years ago #

    "It's only supposed to trigger when there are 2 URLs, I think"

    That's the WordPress comment moderation filter. Akismet will catch on all sorts of things.

About this Topic

  • Started 16 years ago by GregM
  • Latest reply from drmiketemp