Make WordPress Core

Opened 11 years ago

Last modified 5 years ago

#22530 reopened defect (bug)

garbage query strings on URLs are not sanitized or removed

Reported by: rawalex
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.4.2
Component: General
Keywords: reporter-feedback
Focuses:
Cc:

Description

Here is an interesting problem I ran into: a bug (or feature) that malicious people appear to be exploiting to make Google see your site as full of duplicate content.

If you visit a WordPress site and add a garbage query string to the end of the URL, that garbage gets carried forward. Example:

yourblog.here/page/2?ssdlfkjsdlkfjsdfs

When you scroll down, the "previous" and "next" links will automatically carry that query string forward.

Normally, this would not be a big issue. However, some people appear intent on deliberately creating these sorts of links to WordPress sites, and Googlebot is finding those links on remote sites. Those links are followed, and then the previous/next links perpetuate the problem through every page on the site. If you have 1000 posts at 10 per page, Google has just indexed 100 pages of duplicate content.
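The amplification described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not WordPress code: it assumes a hypothetical permalink scheme (`/` for page 1, `/page/N` afterwards) and a hypothetical link builder, `naive_pagination_links`, that copies the current query string verbatim into the previous/next links. A crawler seeded with a single junk URL and following only those links reaches every archive page: 1000 posts at 10 per page gives 1000 / 10 = 100 junk-suffixed copies.

```python
from collections import deque
from urllib.parse import urlsplit

POSTS = 1000      # total posts, taken from the example in the report
PER_PAGE = 10     # posts per archive page
LAST_PAGE = -(-POSTS // PER_PAGE)  # ceiling division: 100 archive pages


def page_path(n: int) -> str:
    """Hypothetical permalink scheme: page 1 is '/', later pages are '/page/N'."""
    return "/" if n == 1 else f"/page/{n}"


def page_number(path: str) -> int:
    """Inverse of page_path for the same hypothetical scheme."""
    return 1 if path == "/" else int(path.rsplit("/", 1)[1])


def naive_pagination_links(url: str) -> list[str]:
    """Hypothetical previous/next builder that copies the query string verbatim."""
    parts = urlsplit(url)
    current = page_number(parts.path)
    links = []
    for target in (current - 1, current + 1):
        if 1 <= target <= LAST_PAGE:
            path = page_path(target)
            links.append(f"{path}?{parts.query}" if parts.query else path)
    return links


def crawl(seed: str) -> set[str]:
    """Breadth-first crawl that follows only previous/next links, as a bot might."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        for link in naive_pagination_links(queue.popleft()):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("/page/2?ssdlfkjsdlkfjsdfs")
print(len(indexed))  # 100
```

Only one externally planted link is needed; the pagination links do the rest of the work.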

So the bug is the following:

Passed query strings need to be sanitized and junk removed; there is no reason to pass it on. When a junk query string is passed, the server should reply with an HTTP 301 or 302 and redirect the user or bot to the proper page without the query string.
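The redirect proposed above could look roughly like the following sketch. The whitelist `ALLOWED_QUERY_VARS` and the function `redirect_for_junk` are hypothetical. The variable names are loosely modeled on a few WordPress query variables, but a real implementation would need to consult the site's actual registered query variables rather than a hard-coded set.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist, loosely modeled on a few WordPress query variable names.
ALLOWED_QUERY_VARS = {"s", "p", "page_id", "cat", "tag", "paged"}


def redirect_for_junk(url: str, status: int = 301) -> Optional[tuple[int, str]]:
    """Return (status, location) if the URL carries unknown query variables, else None."""
    parts = urlsplit(url)
    # keep_blank_values=True so a bare token like "?ssdlfkjsdlkfjsdfs" is seen as a key.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k in ALLOWED_QUERY_VARS]
    if len(kept) == len(pairs):
        return None  # nothing unknown: serve the page as usual
    location = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return status, location


print(redirect_for_junk("https://yourblog.here/page/2?ssdlfkjsdlkfjsdfs"))
```

The `keep_blank_values=True` argument matters here. Without it, `parse_qsl` silently drops a bare token such as `ssdlfkjsdlkfjsdfs` (it has no `=`), and the junk would go undetected. The choice between 301 and 302 is left to the caller, since the report allows either.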

Further, query strings should not be carried forward through the previous/next links on the pages unless they are relevant to that page change. For example, a valid search string might be worth carrying forward; other passed items may not be.
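Here is a sketch of the narrower carry-forward rule. It assumes that the search term (`s`) is the only query variable worth preserving across a page change. `CARRY_FORWARD` and `pagination_link` are hypothetical names, and the `/page/N` path scheme is an assumption. This is not a description of WordPress's own pagination functions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical: only the search term is worth carrying across a page change.
CARRY_FORWARD = {"s"}


def pagination_link(current_url: str, target_page: int) -> str:
    """Build a previous/next link that keeps only query variables relevant to paging."""
    query = urlsplit(current_url).query
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k in CARRY_FORWARD]
    path = "/" if target_page == 1 else f"/page/{target_page}"  # assumed permalink scheme
    return f"{path}?{urlencode(kept)}" if kept else path


print(pagination_link("/page/2?ssdlfkjsdlkfjsdfs", 3))  # /page/3
print(pagination_link("/page/2?s=widgets&foo=bar", 1))  # /?s=widgets
```

With a whitelist rather than a blacklist, any junk that arrives on the first request dies there instead of spreading to every linked page.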

Potentially, any unsanitized input accepted in a query is a vector for other attacks, and having that query carried forward is a real issue. For example, full "select * from" queries are not rejected or otherwise dealt with; they are simply carried forward. They are not currently causing anything to happen, but failing to sanitize these inputs leaves a potential vector for a future attack, such as an input overflow.

Change History (10)

#1 @SergeyBiryukov
11 years ago

  • Keywords needs-patch removed
  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Severity changed from critical to normal
  • Status changed from new to closed

Duplicate of #21113.

#2 @jrivett
9 years ago

  • Resolution duplicate deleted
  • Status changed from closed to reopened

I would like to see this issue addressed, as it's causing problems for one of my sites. This ticket was closed as a duplicate of #21113, and that ticket was closed on the assumption that the problem was due to a misconfigured cache.

I am definitely seeing this problem, and I'm not using caching.

To demonstrate the problem, I found another WordPress site (not one of mine) that shows it. I then posted a link to that site, with an arbitrary URL parameter appended, on one of my own sites. A couple of weeks later, that other site's search results are full of that parameter. It seems to me that this is a bug, and a potentially serious one, since it could be used to hurt search ranking or do even worse things.

Because of this, I'm re-opening this ticket.

Note that this problem doesn't seem to affect all WordPress sites; it seems to depend on the theme. I've found that the bundled 2012, 2013, etc. themes are all affected.

#3 @Denis-de-Bernardy
9 years ago

Confirmed. It wouldn't matter for Google if non-singular pages output a canonical URL in their page source, but that is evidently not the case.
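The canonical-URL approach mentioned in this comment can be sketched as a function that emits a `<link rel="canonical">` tag for the current request, with the query string and fragment stripped. The function name is hypothetical. The sketch also assumes pretty permalinks, where the path alone identifies the archive page. On a site that relies on query variables such as `?s=` or `?p=`, stripping the whole query would point the canonical at the wrong page.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(request_url: str) -> str:
    """Return a canonical <link> tag for the request URL minus its query and fragment."""
    parts = urlsplit(request_url)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape for safe use inside an HTML attribute value.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


print(canonical_link_tag("https://yourblog.here/page/2?ssdlfkjsdlkfjsdfs"))
```

With such a tag in place, a crawler that reaches `/page/2?ssdlfkjsdlkfjsdfs` is told the preferred URL is `/page/2`, so the junk copies would be consolidated rather than indexed as duplicates.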

#4 @SergeyBiryukov
9 years ago

  • Milestone set to Awaiting Review

#5 follow-up: @Another Guy
9 years ago

I am seeing more and more that this sort of thing is an attempt to create false or fake links or duplicate content on WordPress sites. This sort of error "echoes" through the site: once Google indexes one page with a nonsense string, that same string gets carried over as Google crawls the entire site, creating duplicate content issues galore.

It perhaps touches on the issue of WP Core needing some handling for canonical directives so that this sort of thing is not such a problem.

#6 in reply to: ↑ 5 @jrivett
9 years ago

Replying to Another Guy:

It perhaps touches on the issue of WP Core needing some handling for canonical directives so that this sort of thing is not such a problem.

I have a feeling this might be a core issue, but on the other hand, if it were a core issue, we should see it happening on all WordPress sites, and I only see it with certain themes. Anyway, thanks for chiming in. Given some of the responses I got on the WordPress IRC channel, I wasn't sure I would get much traction here.

#7 @DrewAPicture
9 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from reopened to closed

I've tried to reproduce this behavior on quite a few different sites and setups without success, including ones running the older default themes. This seems like either a server configuration problem or themes persisting the query string values for some reason. Closing.

#8 @jrivett
9 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

I can demonstrate this behaviour on six different sites, which are hosted on four different servers, and which use several different themes. I can provide URLs if necessary.

I've reopened this ticket because it still looks like a real problem to me, and I'm not the only person seeing it. Note that the problem persists in WordPress 4.3.

Is there some other information I can provide, such as logs, server configuration, or other diagnostics, that might help to track down the source of this problem?

I've tried disabling all plugins that might affect URL handling, but that made no difference. I'm willing to consider the possibility that it's a server configuration issue, but it would be helpful to have some idea what to look for.

#9 follow-up: @dd32
9 years ago

  • Keywords reporter-feedback added
  • Milestone set to Awaiting Review

This really sounds like themes using bad pagination functions, or an odd server configuration that isn't being accounted for somewhere.

@jrivett can you duplicate this on a site using a default WordPress theme (one of the Twenty series, e.g. Twenty Ten)?
Are the themes custom, or from the WordPress.org theme directory?

#10 in reply to: ↑ 9 @jrivett
9 years ago

Replying to dd32:

@jrivett can you duplicate this on a site using a default WordPress theme (one of the Twenty series, e.g. Twenty Ten)?

I just changed the development version of one of my sites from a child of twenty-fourteen to the default twenty-fourteen theme, and the problem persists.

Are the themes custom, or from the WordPress.org theme directory?

One of the sites is the UPS blog (blog.ups.com), so I don't know what theme it's using. Of the rest, one uses a twenty-fourteen child theme, two use twenty-fifteen child themes, one uses a highly customized copy of the twenty-ten theme, and one uses the Responsive theme (from the WordPress.org theme directory).

Note that the sites using twenty-fourteen and twenty-fifteen child themes were, until recently, using child themes of earlier Twenty themes, and the problem existed there as well. I mention this because that was when I first added my comments to this ticket.
