Make WordPress Core

Opened 22 months ago

Last modified 2 months ago

#21113 new enhancement

Previous/Next page links maintain all GET variables

Reported by: kirrus Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 3.4
Component: General Keywords: needs-patch 2nd-opinion close
Focuses: Cc:

Description

The newer/older entries pagination system takes any query string from an inbound request and includes it in the links generated for the newer/older entries.

This causes problems when you put WordPress behind a cache: all it takes is some bot attempting a Joomla exploit, and suddenly every visitor is served a version of that page with the bad query string plainly visible.

For example:
http://kirrus.co.uk/page/6/?test=true

Note that in the 'Newer/Older' links at the bottom of the page, "test=true" is retained.

These should only really keep query-strings that wordpress knows it'll need, if you're including them? Else, you can basically poison someone's cache with this.

An example of the really bad query string poisoning a cache:
/page/2/?option=com_gk3_tabs_manager&controller=..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2F..%2Fproc%2Fself%2Fenviron%0000

Change History (13)

comment:1 kirrus 22 months ago

  • Summary changed from Pagination puts random query strings in generated HTML to Previous/Next page links maintain all GET variables

comment:2 SergeyBiryukov 22 months ago

These should only really keep query-strings that wordpress knows it'll need

The problem is that there's no way to tell in advance which query vars will be needed. If a plugin uses custom $_GET vars, removing any non-default query parameters would break it.

It should be possible to remove them in this particular case by hooking into the get_pagenum_link filter.
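A minimal sketch of that approach, written as plugin code. The `get_pagenum_link` filter and `remove_query_arg()` are real WordPress APIs; the `$keep` whitelist is a hypothetical example and would need to match whatever query vars your site and plugins actually use:

```php
<?php
// Sketch only: strip all query args from pagination links except a whitelist.
// NOTE: $keep is a hypothetical example list; adjust it for your site.
add_filter( 'get_pagenum_link', function ( $url ) {
    $keep  = array( 's', 'paged' ); // query vars we want to preserve
    $query = parse_url( $url, PHP_URL_QUERY );
    if ( ! $query ) {
        return $url; // no query string, nothing to strip
    }
    parse_str( $query, $args );
    // Remove every arg that is not on the whitelist.
    $strip = array_diff( array_keys( $args ), $keep );
    return remove_query_arg( $strip, $url );
} );
```

This only cleans pagination links; it does not stop the poisoned URL itself from being requested or cached.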

comment:3 johnbillion 22 months ago

If you're using static page caching then there is a configuration issue with your cache.

If a bot hits a page with those funky query vars and the page gets cached, then subsequent visitors to the "same" page without those query vars should not be served the cached page from the request where they were present. It should be treated as a different page, because the URI is different.

comment:4 SergeyBiryukov 17 months ago

#22530 was marked as a duplicate.

comment:5 rawalex 17 months ago

  • Severity changed from minor to critical

I had a ticket on this closed and marked as a duplicate. I don't see this as just a defect awaiting review; it's a critical bug because it provides a potential attack vector. I haven't tested it, but the potential here is for overflow errors, or for using this unchecked input as a way to leverage other security issues to compromise WordPress.

Quite simply, this is dangerous to leave open, because it has potential to be used in bad ways when combined with other problems or issues.

Further, as Googlebot now penalizes duplicate content heavily, this has become a vector by which malicious people can attack your site. They create a number of links with garbage query strings, post them on forums and blog comments around the net, and suddenly Googlebot spots all sorts of duplicate content on your site - and penalizes your entire site accordingly.

So for me this goes from being just a bug to a major problem that lowers the value of WP as a CMS under the current circumstances.

comment:6 rawalex 17 months ago

  • Keywords needs-patch added

comment:7 crazycoders 13 months ago

  • Cc crazycoders added
  • Keywords 2nd-opinion added
  • Severity changed from critical to normal
  • Type changed from defect (bug) to enhancement

I suggest keeping only the variables that WordPress requires, but adding a filter (if there isn't one already) so that users/plugins can supply additional parameters if need be. Always remember that someone might actually need that query parameter ABC that was originally passed in.

I'll check core this weekend to see how I can submit a patch for this.

Note that a modification of this nature will break existing plugins and themes that rely on the buggy behaviour, so it will probably not be merged for a few cycles, until everyone has been notified about it.

I might actually suggest two patches: one to add the filter first (if it doesn't exist yet), which will do no harm but offers an immediate option to all developers; then, in a later cycle, we can introduce the fix that will break plugins and themes that still rely on the old behaviour and haven't switched to the new method.
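A rough sketch of what the proposed phase-one change might look like. The filter name `paginate_links_query_args` is purely hypothetical (no such filter existed in core at the time); `apply_filters()` and `add_query_arg()` are real WordPress APIs:

```php
<?php
// Hypothetical phase-one sketch: expose the retained query args through a
// new filter before they are appended to a pagination link.
// 'paginate_links_query_args' is an illustrative name, not a real core hook.
$query_args = $_GET;

// Phase one: default behaviour unchanged -- all args pass through,
// but sites and plugins can now hook in and strip what they don't need.
$query_args = apply_filters( 'paginate_links_query_args', $query_args );

// Phase two (a later release) could flip the default to a whitelist, e.g.:
// $query_args = array_intersect_key( $query_args, array_flip( array( 's' ) ) );

$link = add_query_arg( $query_args, get_pagenum_link( 2 ) );
```

The point of the two-step rollout is that the filter alone changes nothing for existing sites, while giving affected sites an immediate escape hatch before any default changes.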

comment:8 johnbillion 13 months ago

  • Keywords close added

Personally I don't believe this is an issue. The original poster's static page caching (or other caching layer) must be mis-configured somehow, resulting in the cache being poisoned with these URLs.

In a normal situation, a paginated link will require all the current $_GET variables to perform as expected. If we start stripping variables from pagination links then things will break unnecessarily.

comment:9 follow-up: kirrus 13 months ago

The reason the cache was poisoned was an interaction with the WP-SuperCache module, which was generating static pages with the poisoned URLs embedded in them. These were then served to all users. I've turned that particular feature off in SuperCache. I'm also doing the following in the cache, which is really quite aggressive, but also handily effective at stopping the most egregious GET variables:

        # Leave WordPress's own _wp_http_referer URLs alone
        if (!req.url ~ "_wp_http_referer") {
                # Requests that look like directory-traversal or
                # /proc/self/environ probes lose their query string entirely
                if (req.url ~ "(\.\..+)+" || req.url ~ "\.\.%2F.+$" || req.url ~ "self%2Fenviron" ) {
                        set req.url = regsub(req.url, "\?.*$", "");
                }
        }

In any case, this is still a bug: it is a vector that may be combined with other WordPress bugs to attack WordPress, a vector to attack any site using WP-SuperCache, and a vector to make sites expose thousands of unique URLs serving duplicate content to Google. Anything that allows a user to directly alter website content in this fashion should be treated with suspicion.

Version 1, edited 13 months ago by kirrus

comment:10 kirrus 13 months ago

I should add that a clever attacker could also use this particular bug to fill a reverse proxy's cache store with many thousands of copies of the same data, hurting the cache's ability to take load off the backend. It could easily be used to make a simple HTTP-request DoS/DDoS more painful and harder to handle.

comment:11 in reply to: ↑ 9 johnbillion 13 months ago

Replying to kirrus:

The reason the cache was poisoned was an interaction with the wp-SuperCache module, that was generating static pages with the poisoned urls in. These were then served to all users. I've turned that particular feature off in Supercache.

Could you tell me which feature in WP Super Cache this was? I'd like to find out which module/setting/feature is causing this.

If a page at a poisoned URL is generated and stored in the page cache, the poisoned pagination URL should never be shown to a user visiting a non-poisoned URL.

Replying to kirrus:

I should add, a clever attacker could also use this particular bug to fill a reverse proxy caching system's store with many many thousands of copies of the same data

This is true of any server that caches URLs containing GET parameters. I could slam a site with requests for example.com/?foo=1, example.com/?foo=2, example.com/?foo=99999 and achieve the same effect. The paginated pages for those URLs aren't generated unless the paginated URLs are themselves requested, so the cache fills up no quicker.

comment:12 kirrus 5 months ago

For some reason, I didn't get an email about this, sorry.

I think in supercache, it's the following option:

  • Don’t cache pages with GET parameters. (?x=y at the end of a url)

This may of course be moot now: with the code in Varnish, I've not tested whether WordPress + SuperCache + Varnish can still have its cache poisoned with referrers.

comment:13 kirrus 2 months ago

Oh, I also forgot: the main problem we were having was that this was leaking into the links and scaring users, as their address bars suddenly contained useless (but scary-looking) attack-style URLs.

That said, is there much point leaving this ticket open, or should it be closed as wontfix or invalid?
