Make WordPress Core

Opened 7 years ago

Last modified 4 years ago

#41712 new defect (bug)

canonical infinite redirect bug

Reported by: yprince's profile yprince Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Canonical Keywords: has-patch
Focuses: Cc:

Description

I had an infinite redirection problem caused by "redirect_canonical" (wp-includes\canonical.php)

I have a query string parameter ex: "http://mysite.com/mypage/?q=test%2C+test" and it was redirected infinitly to itself because of:

<?php
// yes, again -- in case the filter aborted the request
        if ( ! $redirect_url || strip_fragment_from_url( $redirect_url ) == strip_fragment_from_url( $requested_url ) ) {
                return;
        }

my $requested_url was in lowercase "http://mysite.com/mypage/?q=test%2c+test" because of:

<?php
$requested_url = preg_replace_callback('|%[a-fA-F0-9][a-fA-F0-9]|', 'lowercase_octets', $requested_url);

but my $redirect_url was "http://mysite.com/mypage/?q=test%2C+test"

so the both url was different because of the "%2c" vs "%2C"

I solved it by adding:

<?php
add_filter( 'redirect_canonical', 'my_redirect_canonical', 10, 1 );
function my_redirect_canonical($redirect_url){
    $redirect_url = preg_replace_callback('|%[a-fA-F0-9][a-fA-F0-9]|', 'lowercase_octets', $redirect_url);

    return $redirect_url;
}

Change History (3)

#1 @markparnell
4 years ago

  • Component changed from General to Canonical

#2 @sanchothefat
4 years ago

Hi, I've hit essentially the same problem while using Batcache for page caching.

In situations where redirect_canonical() finds a $redirect_url (such as when you have a static page set for the home page) and you have a query string then the query string will be removed, processed and rebuilt.

When the query string contains values that are not URL encoded, or have a trailing = such as ?foo=&bar=abc[xyz] a redirect will occur to ?foo&bar=abc%5Bxyz%5D.

This is mostly benign as the end result for the application will be the same however the redirect is unnecessary. It becomes problematic in the case of Batcache because it generates cache keys based on the query parameters passed to it. Because it can cache redirects it does not differentiate between the 2 versions of the query string and will keep returning the cached redirect resulting in and endless loop.

For a real world example of query strings being used outside of the applications' control marketing and adtech tools often append a bunch of query string parameters used for tracking and personalisation, for example: example.org/?utm_campaign=aaa&hsa_content=&gclid=bbbb.

It would be great if redirect_canonical() could compare the query string before and after preprocessing to determine if the URL truly is different.

This ticket was mentioned in PR #865 on WordPress/wordpress-develop by roborourke.


4 years ago
#3

  • Keywords has-patch added

When processing the redirect URL query string and rebuilding it a false positive can be reported because the new query string is URL encoded and will have trailing = operators removed. This means the new redirect URL does not match the requested URL and an unnecessary redirect will occur.

This can cause caching plugins like Batcache to cache the redirect and create and endless redirect loop as the two query strings will yield the same cache key.

Trac ticket: https://core.trac.wordpress.org/ticket/41712

Note: See TracTickets for help on using tickets.