Make WordPress Core

Opened 5 years ago

Closed 5 years ago

#45315 closed defect (bug) (duplicate)

Canonical redirection should compare URLs taking encoding into account

Reported by: straussd's profile straussd Owned by:
Milestone: Priority: normal
Severity: minor Version:
Component: Canonical Keywords:
Focuses: Cc:

Description

We're encountering redirect loops because the URL comparison logic does not take encoding fully into account.

If a layer of the stack (in our case, Varnish) re-encodes query parameters either in the "Location" header of a redirect or in a URL prior to it reaching WordPress, WordPress will issue a redirect to its preferred encoding, causing a loop as it attempts to "fix" the encoding choice that another layer repeatedly makes. Such URL rewrites (with re-encoding) may happen if the proxy is configured to sort incoming query parameters (to improve cache hit rates) or otherwise manipulate them. These are fairly common uses for a reverse proxy.

As noted in the source code, hex encodings are not case-sensitive. The RFC also states that encoding ":" and "@" is optional in query string elements. (PHP always encodes them given the lack of context in rawurlencode.) A proxy may make different choices than WordPress, but it should not cause WordPress to start a redirect loop.

One possible solution is to decode entities before comparing the requested URL with the redirect URL.

For example, in wp-includes/canonical.php around line 490, this change fixes the issue:

<?php
if ( ! $redirect_url || rawurldecode($redirect_url) == rawurldecode($requested_url) ) {
    return;
}

Technically, this change could cause some false positives where URLs match when one contains a reserved character in the same position as an encoded one in the other URL. This is an unlikely problem for canonical URL redirects, but I wanted to acknowledge the limitation.

If such false positives aren't acceptable, then I would suggest comparing the individual query string parts one by one, decoding each for its own comparison.

Presumably, similar bugs exist for encodings of the path, but I haven't investigated.

Change History (2)

#1 @straussd
5 years ago

  • Severity changed from major to minor

I created a workaround that's possible to prepend to wp-config.php or use with php.ini prepend functionality. It reprocesses the query string to produce one with equivalent meaning (after decoding) but meeting the expectations of the canonical URL component.

The re-encoding method in the workaround should also have no false positives (in the sense of causing the canonical redirection logic to not redirect when it should).

Dropping severity to "minor" given the workaround.

Last edited 5 years ago by straussd (previous) (diff)

#2 @pento
5 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed
  • Version trunk deleted

Thank you for the bug report, @straussd! We're currently tracking this bug over in #44744.

Note: See TracTickets for help on using tickets.