Make WordPress Core

Opened 14 years ago

Closed 14 years ago

Last modified 12 years ago

#14347 closed defect (bug) (wontfix)

URLs are not handled properly

Reported by: hakre
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: General
Keywords:
Focuses:
Cc:

Description

While digging into #14201, #14292 and similar tickets, I noticed that WordPress does not normalize incoming URLs properly. This can lead to 404 responses for URLs that, according to HTTP/1.1 (RFC 2616), refer to content that is actually available.

Example run against current trunk to illustrate the issue:

# curl -I http://webroot.loc/wordpress/tag/%e4%b8%80%e6%a0%b7

HTTP/1.1 200 OK
Date: Sun, 18 Jul 2010 18:53:02 GMT
Server: Apache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Content-Type: text/html; charset=UTF-8

Making the same request with an alternative spelling of the URL leads to a 404. Note that the "a" of "tag" has been replaced by %41 (strictly speaking, %41 encodes an uppercase "A"; a lowercase "a" would be %61):

# curl -I http://webroot.loc/wordpress/t%41g/%e4%b8%80%e6%a0%b7
HTTP/1.1 404 Not Found
Date: Sun, 18 Jul 2010 18:54:32 GMT
Server: Apache
Cache-Control: no-cache, must-revalidate, max-age=0
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Pragma: no-cache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Last-Modified: Sun, 18 Jul 2010 18:54:33 GMT
Content-Type: text/html; charset=UTF-8

RFC 2616 addresses this explicitly in its rules for comparing URIs (section 3.2.3):

Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
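A minimal sketch of the equivalence rule quoted above, written in Python rather than WordPress's PHP, and not the url_normalize() from the patches: percent-encoded triplets that stand for an unreserved character are decoded back to that character. The function name decode_unreserved() is hypothetical, and the sketch assumes RFC 3986's unreserved set (letters, digits, "-", ".", "_", "~") as a conservative stand-in for "everything outside the reserved and unsafe sets".

```python
import re
import string

# Assumption: only RFC 3986 unreserved characters are decoded. This is a
# conservative subset of what RFC 2616 section 3.2.3 treats as equivalent.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")

_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(path: str) -> str:
    """Replace %XX triplets that encode an unreserved character with that character."""

    def repl(match):
        char = chr(int(match.group(1), 16))
        # Reserved/unsafe characters and non-ASCII octets stay encoded.
        return char if char in UNRESERVED else match.group(0)

    return _TRIPLET.sub(repl, path)
```

With this, /wordpress/t%61g/%e4%b8%80%e6%a0%b7 maps to /wordpress/tag/%e4%b8%80%e6%a0%b7, while t%41g maps to tAg, because %41 is an uppercase A. Octets such as %e4 are not unreserved ASCII characters, so they stay encoded.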

These so-called percent-encoded triplets are written in uppercase by PHP's urlencode() and rawurlencode() functions, but mostly in lowercase inside WordPress (e.g. during slug generation). They may be written in either case, or even in mixed case: although the RFCs present them in uppercase, both variants are valid, and even %dD is.
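For the hex-digit case, here is a sketch (again Python, with hypothetical helper names) that rewrites every triplet to uppercase before comparing. Uppercase is chosen because it matches what PHP's urlencode() and rawurlencode() emit and what RFC 3986 (section 2.1) recommends for consistency; lowercase would work equally well as long as both URLs get the same treatment.

```python
import re

_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_triplets(url: str) -> str:
    """Rewrite every %xx triplet with uppercase hex digits (%e4 -> %E4, %dD -> %DD)."""
    return _TRIPLET.sub(lambda m: m.group(0).upper(), url)


def same_url(a: str, b: str) -> bool:
    """Compare two URLs, ignoring only the case of percent-encoding hex digits."""
    return uppercase_triplets(a) == uppercase_triplets(b)
```

This only removes case differences inside triplets. It does not decode them, so /tag/ and /t%61g/ still compare as different unless unreserved triplets are decoded as well.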

The web application should treat both URLs as equivalent.

Change History (8)

#3 @hakre
14 years ago

14292.3.patch contains a url_normalize() function, which could be used to strip this kind of noise from URL input.

One defect remains: it does not take the order of the query's key/value pairs into account.
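A sketch of that missing query normalization: sort the key/value pairs by key so that ?p=2&cat=5 and ?cat=5&p=2 compare equal. It assumes the application reads parameters by name, so their order carries no meaning; sort_query() is a hypothetical name, not the function from 14292.3.patch. Pairs are sorted as raw text without decoding, so keys should have their percent-encoding normalized first.

```python
from urllib.parse import urlsplit, urlunsplit


def sort_query(url: str) -> str:
    """Return url with its query pairs sorted by key (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.query:
        return url  # nothing to reorder
    pairs = parts.query.split("&")
    # Stable sort on the raw key, so repeated keys (a=2&a=1) keep their order.
    pairs.sort(key=lambda pair: pair.split("=", 1)[0])
    return urlunsplit(parts._replace(query="&".join(pairs)))
```

Python's sort is stable, which matters for repeated keys: applications that let the first or last occurrence win still see the same value after sorting.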

#4 @hakre
14 years ago

Fixed the defect of not normalizing the query part of a URL: 14292.4.patch

#5 @hakre
14 years ago

Found another defect regarding the range of invalid characters: US-ASCII is 7-bit, so the upper half of 8-bit encodings (octets 0x80 to 0xFF) is invalid.
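A sketch of the 7-bit rule: any raw octet in the upper half (0x80 to 0xFF) is not valid US-ASCII and has to be percent-encoded before the URL is compared or routed. The helper names are hypothetical, and the input is assumed to be the raw request-URI bytes; other invalid 7-bit characters (spaces, control characters) are out of scope here.

```python
def has_high_octets(raw: bytes) -> bool:
    """True if any octet falls outside 7-bit US-ASCII (0x80 to 0xFF)."""
    return any(byte >= 0x80 for byte in raw)


def encode_high_octets(raw: bytes) -> str:
    """Percent-encode every octet >= 0x80; 7-bit octets are kept as-is."""
    return "".join(
        f"%{byte:02X}" if byte >= 0x80 else chr(byte)
        for byte in raw
    )
```

For example, the UTF-8 bytes of /tag/一样 come out as /tag/%E4%B8%80%E6%A0%B7, the uppercase form of the tag URL from the report.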

#6 @hakre
14 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

The Normalize URLs plugin fixes this issue.

If a developer wants to give this some traction, feel free to reopen. With this plugin installed, it works for me now.

#7 @nacin
14 years ago

  • Milestone Awaiting Review deleted

#8 @tszming
12 years ago

Even though the plugin solves the issue, why not leave this ticket open until the fix is merged into core?
