#14347 closed defect (bug) (wontfix)
URLs are not handled properly
Reported by: | hakre | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | |
Component: | General | Keywords: | |
Focuses: | Cc: |
Description
While digging into #14201, #14292 and similar tickets, it came to my attention that WordPress does not filter the URL input properly. This can lead to 404 responses where content is actually available as specified by HTTP / RFC 2616.
Example run against current trunk to illustrate the issue:
# curl -I http://webroot.loc/wordpress/tag/%e4%b8%80%e6%a0%b7
HTTP/1.1 200 OK
Date: Sun, 18 Jul 2010 18:53:02 GMT
Server: Apache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Doing the same request with an alternative spelling of the URL leads to a 404. Note that the "a" of "tag" has been written as the triplet %41 (strictly, %41 decodes to an uppercase "A"; a lowercase "a" would be %61):
# curl -I http://webroot.loc/wordpress/t%41g/%e4%b8%80%e6%a0%b7
HTTP/1.1 404 Not Found
Date: Sun, 18 Jul 2010 18:54:32 GMT
Server: Apache
Cache-Control: no-cache, must-revalidate, max-age=0
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Pragma: no-cache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Last-Modified: Sun, 18 Jul 2010 18:54:33 GMT
Content-Type: text/html; charset=UTF-8
RFC 2616 addresses this explicitly in its section on URI comparison (3.2.3):
Characters other than those in the "reserved" and "unsafe" sets (see
RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
These so-called character triplets are written in uppercase by the PHP urlencode() and rawurlencode() functions, but mostly in lowercase inside WordPress (e.g. in slug generation). The hex digits may be written in either case, or even in mixed case. The RFCs introduce them in uppercase first, but both variants are valid, even %dD.
The web application should handle both URLs the same way.
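The equivalence described above can be sketched in a few lines of Python. This is only an illustration, not WordPress code: the function name `normalize_percent_encoding` is hypothetical, and the unreserved set is taken from RFC 3986 (section 2.3), which is an assumption, since the ticket cites the older RFC 2396 character sets.

```python
import re

# Unreserved characters per RFC 3986, section 2.3 (assumption: the ticket
# itself refers to the older RFC 2396 sets, which differ slightly).
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(url_path):
    """Decode triplets of unreserved characters; uppercase all other triplets."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. "%61" becomes "a"
        return "%" + match.group(1).upper()  # e.g. "%e4" becomes "%E4"

    return _TRIPLET.sub(replace, url_path)
```

With such a step applied before routing, `/t%61g/%e4%b8%80` and `/tag/%E4%B8%80` would compare equal, while reserved characters such as `%2F` stay encoded.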
Change History (8)
#3
@
14 years ago
In 14292.3.patch you can find a url_normalize() function. It could be used to remove noise from input via the URL.
Known defect: the order of the key/value pairs in the query string is not taken into account.
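To illustrate what handling the order of query pairs could look like, here is a minimal Python sketch. It is not the code from 14292.3.patch; `normalize_query_order` is a hypothetical name, and sorting by key only (so repeated keys keep their relative order) is an assumption about the desired behaviour.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_query_order(url):
    """Return url with its query pairs sorted by key.

    The sort is stable, so repeated keys keep their original relative order.
    Note that urlencode() re-encodes values (spaces become "+").
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    ordered = sorted(pairs, key=lambda pair: pair[0])
    return urlunsplit(parts._replace(query=urlencode(ordered)))
```

Two URLs that differ only in the order of their query pairs then normalize to the same string, for example `?s=x&p=1` and `?p=1&s=x`.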
#4
@
14 years ago
Fixed the defect of not normalizing the query part of a URL: 14292.4.patch
#5
@
14 years ago
Found another defect regarding the range of invalid characters: US-ASCII is 7-bit, so bytes in the upper half of 8-bit encodings (0x80 to 0xFF) are invalid in a URL.
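A small Python sketch of that range check, under the assumption that the raw request path is available as bytes. Both function names are hypothetical and not taken from the patch.

```python
def has_non_ascii(raw_url):
    """Return True if any byte lies outside the 7-bit US-ASCII range."""
    return any(byte > 0x7F for byte in raw_url)


def escape_non_ascii(raw_url):
    """Percent-encode every byte above 0x7F as an uppercase triplet."""
    return "".join(
        chr(byte) if byte <= 0x7F else "%{:02X}".format(byte)
        for byte in raw_url
    )
```

For example, the UTF-8 bytes of the first character in the tag used in the description would be escaped to `%E4%B8%80`, whereas an already escaped path passes through unchanged.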
#6
@
14 years ago
- Resolution set to wontfix
- Status changed from new to closed
The Normalize URLs plugin fixes this issue.
If some developer wants to give this some traction, feel free to reopen. With this plugin, it works for me now.
Ref: #1420