Opened 14 years ago
Closed 14 years ago
#14292 closed defect (bug) (fixed)
loop in tags url to same url
Reported by: | gilrabbi2 | Owned by: | |
---|---|---|---|
Milestone: | 3.0.1 | Priority: | normal |
Severity: | critical | Version: | 3.0 |
Component: | General | Keywords: | reporter-feedback |
Focuses: | | Cc: | |
Description
All blog tag URLs that contain Hebrew characters 301-redirect in a loop to the same URL.
Example:
http://www.site.com/tag/xxxx
loop to http://www.site.com/tag/xxxx
or
http://ranh.co.il/tag/וורדפרס-בעברית/
loop to http://ranh.co.il/tag/וורדפרס-בעברית/
Status: HTTP/1.1 301 Moved Permanently
Google Webmaster Tools reports redirect-loop errors for all tags.
This bug also occurs on every WordPress site that uses Hebrew characters.
Attachments (8)
Change History (37)
#3
@
14 years ago
So how can I fix that? My category URLs are fine; only the tags have this bug.
When I go back to WordPress 2.9, all the tag URLs return to normal.
It happens only in WP 3.
#5
follow-up:
↓ 7
@
14 years ago
The patch in #14201 does not seem to work; Google Webmaster Tools also reports crawl errors.
I hope WordPress can pay more attention to users of non-ASCII languages.
#8
@
14 years ago
Simplified patch. It also fixes redirects with mixed-case character triplets, which should be transparent according to the RFC.
#9
@
14 years ago
- Keywords reporter-feedback added; tags.tag.tags url.tag url loop tag removed
The Better HTTP Redirect Plugin version 1.2-beta-2 is a proof of concept on that approach. Just install it and the redirect should be gone.
Please check if patch or plugin fixes your issue.
#10
@
14 years ago
http://www.ietf.org/rfc/rfc2616.txt
See section 3.2.3
When comparing two URIs to decide if they match or not, a client SHOULD use a case-sensitive octet-by-octet comparison of the entire URIs, with these exceptions:
- A port that is empty or not given is equivalent to the default port for that URI-reference;
- Comparisons of host names MUST be case-insensitive;
- Comparisons of scheme names MUST be case-insensitive;
- An empty abs_path is equivalent to an abs_path of "/".
Characters other than those in the "reserved" and "unsafe" sets (see RFC 2396 [42]) are equivalent to their ""%" HEX HEX" encoding.
For example, the following three URIs are equivalent:
http://abc.com:80/~smith/home.html
http://ABC.com/%7Esmith/home.html
http://ABC.com:/%7esmith/home.html
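The comparison rules quoted above can be sketched in a few lines. This is only an illustration in Python, not code from any patch on this ticket; the function names and the choice of the RFC 2396 unreserved set as the "safe to decode" characters are assumptions.

```python
import re
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

# Assumption: characters that may be compared in decoded form are the
# RFC 2396 "unreserved" set (alphanumerics plus the mark characters).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!~*'()"
)


def _comparable_path(path):
    """Decode triplets of unreserved characters; uppercase the hex of the rest."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    # An empty abs_path is equivalent to "/".
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, path) or "/"


def uris_equivalent(a, b):
    """Hypothetical helper: compare two URIs along the lines of RFC 2616, 3.2.3."""
    def key(uri):
        parts = urlsplit(uri)
        scheme = parts.scheme.lower()                # scheme: case-insensitive
        host = (parts.hostname or "").lower()        # host: case-insensitive
        port = parts.port or DEFAULT_PORTS.get(scheme)  # empty port = default
        return (scheme, host, port, _comparable_path(parts.path), parts.query)

    return key(a) == key(b)
```

With this sketch, the three example URIs from the RFC compare as equal, while paths that differ in letter case outside a triplet do not.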
#11
follow-up:
↓ 14
@
14 years ago
It seems that the hex encoding is the only part that should be case-insensitive.
#14
in reply to:
↑ 11
@
14 years ago
Replying to ryan:
It seems that the hex encoding is the only part that should be case-insensitive.
As for strictness, yes. What's the reason you place it after the first check (line 344)?
Can confirm that the regex patch works as well.
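For readers following along, the point about hex encoding being the only case-insensitive part can be illustrated as below. This is a Python sketch and not the regex patch itself; the function names are hypothetical.

```python
import re

# A percent-encoded triplet: "%" followed by two hexadecimal digits.
TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")


def lowercase_triplets(url):
    """Lowercase only the hex digits of percent-encoded triplets."""
    return TRIPLET.sub(lambda m: m.group(0).lower(), url)


def same_url_ignoring_triplet_case(requested, canonical):
    """Hypothetical check: equal if the URLs differ only in triplet case."""
    return lowercase_triplets(requested) == lowercase_triplets(canonical)
```

A redirect decision built on such a check would treat `/tag/%D7%95/` and `/tag/%d7%95/` as the same URL, but would still distinguish `/Tag/` from `/tag/`.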
#18
@
14 years ago
I ran into a related issue after getting to grips with the RFC documentation on URL comparison: #14347
I have something as a patch, but I need to test it against your latest changes. It normalizes a URL. As in #14347, this could be useful in core overall, maybe directly by normalizing $_SERVER['REQUEST_URI'].
FYI: Those %[a-zA-Z0-9]{2} sequences are called character triplets, by the way; an octet is an 8-bit entity.
#19
@
14 years ago
Slightly extended scenario. I added a new tag called a-一样 on the very latest trunk, which now includes the recent changesets of this ticket. So let's test:
# curl -I http://webroot.loc/wordpress/tag/a-%E4%B8%80%E6%A0%B7
HTTP/1.1 200 OK
Date: Sun, 18 Jul 2010 19:24:28 GMT
Server: Apache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Content-Type: text/html; charset=UTF-8
Now the same request with an equivalent but differently encoded URL:
# curl -I http://webroot.loc/wordpress/tag/%41-%E4%B8%80%E6%A0%B7
HTTP/1.1 301 Moved Permanently
Date: Sun, 18 Jul 2010 19:25:32 GMT
Server: Apache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Location: http://webroot.loc/wordpress/tag/a-%e4%b8%80%e6%a0%b7
Content-Type: text/html; charset=UTF-8
301 is there again.
#21
@
14 years ago
The patch introduces url_normalize(), which creates what could be called a "WordPress-way" normalized URL. First of all, it normalizes a URL so that only those characters that need to be encoded are encoded, and everything else mentioned in section 3.2.x of RFC 2616 is folded into "the one comparable" representation.
This follows the HTTP standard. The "WordPress-way" part of it is the use of lowercase triplets. Both PHP and the RFC suggest uppercase as the default. I like lowercase as well, and it's compatible.
Defect: Arguments in the query part of the URL are not sorted alphabetically. That is something that could be done additionally.
This patch also comes with another function, url_compare(), which I used before normalizing the URL at the entry point and left in just as a usage example.
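To make the described behaviour concrete, here is a rough Python sketch of what such a normalization could look like. It is not the attached patch: only the names url_normalize() and url_compare() come from this comment, while the bodies, the `sort_query` parameter, and the use of the RFC 3986 unreserved set are assumptions for illustration.

```python
import re
import string
from urllib.parse import urlsplit, urlunsplit

# Assumption: characters that never need encoding (RFC 3986 "unreserved").
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def _normalize_triplets(text):
    """Decode triplets of unreserved characters; lowercase all other triplets."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).lower()  # the "WordPress-way" lowercase form

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)


def url_normalize(url, sort_query=False):
    """Sketch only: default ports and dot segments are deliberately left alone."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # simplification: would also lowercase userinfo
    path = _normalize_triplets(parts.path) or "/"
    query = _normalize_triplets(parts.query)
    if sort_query and query:
        # Optional extra mentioned as a defect above: sort the arguments.
        query = "&".join(sorted(query.split("&")))
    return urlunsplit((scheme, netloc, path, query, parts.fragment))


def url_compare(a, b):
    """Two URLs are considered the same if their normalized forms match."""
    return url_normalize(a) == url_normalize(b)
```

Under these assumptions, a request for `/tag/%61-%E4%B8%80%E6%A0%B7` normalizes to `/tag/a-%e4%b8%80%e6%a0%b7`, so comparing it with the canonical tag URL no longer signals a mismatch.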
#22
@
14 years ago
Here is an example request with the patch applied:
# curl -I http://webroot.loc/wordpress/tag/%61-%E4%B8%80%E6%A0%B7
HTTP/1.1 200 OK
Date: Sun, 18 Jul 2010 23:40:09 GMT
Server: Apache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Content-Type: text/html; charset=UTF-8
No more 301.
#23
@
14 years ago
For the log: the path could be normalized as well:
# http://webroot.loc/wordpress/tag/../tag/%61-%E4%B8%80%E6%A0%B7
HTTP/1.1 404 Not Found
Date: Mon, 19 Jul 2010 10:25:59 GMT
Server: Apache
Cache-Control: no-cache, must-revalidate, max-age=0
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Pragma: no-cache
X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
Last-Modified: Mon, 19 Jul 2010 10:26:04 GMT
Content-Type: text/html; charset=UTF-8
#26
@
14 years ago
For the log: normalizing the path might not be a bad idea. Needs to be properly checked against RFCs first.
#27
@
14 years ago
Ref: The Normalize URLs plugin fixes this issue by following the standards more closely than the changes in [15437] / [15438] / [15444].
#28
@
14 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Added the removal of dot segments according to Path Segment Normalization (RFC 3986, section 6.2.2.3) in the latest development version of the Normalize URLs plugin.
Example request with dot segments and percent-encoded unreserved characters:
# curl -I http://webroot.loc/wordpress/t%61g/././../tag/%61pple
HTTP/1.1 200 OK
Date: Wed, 21 Jul 2010 22:52:51 GMT
Server: Apache
Content-Type: text/html
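For reference, dot-segment removal as specified in RFC 3986, section 5.2.4, can be sketched like this. This is a Python illustration of the RFC algorithm, not the plugin's code; the function name is taken from the RFC's wording.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments following the steps of RFC 3986, 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]              # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]              # "/../x" becomes "/x" ...
            if output:
                output.pop()             # ... and the previous segment is dropped
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)
```

Applied to the path in the request above, this yields `/wordpress/tag/%61pple`; decoding the remaining `%61` is a separate normalization step.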
Perhaps related to #14201?