#14313 closed defect (bug) (duplicate)
Tag permalinks with Chinese (Japanese) characters have an issue
Reported by: | lafirel | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | major | Version: | 3.0 |
Component: | General | Keywords: | chinese, tag, permalink, crawl |
Focuses: | | Cc: | |
Description
If a tag permalink contains Chinese (Japanese) characters, the permalink of the tag looks like this: "/tag/%E4%B8%80%E6%A0%B7/"
Here "%E4%B8%80%E6%A0%B7" is the percent-encoded form of the Chinese (Japanese) characters. But when a browser or a search engine spider visits or crawls the tag page /tag/%E4%B8%80%E6%A0%B7/, WordPress 3.0 automatically redirects "/tag/%E4%B8%80%E6%A0%B7/" to "/tag/%e4%b8%80%e6%a0%b7/" (lowercase).
This is where the problem starts: many Chinese and Japanese users report that after upgrading to WordPress 3.0 the Google spider can no longer crawl their tag pages smoothly, because the spider does not accept the 301. You can check the crawl errors and the server log to see how WordPress returns a 301 when a tag URL is hit and how Google reports the errors.
Users of WordPress 2.7.1 or 2.9.2 do not have this problem, but WordPress 3.0 users see a lot of redirect errors in Google Webmaster Tools.
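To make the description concrete, here is a minimal Python sketch (not WordPress code) showing that the uppercase and lowercase permalinks from the report decode to the same tag. It only uses the standard library `urllib.parse.unquote`; the variable names are illustrative. RFC 3986 treats the hex digits of a percent-encoded triplet as case-insensitive, so both spellings identify the same resource.

```python
from urllib.parse import unquote

# The two spellings of the tag permalink quoted in the description.
upper = "/tag/%E4%B8%80%E6%A0%B7/"
lower = "/tag/%e4%b8%80%e6%a0%b7/"

# unquote() decodes percent-encoded triplets as UTF-8 by default.
decoded_upper = unquote(upper)
decoded_lower = unquote(lower)

# The raw strings differ, but the decoded paths are identical.
raw_strings_differ = upper != lower
same_resource = decoded_upper == decoded_lower

print(decoded_upper)
print(raw_strings_differ, same_resource)
```

The redirect described above therefore moves a client between two URLs that name the same tag page.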
Attachments (2)
Change History (25)
#1
follow-ups:
↓ 2
↓ 3
@
14 years ago
- Milestone Awaiting Review deleted
- Resolution set to duplicate
- Status changed from new to closed
#4
@
14 years ago
I tried to reproduce this on current trunk but was not able to.
Instead of getting a redirect, I get a 404 Not Found, which is similar to #13413.
#5
@
14 years ago
Slugs are stored in lowercase in the database (the slug field of the terms table). That explains why a redirect to the lowercase variant is made. It does not explain the 404.
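The effect described in this comment can be modelled in a few lines of Python. This is only a sketch of the behaviour, not the WordPress implementation: `stored_slug` and `needs_redirect` are hypothetical helpers, and the assumption is that the requested slug is compared to the stored one as a plain, case-sensitive string.

```python
from urllib.parse import quote

def stored_slug(name: str) -> str:
    """Hypothetical model: percent-encode the tag name, then store it lowercased."""
    return quote(name, safe="").lower()

def needs_redirect(requested_slug: str, slug_in_db: str) -> bool:
    """Hypothetical model: a strict string comparison decides whether to redirect."""
    return requested_slug != slug_in_db

slug = stored_slug("一样")

# An uppercase request does not match the stored slug, so it would be redirected.
uppercase_redirects = needs_redirect("%E4%B8%80%E6%A0%B7", slug)
# A lowercase request matches and would be served directly.
lowercase_redirects = needs_redirect("%e4%b8%80%e6%a0%b7", slug)

print(slug, uppercase_redirects, lowercase_redirects)
```

Under that assumption, any request whose triplets are not already lowercase ends up on the redirect path, which matches the 301 in the report.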
#6
@
14 years ago
Interestingly, the public WP property 'matched_query' is the string 'tag=%E4%B8%80%E6%A0%B7', which contains uppercase character triplets, even when the request $_SERVER['REQUEST_URI'] => string '/wordpress-trunk/tag/%e4%b8%80%e6%a0%b7/' has lowercase characters.
This is in WP::main after $this->parse_request, first called in wp-blog-header.php.
#9
@
14 years ago
I gave it a try, but 13413.2.patch from #13413 does not fix this issue.
If you hit "/tag/%E4%B8%80%E6%A0%B7", the server still responds with 301 Moved Permanently to "/tag/%e4%b8%80%e6%a0%b7".
#11
follow-up:
↓ 21
@
14 years ago
Hey, after running some server header checks, I found something interesting.
No patch is applied to my WordPress 3.0; I am using the original 3.0.
There is a tool called Check Server Header: http://www.seoconsultants.com/tools/headers/
When I fetch the headers as a browser such as IE7, IE6, Firefox, or Opera, the server response is 200 OK.
When I fetch the headers as a bot such as Googlebot or MSNbot, the server response is 301 Moved Permanently.
I am wondering whether 3.0 has some detection that sends different headers depending on the browser or search engine spider?
#12
@
14 years ago
The patch in #13413 was not meant to fix this issue here, but I needed to do that fix before being able to reproduce this ticket.
So with that patch, I can reproduce the ticket here.
I've now been testing redirects with curl:
Redirect (uppercase character triplets):
    # curl -I http://host/wordpress-trunk/tag/%E4%B8%80%E6%A0%B7/
    HTTP/1.1 301 Moved Permanently
    Date: Sat, 17 Jul 2010 10:25:40 GMT
    Server: Apache
    X-Pingback: http://host/wordpress-trunk/xmlrpc.php
    Location: http://host/wordpress-trunk/tag/%e4%b8%80%e6%a0%b7/
    Content-Type: text/html; charset=UTF-8
Redirect (mixed-case character triplets):
    # curl -I http://host/wordpress-trunk/tag/%e4%B8%80%E6%A0%B7/
    HTTP/1.1 301 Moved Permanently
    Date: Sat, 17 Jul 2010 10:26:50 GMT
    Server: Apache
    X-Pingback: http://host/wordpress-trunk/xmlrpc.php
    Location: http://host/wordpress-trunk/tag/%e4%b8%80%e6%a0%b7/
    Content-Type: text/html; charset=UTF-8
No redirect (lowercase character triplets):
    # curl -I http://host/wordpress-trunk/tag/%e4%b8%80%e6%a0%b7/
    HTTP/1.1 200 OK
    Date: Sat, 17 Jul 2010 10:24:31 GMT
    Server: Apache
    X-Pingback: http://host/wordpress-trunk/xmlrpc.php
    Content-Type: text/html; charset=UTF-8
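One way to treat the three request variants above as equivalent is to normalize the case of the percent-encoded triplets before comparing the requested path with the canonical one. The Python sketch below illustrates that idea only; it is not the patch attached to #13413 or #14292, and `lowercase_triplets` and `same_path` are hypothetical helper names.

```python
import re

# Matches one percent-encoded triplet such as %E4 or %e4.
_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")

def lowercase_triplets(path: str) -> str:
    """Lowercase only the percent-encoded triplets, leaving the rest of the path alone."""
    return _TRIPLET.sub(lambda m: m.group(0).lower(), path)

def same_path(requested: str, canonical: str) -> bool:
    """Compare two paths while ignoring the case of percent-encoded triplets."""
    return lowercase_triplets(requested) == lowercase_triplets(canonical)

canonical = "/wordpress-trunk/tag/%e4%b8%80%e6%a0%b7/"
variants = [
    "/wordpress-trunk/tag/%E4%B8%80%E6%A0%B7/",  # uppercase triplets
    "/wordpress-trunk/tag/%e4%B8%80%E6%A0%B7/",  # mixed-case triplets
    "/wordpress-trunk/tag/%e4%b8%80%e6%a0%b7/",  # lowercase triplets
]

results = [same_path(v, canonical) for v in variants]
print(results)
```

Restricting the case folding to the triplets keeps any case-sensitive parts of the path untouched, so only the encoding difference is ignored.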
#13
@
14 years ago
As a fix for live sites, I've updated the Better HTTP Redirect Plugin to take care of this issue as well. It's built in since version 1.2-beta-2: Redirect Loop Protection for Better HTTP Redirects Plugin. Just download the development version.
It works comparably to the new patch I've uploaded in the other ticket.
#14
@
14 years ago
- Resolution set to duplicate
- Status changed from reopened to closed
From what I see now, this is a duplicate of #14292
#15
follow-up:
↓ 16
@
14 years ago
- Resolution duplicate deleted
- Status changed from closed to reopened
I downloaded version 1.2-beta-2 of the plugin, uploaded it, and activated it.
Then I got these results:

    http://lafirel.com/tag/%E4%B8%89%E7%BA%A2
    HTTP/1.1 301 Moved Permanently
    Transfer-Encoding: chunked
    Date: Sat, 17 Jul 2010 14:27:35 GMT
    Server: LiteSpeed
    Connection: close
    X-Powered-By: PHP/5.2.12
    Vary: Cookie
    X-Pingback: http://lafirel.com/xmlrpc.php
    Content-Type: text/html; charset=UTF-8
    Location: http://lafirel.com/tag/%e4%b8%89%e7%ba%a2

Did you test it before saying that this plugin can take care of this issue as well?
#16
in reply to:
↑ 15
@
14 years ago
Replying to Lafirel:
Did you test it before saying that this plugin can take care of this issue as well?
That's the exact issue, yes: lowercase and uppercase URL-encoding, the three cases I posted above. I need to check it a second time; maybe I made a mistake in the plugin by accident.
#17
@
14 years ago
The plugin looks okay as far as I can tell. I need to apply the other patch (13413.2.patch) so that UTF-8 works in tag slugs for me.
Please try this patch: 14292.2.diff
#18
@
14 years ago
For the record, so you can see my request:
    # curl -I http://webroot.loc:80/wordpress/tag/%E4%B8%80%e6%A0%B7
    HTTP/1.1 200 OK
    Date: Sun, 18 Jul 2010 03:33:38 GMT
    Server: Apache
    X-Pingback: http://webroot.loc/wordpress/xmlrpc.php
    Content-Type: text/html; charset=UTF-8
This is with 14292.2.diff applied. Please test against a wordpress trunk version with no plugins enabled.
#19
@
14 years ago
I tested both 14292.2.diff and 14292.2.patch.

    http://lafirel.com/tag/%E5%89%91%E8%B1%AA3
    HTTP/1.1 200 OK
    Transfer-Encoding: chunked
    Date: Sun, 18 Jul 2010 05:46:37 GMT
    Server: LiteSpeed
    Connection: close
    X-Powered-By: PHP/5.2.12
    Vary: Cookie
    X-Pingback: http://lafirel.com/xmlrpc.php
    Content-Type: text/html; charset=UTF-8

Both seem OK! I chose 14292.2.patch.
Can I publish this patch on my blog, so that users who have the same issue can find it and fix it?
#20
@
14 years ago
Can I publish this patch on my blog, so that users who have the same issue can find it and fix it?
You can do whatever you wish with it. It may go into a release as-is or modified; it may not be the final solution here.
#21
in reply to:
↑ 11
@
14 years ago
Replying to Lafirel:
I am wondering whether 3.0 has some detection that sends different headers depending on the browser or search engine spider?
Robot, search, post, preview, trackback and comment popup requests, and requests made by an admin user, are never redirected.
You reported that requests from robots got a redirect, which is out of sync with this, but ignoring that for now, the list covers the cases in which canonical redirects are not done.
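The exemption list in this comment can be read as a simple predicate. The Python sketch below is only a model of that list, not WordPress code: the request is represented as a plain dictionary, and the flag names (`robot`, `search`, `post`, `preview`, `trackback`, `comment_popup`, `admin_user`) are hypothetical labels taken from the wording above.

```python
# Hypothetical flag names mirroring the request types listed in the comment.
EXEMPT_FLAGS = (
    "robot",
    "search",
    "post",
    "preview",
    "trackback",
    "comment_popup",
    "admin_user",
)

def skips_canonical_redirect(request: dict) -> bool:
    """Return True if any exempt flag is set, meaning no canonical redirect is done."""
    return any(request.get(flag, False) for flag in EXEMPT_FLAGS)

# A plain tag page request: none of the exemptions apply.
tag_request = {"tag": "%E4%B8%80%E6%A0%B7"}
# A preview request: exempt from canonical redirects.
preview_request = {"preview": True}

tag_is_exempt = skips_canonical_redirect(tag_request)
preview_is_exempt = skips_canonical_redirect(preview_request)
print(tag_is_exempt, preview_is_exempt)
```

In this model an ordinary tag request is not exempt, so it remains subject to the canonical redirect discussed in this ticket.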
Duplicate of #14292 ?