Opened 13 years ago
Closed 12 years ago
#18945 closed defect (bug) (invalid)
bad url character encoding in Arabic post names and categories
Reported by: | walid3 | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 3.3 |
Component: | General | Keywords: | reporter-feedback |
Focuses: | Cc: |
Description
Arabic post names and categories show up in URLs as strange signs like this:
%d9%88%d8%b1%d8%af%d8%a9-%d8%a3%d9%85%d8%a7%d9%85-%d8%a8%d8%a7%d8%a8-%d9%85%d8%ba%d9%84%d9%82
Besides looking bad, it also gives an error when collecting the blog's sitemap:
XML Parsing Error: undefined entity
When you share the post on Facebook or similar sites, the link looks the same, with the strange signs.
Change History (5)
#3 (13 years ago), follow-up: ↓ 4
Well, the links have these strange signs almost everywhere: when hovering over them, when copying them to the clipboard, and also in the sitemap, which, as I said, is the real problem.
#4 (13 years ago), in reply to: ↑ 3
Replying to walid3:
well, the links have these strange signs almost everywhere
Percent-encoding non-ASCII characters as UTF-8 octets is part of RFC 3986:
Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.
http://tools.ietf.org/html/rfc3986#page-21
http://en.wikipedia.org/wiki/Percent-encoding#Current_standard
It's the same for Cyrillic characters, for example. I don't think we can do anything here.
That said, most browsers decode the URLs to display them in a human-readable form:
Firefox 8.0, Chrome 15, Opera 11.52, Safari 5.1 show unencoded URLs.
IE 8, IE 9 show encoded URLs.
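To make the rule quoted above concrete, here is a minimal Python sketch that decodes the slug from the ticket description the way a browser would for display, then re-encodes it the way it is sent in the URL. The variable names are illustrative only, and Python's `urllib.parse` stands in for whatever WordPress or the browser does internally.

```python
from urllib.parse import quote, unquote

# Slug exactly as reported in the ticket description.
slug = ("%d9%88%d8%b1%d8%af%d8%a9-%d8%a3%d9%85%d8%a7%d9%85-"
        "%d8%a8%d8%a7%d8%a8-%d9%85%d8%ba%d9%84%d9%82")

# Display form: percent-decode the octets, then interpret them as UTF-8.
decoded = unquote(slug, encoding="utf-8")

# Wire form: encode as UTF-8, then percent-encode each octet.
# quote() emits uppercase hex digits; RFC 3986 treats the hex digits
# of a percent-encoding as case-insensitive, so compare in lowercase.
reencoded = quote(decoded, encoding="utf-8")

print(decoded)                    # the human-readable Arabic slug
print(reencoded.lower() == slug)  # round trip gives back the reported URL
```

Both forms name the same resource; the percent-encoded one is simply the ASCII-only representation that the RFC requires.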
See #16496 for making $sample_permalink_html human-readable.
I've also checked comment feeds for posts with UTF-8 slugs, and they seem to work correctly.
Sitemap, or feed?