Make WordPress Core

Opened 13 years ago

Closed 12 years ago

#18945 closed defect (bug) (invalid)

bad url character encoding in Arabic post names and categories

Reported by: walid3
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.3
Component: General
Keywords: reporter-feedback
Focuses:
Cc:


Arabic post names and categories show up in URLs as signs like this:

Besides looking bad, it also gives an error when collecting the sitemap of the blog:
XML Parsing Error: undefined entity

When you share the post on Facebook or similar sites, the link looks the same, with the strange signs.

Change History (5)

#1 @nacin
13 years ago

Sitemap, or feed?

#2 @nacin
13 years ago

  • Keywords reporter-feedback added

#3 follow-up: @walid3
13 years ago

Well, the links have these strange signs almost everywhere: when hovering over the links, when copying them to the clipboard, and also in the sitemap, which, as I said, is the bad part.

#4 in reply to: ↑ 3 @SergeyBiryukov
13 years ago

Replying to walid3:

Well, the links have these strange signs almost everywhere

Percent-encoding non-ASCII characters as UTF-8 octets is part of RFC 3986:

Non-ASCII characters must first be encoded according to UTF-8 [STD63], and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.

It's the same for Cyrillic characters, for example. I don't think we can do anything here.
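To illustrate the rule quoted above, here is a minimal Python sketch (not WordPress code) that encodes a string as UTF-8 and percent-encodes each octet. The function name `percent_encode_slug` and the sample words are assumptions chosen only for illustration.

```python
from urllib.parse import quote


def percent_encode_slug(slug: str) -> str:
    """Encode the slug as UTF-8, then percent-encode every octet.

    safe="" means no reserved character (not even "/") is left as-is;
    unreserved ASCII letters, digits and "-._~" still pass through.
    """
    return quote(slug, safe="", encoding="utf-8")


# Hypothetical sample slugs: an Arabic and a Cyrillic word.
arabic = percent_encode_slug("مرحبا")
cyrillic = percent_encode_slug("привет")

print(arabic)    # %D9%85%D8%B1%D8%AD%D8%A8%D8%A7
print(cyrillic)  # %D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82
```

Each Arabic or Cyrillic letter here takes two UTF-8 octets, so it becomes two `%XX` groups, which is what the "strange signs" in the links are.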

That said, most browsers decode the URLs to display them in a human-readable form:

Firefox 8.0, Chrome 15, Opera 11.52, Safari 5.1 show unencoded URLs.
IE 8, IE 9 show encoded URLs.
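The decoding step those browsers perform for display can be approximated like this. It is only a sketch in Python of the general idea, not what any browser actually runs; `display_url` and the `example.com` URL are hypothetical.

```python
from urllib.parse import unquote


def display_url(url: str) -> str:
    """Return a human-readable form of a percent-encoded URL.

    Octets are decoded as UTF-8; invalid sequences are replaced rather
    than raising. This is for display only: the encoded form is still
    what gets sent over the wire.
    """
    return unquote(url, encoding="utf-8", errors="replace")


encoded = "https://example.com/%D9%85%D8%B1%D8%AD%D8%A8%D8%A7/"
print(display_url(encoded))  # https://example.com/مرحبا/
```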

See #16496 for making $sample_permalink_html human-readable.

I've also checked comment feeds for posts with UTF-8 slugs, and they seem to work correctly.

#5 @SergeyBiryukov
12 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed