Opened 16 months ago
Last modified 16 months ago
#19926 new defect (bug)
Bad special characters replacement when changing from HTML to Visual
| Reported by: | | Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | Awaiting Review |
| Component: | TinyMCE | Version: | |
| Severity: | normal | Keywords: | |
| Cc: | raven@… | | |
Description
Hello there. I have found a problem with some characters when switching from HTML to Visual.
If you type &darr; in the HTML view to get a down arrow and then switch to Visual, you see the down arrow. The entity is not kept as an entity; it is replaced with the literal ↓ character. When you hit Publish, the ↓ turns into ?.
The same problem occurs if I use the special characters button in the Visual view: the characters turn into ?.
It happens for many of the special (non-ASCII) characters. Here is a list of the special characters that do and don't work with UTF-8.
Doesn't work (changes into ?):
′ ≤ ∞ ♣ ♦ ♥ ♠ ↔ ← ↑ → ↓ ″ ≥ ∝ ∂ ≠ ≡ ≊ ≈
Works (shows correctly):
ƒ • …
I tested this with WordPress 3.4-alpha-19719, the Twenty Eleven theme, and UTF-8 encoding.
Possibly related: #17487
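A minimal Python sketch of the conversion described above, for illustration only: it models the HTML to Visual switch with the standard library's `html.unescape` (an assumption; this is not TinyMCE's actual code path). The entity `&darr;` is plain ASCII, while the literal ↓ it turns into needs three bytes in UTF-8, so only the literal character depends on every later step being UTF-8 clean.

```python
import html

# Assumption: model TinyMCE's entity-to-character conversion with html.unescape.
entity_source = "&darr;"                     # what the user types in the HTML view
visual_text = html.unescape(entity_source)   # what the Visual view holds afterwards

print(visual_text)                    # ↓
print(entity_source.encode("ascii"))  # b'&darr;' (pure ASCII)
print(visual_text.encode("utf-8"))    # b'\xe2\x86\x93' (three bytes in UTF-8)
```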
Attachments (2)
Change History (5)
SergeyBiryukov — 16 months ago
SergeyBiryukov — 16 months ago
comment:1 (follow-up: comment:3)
SergeyBiryukov — 16 months ago
This actually is a very complex problem. It depends on the browser being set to UTF-8 (the default for WP, but the user can override it), the current font containing glyphs for all the characters, the DB table(s) having the proper utf8 encoding, and even the connection between the server and the DB using the proper charset.
All of these should be UTF-8, which is usually the case.
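One hypothesis that fits the two lists in the description, offered as a sketch rather than a confirmed diagnosis: if any link in this chain converts the text to Windows-1252 (MySQL's `latin1` character set is in fact cp1252), every character cp1252 cannot represent is replaced with `?`. The three "working" characters ƒ, • and … happen to exist in cp1252, while none of the failing ones do. The Python below checks this with the standard `cp1252` codec; `survives_cp1252` is a hypothetical helper name.

```python
# Hypothesis sketch (assumption): the text passes through a Windows-1252 conversion
# somewhere between the browser and the DB, with unmappable characters replaced by "?".
broken = "′ ≤ ∞ ♣ ♦ ♥ ♠ ↔ ← ↑ → ↓ ″ ≥ ∝ ∂ ≠ ≡ ≊ ≈".split()
working = "ƒ • …".split()


def survives_cp1252(ch: str) -> bool:
    """Return True if the character round-trips through Windows-1252 unchanged."""
    return ch.encode("cp1252", errors="replace").decode("cp1252") == ch


print([c for c in broken if survives_cp1252(c)])   # []
print([c for c in working if survives_cp1252(c)])  # ['ƒ', '•', '…']
print("♠".encode("cp1252", errors="replace"))      # b'?'
```

If this is the cause, the DB table and connection charsets listed above are the first places to check.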
Replying to SergeyBiryukov:
> My results are different. Only two characters are displayed incorrectly: ♠ and ≠...
In my tests, all characters were saved and displayed correctly. I think this may have something to do with the initial URL-encoding in the browser. Could you look at the request headers when saving a post? In FF9 the form data is transmitted as application/x-www-form-urlencoded, even though we don't set this explicitly on the form (it is the default enctype for HTML forms).
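For reference when inspecting the request, here is a small Python sketch of an `application/x-www-form-urlencoded` body built with the standard library's `urllib.parse.urlencode`. It assumes the page is served as UTF-8, so each character is sent as its percent-encoded UTF-8 bytes; the field name `content` is used for illustration only. In a captured request, ♠ should therefore appear as `%E2%99%A0`; a different byte sequence there would suggest the problem starts in the browser.

```python
from urllib.parse import parse_qs, urlencode

# Assumption: UTF-8 page; "content" is an illustrative field name.
body = urlencode({"content": "♠ ≠"})
print(body)  # content=%E2%99%A0+%E2%89%A0

# Decoding the body as UTF-8 (parse_qs's default) restores the original text.
print(parse_qs(body)["content"][0])  # ♠ ≠
```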

My results are different. Only two characters are displayed incorrectly: ♠ and ≠ (see the screenshot, tested in Firefox 9.0.1 and Opera 11.61). The UTF-8 representations of these characters contain the byte A0, which in Latin-1 is the non-breaking space, also known as &nbsp;. Upon saving, it's replaced with 20 (a regular space), creating an invalid UTF-8 sequence, similar to #11528 and #19033.
This only happens in TinyMCE; saving in HTML mode works properly. For some reason, 19926.0.patch doesn't seem to fix anything.
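The byte-level explanation in this comment can be checked with a short Python sketch (illustration only, not WordPress or TinyMCE code). It assumes the faulty step simply replaces every 0xA0 byte with 0x20. Of the characters listed in the description, only ♠ and ≠ have an A0 byte in their UTF-8 encoding, and after the replacement neither decodes as valid UTF-8. `is_valid_utf8` and `affected` are hypothetical names.

```python
# Assumption: the faulty step treats 0xA0 as a Latin-1 non-breaking space
# and swaps it for 0x20, even inside multi-byte UTF-8 sequences.
chars = "′ ≤ ∞ ♣ ♦ ♥ ♠ ↔ ← ↑ → ↓ ″ ≥ ∝ ∂ ≠ ≡ ≊ ≈".split()


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Only characters whose UTF-8 bytes contain 0xA0 are affected.
affected = [c for c in chars if b"\xa0" in c.encode("utf-8")]
print(affected)  # ['♠', '≠']

for c in affected:
    raw = c.encode("utf-8")               # e.g. e2 99 a0 for ♠
    mangled = raw.replace(b"\xa0", b" ")  # 0xA0 -> 0x20
    print(c, raw.hex(" "), "->", mangled.hex(" "), "valid:", is_valid_utf8(mangled))
```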