Make WordPress Core

Opened 9 years ago

Closed 6 years ago

#19926 closed defect (bug) (wontfix)

Bad special characters replacement when changing from HTML to Visual

Reported by: bi0xid
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: TinyMCE
Keywords:
Focuses:
Cc:


Hello there. I have found a problem in some characters when changing from HTML to Visual.

If you use the HTML entity for ↓ in HTML view to get a down arrow, you can see the down arrow when you switch to Visual. The entity is not just interpreted; it is replaced by the literal character. When you hit Publish, the ↓ turns into ?.
The same problem occurs if I use the special characters button in Visual view: the characters turn into ?.

It happens with most non-ASCII special characters. Here is a list of the special characters that work and don't work in UTF-8.

Doesn't work (changes into ?)


Works (shows correctly):

ƒ     ƒ
•       •
…  …

I have tested it using WordPress 3.4-alpha-19719 & Twenty Eleven & UTF-8

Possibly related: #17487

Attachments (2)

19926.converted-entities.png (4.9 KB) - added by SergeyBiryukov 9 years ago.
19926.0.patch (578 bytes) - added by SergeyBiryukov 9 years ago.


Change History (9)

#1 follow-up: @SergeyBiryukov
9 years ago

My results are different. Only two characters are displayed incorrectly: ♠ and ≠ (see the screenshot, tested in Firefox 9.0.1 and Opera 11.61). The UTF-8 representations of these characters contain the byte A0, which on its own is the Latin-1 code for a non-breaking space (&nbsp;). Upon saving, it's replaced with 20 (a regular space), creating an invalid UTF-8 sequence, similarly to #11528 and #19033.

Only happens in TinyMCE. Saving in HTML mode works properly. For some reason, 19926.0.patch doesn't seem to fix anything.
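
The byte-level explanation above can be checked with a short Python sketch. It only illustrates the encoding arithmetic; `swap_nbsp_byte` is a hypothetical stand-in for whatever step performs the replacement, not WordPress or TinyMCE code.

```python
# U+2660 (spade) and U+2260 (not equal) both end in the byte 0xA0 in UTF-8.
spade = "\u2660".encode("utf-8")      # e2 99 a0
not_equal = "\u2260".encode("utf-8")  # e2 89 a0


def swap_nbsp_byte(data: bytes) -> bytes:
    """Hypothetical save step: treat every 0xA0 byte as nbsp and make it a space."""
    return data.replace(b"\xa0", b"\x20")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for original in (spade, not_equal):
    broken = swap_nbsp_byte(original)
    print(original.hex(), "->", broken.hex(), "valid UTF-8:", is_valid_utf8(broken))
```

After the swap, the lead byte E2 is no longer followed by two continuation bytes, so the sequence cannot be decoded.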

#2 @bi0xid
9 years ago

  • Cc raven@… added

#3 in reply to: ↑ 1 @azaozz
9 years ago

This is actually a very complex problem. It depends on the browser being set to UTF-8 (the default for WP, but the user can override it), the current font having glyphs for all the characters used, the DB table(s) having the proper utf8 encoding, and even the connection between the server and the DB using the proper charset...

All of these should be UTF-8, which is usually the case.
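
As a rough illustration of why every link in that chain matters, the sketch below passes text through a single link with a given charset and reads it back. The `store` function and the choice of Windows-1252 as the mismatched charset are assumptions made for the example, not a diagnosis of this ticket.

```python
def store(text: str, link_charset: str) -> str:
    """Hypothetical model of one link in the chain (form, connection or table).

    Characters the charset cannot represent are replaced, which Python's
    "replace" error handler renders as "?".
    """
    raw = text.encode(link_charset, errors="replace")
    return raw.decode(link_charset)


print(store("\u2193", "utf-8"))   # the down arrow survives a UTF-8 link
print(store("\u2193", "cp1252"))  # the down arrow has no Windows-1252 code
print(store("\u0192\u2022\u2026", "cp1252"))  # these three do have one
```

A single non-UTF-8 link is enough to lose a character, even when everything before and after it is UTF-8.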

Replying to SergeyBiryukov:

My results are different. Only two characters are displayed incorrectly: ♠ and ≠...

In my tests all chars were saved and displayed correctly. I think this may have something to do with the initial urlencoding in the browser. Could you have a look at the request headers when saving a post? In FF9 the form data is transmitted as application/x-www-form-urlencoded even though we don't set this explicitly on the form.
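
For reference, this is roughly what an application/x-www-form-urlencoded body looks like for the two characters in question when the page charset is UTF-8. It is a sketch using Python's standard library; the field name `content` is an assumption made for the example.

```python
from urllib.parse import parse_qs, urlencode

# Encode a form field the way a UTF-8 page would submit it.
body = urlencode({"content": "\u2660 \u2260"})
print(body)

# Decoding with the same charset restores the original text.
decoded = parse_qs(body, encoding="utf-8")["content"][0]
print(decoded == "\u2660 \u2260")
```

Each character is sent as three percent-escaped bytes, the last of which is %A0, so the request body is one place to check whether the bytes leave the browser intact.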

#4 @c3mdigital
8 years ago

  • Resolution set to invalid
  • Status changed from new to closed

I'm able to display all characters correctly in the latest version of Chrome set to UTF-8. If anyone feels this is still an issue, please reopen.

#5 @johnbillion
8 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

Testing only in the latest version of Chrome is not sufficient to close a ticket.

#6 @WraithKenny
8 years ago

  • Cc Ken@… added

Might be worth pointing out that the font could impact this (the display of <?> at least). For example, the normal cascade might serve different fonts to users depending on system installed fonts or loaded webfonts. Different fonts may have different missing glyphs.

@SergeyBiryukov I don't think the patch had any effect for those chars. As far as I can tell from http://www.tinymce.com/wiki.php/Configuration:entity_encoding, the "named" setting will only convert what is white-listed in 'entities'. Removing the 'entities' setting, so that the more complete default set is used (http://www.tinymce.com/wiki.php/Configuration3x:entities), might be good.
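
The whitelist behaviour described above can be sketched as follows. This is not TinyMCE's implementation: `encode_named` and `short_list` are hypothetical, and Python's HTML 4 entity table stands in for the editor's default set.

```python
from html.entities import codepoint2name


def encode_named(text: str, allowed: set) -> str:
    """Replace a character with its named entity only if the name is whitelisted."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None and name in allowed:
            out.append(f"&{name};")
        else:
            out.append(ch)  # not whitelisted: the raw character passes through
    return "".join(out)


short_list = {"nbsp", "amp", "lt", "gt"}     # hypothetical restricted whitelist
full_list = set(codepoint2name.values())     # every HTML 4 named entity

print(encode_named("\u2660 \u2260", short_list))
print(encode_named("\u2660 \u2260", full_list))
```

With the restricted list both characters stay as raw UTF-8, while the full list turns them into ASCII-only named entities.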

#7 @chriscct7
6 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

No reports in over 4 years. Closing as worksforme.
