WordPress.org

Make WordPress Core

Opened 2 years ago

Last modified 8 months ago

#19926 reopened defect (bug)

Bad special characters replacement when changing from HTML to Visual

Reported by: bi0xid Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: TinyMCE Keywords:
Focuses: Cc:

Description

Hello there. I have found a problem in some characters when changing from HTML to Visual.

If you use the HTML entity for a down arrow (↓) in HTML view and then switch to Visual, you see the down arrow: the entity is not kept as an entity but is replaced with the literal character. When you hit Publish, the ↓ turns into ?.
The same problem occurs if I use the special characters button in Visual view: the characters turn into ?.

It happens with many non-ASCII special characters. Here is a list of the special characters that work and do not work in UTF-8.

Doesn't work (changes into ?)

′    
≤       
∞    
♣    
♦    
♥   
♠   
↔     
←     
↑     
→     
↓     
″    
≥       
∝     
∂     
≠     
≡    
≊      
≈  

Works (shows correctly):

ƒ     ƒ
•       •
…  …

I have tested it using WordPress 3.4-alpha-19719 & Twenty Eleven & UTF-8

Possibly related: #17487

Attachments (2)

19926.converted-entities.png (4.9 KB) - added by SergeyBiryukov 2 years ago.
19926.0.patch (578 bytes) - added by SergeyBiryukov 2 years ago.


Change History (8)

comment:1 follow-up: SergeyBiryukov 2 years ago

My results are different. Only two characters are displayed incorrectly: ♠ and ≠ (see the screenshot; tested in Firefox 9.0.1 and Opera 11.61). The UTF-8 representations of these characters contain the byte 0xA0, which on its own is the non-breaking space (&nbsp;) in single-byte encodings such as ISO-8859-1. Upon saving, it is replaced with 0x20 (a regular space), creating an invalid UTF-8 sequence, similarly to #11528 and #19033.

Only happens in TinyMCE. Saving in HTML mode works properly. For some reason, 19926.0.patch doesn't seem to fix anything.
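
To make the byte-level claim concrete, here is a minimal Python sketch. It checks which characters from the description contain the byte 0xA0 in their UTF-8 encoding, then simulates the replacement of 0xA0 with 0x20. The helper names are hypothetical, and this only models the effect described above, not the actual code path in WordPress or TinyMCE.

```python
# Characters listed as "doesn't work" in the ticket description.
chars = "′≤∞♣♦♥♠↔←↑→↓″≥∝∂≠≡≊≈"


def contains_a0(ch: str) -> bool:
    """Return True if the UTF-8 encoding of ch contains the byte 0xA0."""
    return 0xA0 in ch.encode("utf-8")


# Only the characters whose encoding includes 0xA0 are at risk.
affected = [ch for ch in chars if contains_a0(ch)]
print(affected)


def simulate_nbsp_replacement(text: str) -> bytes:
    """Hypothetical model of the bug: every 0xA0 byte becomes 0x20."""
    return text.encode("utf-8").replace(b"\xa0", b"\x20")


damaged = simulate_nbsp_replacement("♠")
print(damaged.hex())  # the third byte is now 20 instead of a0

# The lead byte E2 announces a three-byte sequence, but 0x20 is not a
# continuation byte, so the result is no longer valid UTF-8.
try:
    damaged.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)
```

Of the twenty characters in the list, only ♠ (E2 99 A0) and ≠ (E2 89 A0) contain 0xA0, which agrees with the two characters seen broken in the screenshot.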

comment:2 bi0xid 2 years ago

  • Cc raven@… added

comment:3 in reply to: ↑ 1 azaozz 2 years ago

This actually is a very complex problem. It depends on the browser being set to UTF-8 (this is the default for WP but can be overridden by the user), the current font having glyphs for all of the characters, the DB table(s) having the proper utf8 encoding, and even the connection between the server and the DB using the proper charset...

All of these should be UTF-8, which is usually the case.
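
As an illustration of why every link in that chain matters, here is a minimal Python sketch of what happens when one layer handles the text in a single-byte charset. It assumes, and the ticket does not confirm this, that the layer uses Windows-1252 (the charset that MySQL's latin1 corresponds to). The function name is hypothetical.

```python
# Characters from the description, split as the reporter listed them.
works = "ƒ•…"
fails = "′≤∞♣♦♥♠↔←↑→↓″≥∝∂≠≡≊≈"


def through_single_byte_layer(text: str, charset: str = "cp1252") -> str:
    """Round-trip text through a single-byte charset.

    Characters the charset cannot represent are replaced with "?",
    which is what errors="replace" does when encoding.
    """
    return text.encode(charset, errors="replace").decode(charset)


print(through_single_byte_layer(works))  # survives unchanged
print(through_single_byte_layer(fails))  # every character becomes "?"
```

Under this assumption, ƒ, • and … survive because Windows-1252 has code points for them, while every other character in the list turns into ?. That happens to match the split in the description, but it is only one possible explanation.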

Replying to SergeyBiryukov:

My results are different. Only two characters are displayed incorrectly: ♠ and ≠...

In my tests all chars were saved and displayed correctly. Thinking this may have something to do with the initial urlencoding in the browser. Could you have a look at the request headers when saving a post? In FF9 the form data is transmitted as application/x-www-form-urlencoded even though we don't set this explicitly on the form.
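
For reference, this is roughly what the two problem characters look like in an application/x-www-form-urlencoded body. It is a small Python sketch using the standard library; the field name "content" is only an illustrative assumption, and the Latin-1 decode at the end is a hypothetical misconfiguration, not something observed in this ticket.

```python
from urllib.parse import parse_qs, urlencode

# Encode a form field the way a UTF-8 page would submit it.
body = urlencode({"content": "♠ ≠"}, encoding="utf-8")
print(body)

# Decoding with the same charset restores the original text.
decoded = parse_qs(body, encoding="utf-8")["content"][0]
print(decoded)

# Hypothetical mismatch: the receiver decodes the same bytes as Latin-1.
# The trailing 0xA0 byte of each character then becomes a real
# non-breaking space (U+00A0) in the decoded string.
misread = parse_qs(body, encoding="latin-1")["content"][0]
print(repr(misread))
```

If any layer interprets the bytes this way and then normalizes non-breaking spaces to regular spaces, the result would be the A0 to 20 substitution described in comment:1.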

comment:4 c3mdigital 8 months ago

  • Resolution set to invalid
  • Status changed from new to closed

I'm able to display all characters correctly in the latest version of Chrome set to UTF-8. If anyone feels this is still an issue, please reopen.

comment:5 johnbillion 8 months ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

Testing only in the latest version of Chrome is not sufficient to close a ticket.

comment:6 WraithKenny 8 months ago

  • Cc Ken@… added

Might be worth pointing out that the font could impact this (the display of <?> at least). For example, the normal cascade might serve different fonts to users depending on system-installed fonts or loaded webfonts, and different fonts may be missing different glyphs.

@SergeyBiryukov I don't think the patch had any effect for those chars, since (I think; see http://www.tinymce.com/wiki.php/Configuration:entity_encoding ) the "named" setting will only convert what is white-listed in 'entities'. Removing the 'entities' setting (to use the more complete default set; see http://www.tinymce.com/wiki.php/Configuration3x:entities ) might be good.
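
A rough Python model of that behaviour, for illustration only. This is not TinyMCE's implementation: Python's HTML 4 entity table stands in for TinyMCE's default set, and the restricted set below is a made-up example, not the list WordPress actually passes.

```python
from html.entities import codepoint2name


def encode_named(text, allowed=None):
    """Replace characters with named entities.

    If allowed is given, only entity names in that set are used and
    every other character is left as a raw character.
    """
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None and (allowed is None or name in allowed):
            out.append(f"&{name};")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical restricted list: the arrows and symbols are not in it.
restricted = {"nbsp", "amp", "lt", "gt"}

print(encode_named("♠ ≠ ↓", restricted))  # characters stay raw
print(encode_named("♠ ≠ ↓"))              # full table: named entities
```

With the restricted list the characters are sent as raw multi-byte UTF-8, so they remain exposed to any byte-level corruption further down the line. With the full table they travel as plain ASCII entity names.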
