#23688 closed defect (bug) (fixed)
esc_textarea, wp_richedit_pre and wp_htmledit_pre eat post content under PHP 5.4
Reported by: | westi | Owned by: | westi |
---|---|---|---|
Milestone: | 3.6 | Priority: | high |
Severity: | blocker | Version: | 3.6 |
Component: | Formatting | Keywords: | needs-patch |
Focuses: | Cc: |
Description
Because of a change in the default behaviour of htmlspecialchars in PHP 5.4, it is possible for these three functions to eat perfectly valid post content and make it impossible to edit existing posts.
Scenario:
- blog_charset is ISO-8859-1
- Post contains some 8bit characters
You try and edit the post and instead of the post content you are presented with a blank editor :(
On the front end the posts display fine.
The underlying cause is this change in htmlspecialchars:
"5.4.0 The default value for the encoding parameter was changed to UTF-8."
Because the string is not a valid UTF-8 sequence an empty string is returned :(
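The failure can be reproduced outside WordPress. The following is a minimal sketch, not code from core: the sample string is made up, and ENT_COMPAT is passed explicitly because PHP 8.1 and later include ENT_SUBSTITUTE in the default flags, which would hide the problem.

```php
<?php
// Hypothetical post content stored as ISO-8859-1: 0xE9 is "é" in Latin-1,
// but on its own it is not a valid UTF-8 sequence.
$content = "Caf\xE9 <b>menu</b>";

// What PHP 5.4 does when no encoding is passed: the input is treated as UTF-8.
$as_utf8 = htmlspecialchars($content, ENT_COMPAT, 'UTF-8');

// Passing the charset the content is really in keeps the text intact.
$as_latin1 = htmlspecialchars($content, ENT_COMPAT, 'ISO-8859-1');

var_dump($as_utf8 === '');                                   // invalid input yields an empty string
var_dump($as_latin1 === "Caf\xE9 &lt;b&gt;menu&lt;/b&gt;");  // only the markup is escaped
```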
Related to #20368
Attachments (5)
Change History (21)
#3
@
12 years ago
In 1243/tests:
#4
@
12 years ago
- Owner set to westi
- Resolution set to fixed
- Status changed from new to closed
In 23685:
#5
@
12 years ago
- Resolution fixed deleted
- Severity changed from major to blocker
- Status changed from closed to reopened
utf-8, UTF8 and utf8 are all semi-valid, according to core, but may crash and burn here. Another charset like UTF-16 could also presumably generate warnings and such.
#6
follow-up:
↓ 7
@
12 years ago
Looks like we will need to normalize get_option( 'blog_charset' ) before using it here and limit it to the supported values.
Another case that fails is when the user pastes content with a different encoding into the editor (Visual or Text). It can then contain invalid code sequences and htmlspecialchars() will return an empty string. Perhaps we will need to use ENT_SUBSTITUTE or ENT_DISALLOWED in PHP 5.4 to work around this (pending better documentation of how exactly these flags work, see http://php.net/manual/en/function.htmlspecialchars.php).
#7
in reply to:
↑ 6
@
12 years ago
Can I just point out that ENT_DISALLOWED does not appear to work properly in this situation, as it still gives a blank string and thus an empty editing box.
ENT_SUBSTITUTE does work, albeit with odd output, e.g. "You’ll" gets converted to "You�ll" when the text encoding is ISO-8859-1, which is better as an intermediate fix.
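The difference between the two flags can be checked with a short script. This is a sketch under assumptions: the sample byte 0x92 (a right single quote in Windows-1252) is chosen for illustration and stands in for the apostrophe in the example above.

```php
<?php
// A lone 0x92 byte is not valid UTF-8.
$text = "You\x92ll";

// No error-handling flag: the whole string is discarded.
$plain = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

// ENT_DISALLOWED targets code points that are invalid for the document type,
// not malformed byte sequences, so the result is still empty.
$disallowed = htmlspecialchars($text, ENT_COMPAT | ENT_DISALLOWED, 'UTF-8');

// ENT_SUBSTITUTE swaps the bad byte for U+FFFD (bytes EF BF BD in UTF-8).
$substitute = htmlspecialchars($text, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

var_dump($plain === '');
var_dump($disallowed === '');
var_dump($substitute === "You\xEF\xBF\xBDll");
```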
#8
@
12 years ago
Per #24121, the instance in edit-form-advanced.php is also affected:
http://core.trac.wordpress.org/browser/tags/3.5.1/wp-admin/edit-form-advanced.php#L330
#10
follow-up:
↓ 11
@
12 years ago
Thinking we need wp_htmlspecialchars() where blog_charset can be normalized, etc.
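One possible shape for such a wrapper is sketched below. It is not the patch attached to this ticket: the function names sketch_normalize_charset() and sketch_htmlspecialchars(), the list of accepted spellings, and the UTF-8 fallback are all assumptions made for illustration. The charset is passed in as a parameter so the sketch runs without WordPress; in core it would presumably come from get_option( 'blog_charset' ).

```php
<?php
/**
 * Hypothetical helper: map loose charset spellings (utf8, UTF8, utf-8, ...)
 * to a name that htmlspecialchars() accepts.
 */
function sketch_normalize_charset($charset) {
    // Compare case-insensitively and ignore dashes and underscores.
    $key = strtolower(str_replace(array('-', '_'), '', trim($charset)));

    $known = array(
        'utf8'        => 'UTF-8',
        'iso88591'    => 'ISO-8859-1',
        'latin1'      => 'ISO-8859-1',
        'windows1252' => 'Windows-1252',
        'cp1252'      => 'Windows-1252',
    );

    // Assumption: anything unrecognised (e.g. UTF-16) falls back to UTF-8.
    return isset($known[$key]) ? $known[$key] : 'UTF-8';
}

/**
 * Hypothetical wrapper: always pass an explicit, normalized charset.
 */
function sketch_htmlspecialchars($text, $charset) {
    return htmlspecialchars($text, ENT_COMPAT, sketch_normalize_charset($charset));
}

var_dump(sketch_normalize_charset('utf8'));   // string(5) "UTF-8"
var_dump(sketch_normalize_charset('UTF-16')); // string(5) "UTF-8"
```

Limiting the values to a fixed list keeps unsupported names from reaching htmlspecialchars(), at the cost of having to extend the list when another charset needs support.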
#11
in reply to:
↑ 10
@
12 years ago
Replying to azaozz:
Thinking we need wp_htmlspecialchars() where blog_charset can be normalized, etc.
I am wondering whether normalising the blog_charset would maybe be better done within get_option() rather than a specific new function?
Reasoning being there are 4x PHP 5.4 functions where this change to UTF-8 was made:
htmlspecialchars()
htmlentities()
html_entity_decode()
get_html_translation_table() [not sure this is used in WP]
There are already scripts within WP core which use the proposed get_option('blog_charset') fix as documented below, so fixing it in a new function wouldn't actually help those if the charset was incorrect.
Scripts found to already use the fix:
- default-widgets.php -> function widget() -> html_entity_decode()
- feed.php -> get_the_category_rss() -> html_entity_decode()
This doesn't remove the need to update the code to use the get_option('blog_charset') within any of the above function calls, but it would seem to me at least that fixing it once would be easier than fixing it lots of times?
Fixing it in get_option() would also fix any errors in HTML headers, albeit that is outside the remit of this ticket.
#13
@
11 years ago
32688.diff seems reasonable to me. If there are others that need to be fixed in the future, we can just add them there.
#15
@
11 years ago
In 1298/tests:
Pass the blog_charset to htmlspecialchars