#60295 closed defect (bug) (invalid)
esc_html() function returns an empty string when the last character of the input string variable is ASCII 145 or 146
Reported by: | jani20 | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | reporter-feedback |
Focuses: | | Cc: | |
Description
When the last character of a string variable used as an argument by esc_html() is a left or right single quotation mark (ASCII 145 or 146), the esc_html() function returns an empty string.
Change History (5)
#2
@dmsnell
12 months ago
@TobiasBg I think your example is using U+2019 _right single quotation mark_. It's hard to see because PHP is probably using UTF-8 by default, so your string is the byte sequence "test\xe2\x80\x99".
I'm able to reproduce using this.
<?php
'' === esc_html( "test\x91" );
'' === esc_html( "test\x92" );
@jani20, these single quotation marks are not actually ASCII but CP-1252 (Windows-1252), which was the default character encoding in Microsoft products for a long time. I'm guessing that your blog's charset is set to UTF-8, where these bytes form an invalid string.
php > iconv( 'utf-8', 'utf-8', "test\x91" );
PHP Notice: iconv(): Detected an illegal character in input string in php shell code on line 1
Notice: iconv(): Detected an illegal character in input string in php shell code on line 1
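To make the failure mode concrete, here is a minimal sketch in plain PHP, with no WordPress loaded. It assumes that the empty result comes from the input being invalid UTF-8; it uses PHP's own htmlspecialchars() as a stand-in and is not a copy of what esc_html() does internally. The variable names are only for illustration.

```php
<?php
// Byte 0x92 is the CP-1252 right single quotation mark. On its own it is
// a stray continuation byte, so the string is not valid UTF-8.
$cp1252 = "test\x92";

// The same character, U+2019, correctly encoded as UTF-8.
$utf8 = "test\xe2\x80\x99";

// preg_match() with the /u modifier returns false (not 0 or 1) when the
// subject is not valid UTF-8, which makes it a cheap validity check.
$cp1252_is_valid = ( 1 === preg_match( '//u', $cp1252 ) );
$utf8_is_valid   = ( 1 === preg_match( '//u', $utf8 ) );

// Without ENT_SUBSTITUTE or ENT_IGNORE, htmlspecialchars() returns an
// empty string when the input is invalid in the given encoding.
$escaped_cp1252 = htmlspecialchars( $cp1252, ENT_QUOTES, 'UTF-8' );
$escaped_utf8   = htmlspecialchars( $utf8, ENT_QUOTES, 'UTF-8' );

var_dump( $cp1252_is_valid, $utf8_is_valid, $escaped_cp1252, $escaped_utf8 );
```

The flags are passed explicitly because recent PHP versions include ENT_SUBSTITUTE in the default flags, which would replace the bad byte with U+FFFD instead of returning an empty string.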
Things you might want to check:
- the database character encoding.
- your browser might have a character encoding selection in the Edit menu, or elsewhere. UTF-8 is what it likely should be. I've seen "Default" fail for some sites that don't indicate their charset.
- ensure your theme is generating a META element with the right character encoding, or "charset"
These characters may legitimately appear in HTML; when they do, WordPress should be treating them as CP-1252 treats them. It does this right now if they appear through a numeric character reference such as &#x92; (which renders as ’), but not if they come through directly as normal text.
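For content that has already been pasted in with raw CP-1252 bytes, one possible workaround is to re-encode it before it reaches any escaping function. The sketch below is an assumption-laden illustration, not something WordPress provides: example_to_utf8() is a hypothetical helper name, and it assumes that any string that fails UTF-8 validation is entirely Windows-1252. A string that mixes valid UTF-8 with stray CP-1252 bytes would be mangled by this approach.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns valid UTF-8 as-is,
 * and otherwise assumes the bytes are Windows-1252 and converts them.
 */
function example_to_utf8( string $text ): string {
	// Already valid UTF-8: leave it untouched.
	if ( 1 === preg_match( '//u', $text ) ) {
		return $text;
	}

	// Assumption: invalid input is Windows-1252. The @ suppresses the
	// notice iconv() raises for bytes it cannot map.
	$converted = @iconv( 'Windows-1252', 'UTF-8', $text );

	// iconv() returns false on failure; hand the input back unchanged then.
	return false === $converted ? $text : $converted;
}

$left  = example_to_utf8( "test\x91" ); // CP-1252 left single quotation mark
$right = example_to_utf8( "test\x92" ); // CP-1252 right single quotation mark

echo bin2hex( $left ), "\n";
echo bin2hex( $right ), "\n";
```

After conversion the two strings end in the UTF-8 encodings of U+2018 and U+2019, so a UTF-8 validity check no longer rejects them.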
#3
@jani20
12 months ago
Thank you @TobiasBg, this is it. The characters are in post content pasted from some non-WP source (Hubspot probably). The issue was noticed when get_the_excerpt() cut the content so that U+0092 was the last character shown in the excerpt; after esc_html(), the excerpt disappears.
#4
follow-up:
↓ 5
@dmsnell
12 months ago
- Milestone Awaiting Review deleted
- Resolution set to invalid
- Status changed from new to closed
Glad to hear it, @jani20. I'm going to close this ticket as invalid because I think that's the way to say "this isn't a bug with WordPress directly". It sounds like the culprit is telling WordPress that some text is UTF-8 when it's invalid.
#5
in reply to:
↑ 4
@
12 months ago
Thank you @dmsnell for your thorough explanation. Have a great weekend!
Replying to dmsnell:
Glad to hear it, @jani20. I'm going to close this ticket as invalid because I think that's the way to say "this isn't a bug with WordPress directly". It sounds like the culprit is telling WordPress that some text is UTF-8 when it's invalid.
Hi @jani20,
thanks for your report!
However, I can't seem to reproduce this...
When I add the code
echo esc_html( "test’" );
in an admin page, I correctly get
test’
as the output.
Can you maybe provide a more detailed code example that is not working for you?