#31190 closed defect (bug) (duplicate)
esc_html() ate my ampersand
Reported by: | mdgl | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 4.1 |
Component: | Formatting | Keywords: | needs-patch |
Focuses: | Cc: |
Description
While testing #28816 I noticed that esc_html() effectively "eats" an explicit XML/HTML ampersand entity (&amp;amp;) if this is immediately followed by what looks like another valid XML/HTML entity. For example:
Input | Actual Output | Expected Output | Notes |
---|---|---|---|
A & B | A &amp; B | A &amp; B | Lone ampersand "corrected" |
A &amp; B | A &amp; B | A &amp; B | Valid HTML passed through |
A &ndash; B | A &ndash; B | A &ndash; B | Valid HTML passed through |
A &amp;ndash; B | A &ndash; B | A &amp;ndash; B | Wrong as ampersand missing |
A &ndash B | A &amp;ndash B | A &amp;ndash B | Malformed entity handled correctly |
This happens because of the call to wp_specialchars_decode() within _wp_specialchars(). The logic of this is very hard to fathom. If you remove this call, the escaping appears to work correctly, with one exception: some numeric character references are no longer replaced by their named equivalents. That breaks one of the unit tests, even though the replacement itself could be regarded as dubious behaviour.
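The effect of the decode step can be reproduced with a small model. The following Python sketch is not the WordPress code: the function names are hypothetical, it handles only ampersands (not <, > or quotes), and it treats anything shaped like &name; or &#123; as an entity rather than checking names against a list of allowed entities. It is only meant to show why decoding &amp;amp; before re-encoding loses the ampersand.

```python
import re

# An ampersand that does NOT start something shaped like an entity.
# Assumption: "entity" here means &name;, &#123; or &#x1F; (names are not validated).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)"
)


def encode_bare_ampersands(text):
    """Escape every '&' that is not the start of an entity-shaped token."""
    return BARE_AMPERSAND.sub("&amp;", text)


def escape_with_decode(text):
    """Model of the reported behaviour: decode &amp; to & first, then re-encode."""
    return encode_bare_ampersands(text.replace("&amp;", "&"))


def escape_without_decode(text):
    """Model of the expected behaviour: leave existing entities untouched."""
    return encode_bare_ampersands(text)


if __name__ == "__main__":
    for value in ["A & B", "A &amp; B", "A &ndash; B", "A &amp;ndash; B", "A &ndash B"]:
        print(value, "|", escape_with_decode(value), "|", escape_without_decode(value))
```

In this model only the fourth input differs between the two functions: with the decode step, A &amp;ndash; B first becomes A &ndash; B, which then looks like a valid entity and is passed through unchanged.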
Attachments (2)
Change History (10)
#1 @ 10 years ago
- Milestone Awaiting Review deleted
- Resolution set to invalid
- Status changed from new to closed
#2 @ 10 years ago
- Resolution invalid deleted
- Status changed from closed to reopened
&amp;ndash; on input would not be equal to &ndash; on output simply because the ampersand in this instance would (and should) be interpreted as a literal string &amp;. It's basically double-encoded, and esc_html() is not really meant to decode special characters in this way.
Indeed, you are just confirming the bug. At the moment, esc_html() converts &amp;ndash; to just &ndash; as my table shows. Like you, I believe it should remain as &amp;ndash;.
#6 @ 9 years ago
- Milestone Awaiting Review deleted
- Resolution set to duplicate
- Status changed from reopened to closed
Duplicate of #17780.
I would consider this expected behavior for esc_html(). &amp;ndash; on input would not be equal to &ndash; on output simply because the ampersand in this instance would (and should) be interpreted as a literal string &amp;. It's basically double-encoded, and esc_html() is not really meant to decode special characters in this way.