WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#31190 closed defect (bug) (duplicate)

esc_html() ate my ampersand

Reported by: mdgl Owned by:
Milestone: Priority: normal
Severity: normal Version: 4.1
Component: Formatting Keywords: needs-patch
Focuses: Cc:

Description

While testing #28816 I noticed that esc_html() effectively "eats" an explicit XML/HTML ampersand entity if this is immediately followed by what looks like another valid XML/HTML entity. For example:

Input Actual Output Expected Output Notes
A & B A & B A & B Lone ampersand "corrected"
A & B A & B A & B Valid HTML passed through
A – B A – B A – B Valid HTML passed through
A – B A – B A – B Wrong as ampersand missing
A &ndash B A &ndash B A &ndash B Malformed entity handled correctly

This happens because of the call to wp_specialchars_decode() within _wp_specialchars(). The logic of this is very hard to fathom. If you remove this call, the escaping appears to work correctly with the exception that some numeric character references are not replaced by their named equivalents which breaks one of the unit tests, even though this could be regarded as dubious behaviour.

Attachments (2)

31190.tests.diff (1.1 KB) - added by MikeHansenMe 3 years ago.
Unit tests showing problem
31190.diff (882 bytes) - added by collinsinternet 3 years ago.
Adds & to wp_specialchars_decode() to allow unit tests to pass.

Download all attachments as: .zip

Change History (10)

#1 @DrewAPicture
3 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

I would consider this expected behavior for esc_html().

– on input would not be equal to – on output simply because the ampersand in this instance would (and should) be interpreted as a literal string &. It's basically double-encoded, and esc_html() is not really meant to decode special characters in this way.

#2 @mdgl
3 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

– on input would not be equal to – on output simply because the ampersand in this instance would (and should) be interpreted as a literal string &. It's basically double-encoded, and esc_html() is not really meant to decode special characters in this way.

Indeed, you are just confirming the bug. At the moment, esc_html() converts – to just – as my table shows. Like you, I believe it should remain as –.

#3 @SergeyBiryukov
3 years ago

  • Milestone set to Awaiting Review

#4 @DrewAPicture
3 years ago

Sounds like both @boonebgorges and I read that wrong. Sorry about that.

@MikeHansenMe
3 years ago

Unit tests showing problem

#5 @MikeHansenMe
3 years ago

  • Keywords needs-patch added

Added some unit tests to show the problem.

@collinsinternet
3 years ago

Adds & to wp_specialchars_decode() to allow unit tests to pass.

#6 @miqrogroove
3 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from reopened to closed

Duplicate of #17780.

#7 @mdgl
3 years ago

Surely this "bug" can't be a duplicate of a separate "enhancement" request? I think #17780 would need its category and description updating.

#8 @miqrogroove
3 years ago

Hi @mdgl, this has been resolved for 4.3 per the older ticket. We normally get multiple reports that only require one patch, so we only keep one ticket as needed. This is not to downplay one or the other, but to simply declutter the thousands of tickets.

Note: See TracTickets for help on using tickets.