WordPress.org

Make WordPress Core

Opened 9 years ago

Closed 4 years ago

Last modified 4 years ago

#17780 closed defect (bug) (fixed)

Use PHP native double encoding prevention in htmlspecialchars()

Reported by: nbachiyski Owned by: wonderboymusic
Milestone: 4.3 Priority: high
Severity: major Version:
Component: Formatting Keywords: has-patch
Focuses: Cc:
PR Number:

Description

Since PHP 5.2.3 the htmlspecialchars() function has an optional $double_encode parameter, which we could use. This can save us a few expensive kses/html decoding calls.

We need to make sure it works the same way as our implementation.

Attachments (9)

specialchars-tests.diff (2.3 KB) - added by miqrogroove 5 years ago.
miqro-17780.patch (3.4 KB) - added by miqrogroove 5 years ago.
miqro-17780.2.patch (3.7 KB) - added by miqrogroove 5 years ago.
Adds a test exception for apos (')
specialchars-tests.2.diff (2.5 KB) - added by miqrogroove 5 years ago.
Test coverage for unexpected decoding of entities.
miqro-17780.3.patch (3.9 KB) - added by miqrogroove 5 years ago.
Added test results in patched version.
miqro-17780.4.patch (5.5 KB) - added by miqrogroove 4 years ago.
Fixes tests affecting other functions.
miqro-17780-compat.patch (1.8 KB) - added by miqrogroove 4 years ago.
Compatibility for PHP < 5.4.
miqro-17780-compat.2.patch (1.9 KB) - added by miqrogroove 4 years ago.
Expanded comments.
miqro-17780.5.patch (6.4 KB) - added by miqrogroove 4 years ago.
Fixes post editor bug

Download all attachments as: .zip

Change History (55)

#1 @chriscct7
5 years ago

  • Keywords dev-feedback added

#2 @DrewAPicture
5 years ago

  • Component changed from General to Formatting

#3 @miqrogroove
5 years ago

  • Keywords needs-unit-tests added
  • Priority changed from low to normal
  • Severity changed from minor to normal
  • Version set to 3.2.1

This may be more than a 'minor' enhancement if it works with minimal fuss.

#4 @miqrogroove
5 years ago

Added some basic unit tests. Take note of some quirks in the existing non-double-encoding mode: KSES normalizes the number of digits in numeric references and is currently not HTML5 compatible.

#5 @miqrogroove
5 years ago

  • Keywords needs-patch dev-feedback removed
  • Milestone changed from Future Release to 4.3

After patching, there were only two differences in test results:

  • Number of digits is not normalized in numeric references.
  • &apos; fails the double-encoding test in PHP for some reason.

IMO, neither of these is a deal breaker and we should consider making the change.

@miqrogroove
5 years ago

Adds a test exception for apos (&apos;)

#6 @miqrogroove
5 years ago

  • Keywords has-patch added; needs-unit-tests removed
  • Owner set to miqrogroove
  • Status changed from new to accepted

#7 @miqrogroove
5 years ago

Should add more tests to determine behaviors when function input is already double-encoded.

#8 @miqrogroove
5 years ago

See also #8767, #12284, and #32556. I'm not sure there was ever a valid reason for calling specialchars_decode but it is worth reviewing.

@miqrogroove
5 years ago

Test coverage for unexpected decoding of entities.

@miqrogroove
5 years ago

Added test results in patched version.

#9 @miqrogroove
5 years ago

After patching, there are three differences in test results:

  • Number of digits is not normalized in numeric references.
  • &apos; fails the double-encoding test in PHP for some reason.
  • Entities that were double-encoded at input will not get decoded for output.

IMO, these all remain acceptable differences. The original call to specialchars_decode appears to be a mistake in [10297] and I am looking for any additional information about why that was committed in the first place.

#10 follow-up: @miqrogroove
5 years ago

I see where the decoding bug was introduced now. Here is an explanation:

  • In [10297] the strategy to prevent double encoding was to decode specialchars before the call to encode specialchars. This was a harmless, but ultimately futile algorithm because it wouldn't do anything.
  • In [10298] a placeholder strategy was added to accomplish actual avoidance of double encoding. It appears the author failed to remove the decode command from the patch, resulting in unnecessary decoding prior to the placeholder insertion.
  • In #12284 although the bug was mentioned there, I didn't dig this far to find out what was the original problem.

I'm open to other opinions, but it looks like the reference decoding by this function is entirely unintentional.

#11 in reply to: ↑ 10 ; follow-ups: @azaozz
5 years ago

Replying to miqrogroove:

Unfortunately I don't remember anything about these patches, it's been more than 6 years :)

Seems like at the time (and maybe still) _wp_specialchars() is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.

#12 @mdgl
5 years ago

Related #31190?

#13 @miqrogroove
5 years ago

#31190 was marked as a duplicate.

#14 in reply to: ↑ 11 @miqrogroove
5 years ago

Replying to azaozz:

Seems like at the time (and maybe still) _wp_specialchars() is getting double encoded strings and is expected to return single encoded.

That would be a significant bug in itself and I am very skeptical of it.

This ticket was mentioned in Slack in #core by miqrogroove. View the logs.


4 years ago

#16 @obenland
4 years ago

@wonderboymusic, could you review and commit/punt when you get a chance?

@miqrogroove
4 years ago

Fixes tests affecting other functions.

#17 @wonderboymusic
4 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 32850:

Since PHP 5.2.3, the htmlspecialchars() function has an optional $double_encode parameter, which we can now use. This will save us a few expensive kses/html decoding calls.

Adds unit tests.

Props miqrogroove.
Fixes #17780.

#18 follow-up: @miqrogroove
4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Test results from @netweb showed that prior to PHP 5.4, the $double_encode parameter does not perform entity validation.

#19 in reply to: ↑ 18 @netweb
4 years ago

Replying to miqrogroove:

Test results from @netweb showed that prior to PHP 5.4, the $double_encode parameter does not perform entity validation.

Here's the Travis-CI build: https://travis-ci.org/aaronjorbin/develop.wordpress/builds/67442185

#20 follow-up: @johnbillion
4 years ago

  • Keywords needs-testing added; has-patch removed
  • Version 3.2.1 deleted

What's the deal with &apos;?

@miqrogroove
4 years ago

Compatibility for PHP < 5.4.

#22 @netweb
4 years ago

  • Keywords has-patch commit added; needs-testing removed

Test results of miqro-17780-compat.patch​:

All tests passed :)

@miqrogroove
4 years ago

Expanded comments.

#24 @miqrogroove
4 years ago

Just a note about the &apos; encoding. I tested ENT_XHTML flag which is not well documented, and found that it gives different output when ENT_QUOTES is also set. It converts each ' into &apos; which is not necessarily more desirable than converting &apos; into &amp;apos;. But it is something we can use later if needed.

#25 @azaozz
4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 32851:

Fix using htmlspecialchars() whit the $double_encode parameter. PHP < 5.4 doesn't validate the entities.
Props miqrogroove. Fixes #17780.

This ticket was mentioned in Slack in #core-editor by azaozz. View the logs.


4 years ago

#27 in reply to: ↑ 11 ; follow-ups: @azaozz
4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Seems like at the time (and maybe still) _wp_specialchars() is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.

This broke post titles when there is a & (ampersand) in them. To reproduce:

  • Create new post with an ampersand in the title.
  • Save draft several times.

#28 in reply to: ↑ 27 ; follow-up: @chriscct7
4 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch commit removed
  • Type changed from enhancement to defect (bug)

Replying to azaozz:

Seems like at the time (and maybe still) _wp_specialchars() is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.

This broke post titles when there is a & (ampersand) in them. To reproduce:

  • Create new post with an ampersand in the title.
  • Save draft several times.

This would seem to indicate we're missing a unit test for creating a post with many special characters in the title (one for & in particular), and then pulling it out and seeing if it matches.

This ticket was mentioned in Slack in #core by ocean90. View the logs.


4 years ago

#30 in reply to: ↑ 27 @JanR
4 years ago

Replying to azaozz:

Seems like at the time (and maybe still) _wp_specialchars() is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.

This broke post titles when there is a & (ampersand) in them. To reproduce:

  • Create new post with an ampersand in the title.
  • Save draft several times.

For what it's worth: WordPress 4.2.2 doesn't double encode the subject, but 4.3 beta does.
It not only affects the ampersand &, it affects ", <, > too.

#31 @ocean90
4 years ago

  • Priority changed from normal to high
  • Severity changed from normal to major

#32 @miqrogroove
4 years ago

Guys, please post your test results. I am on the road so getting vague messages about ampersands is frustrating.

I did a quick test of my own using an admin account and found at least two bugs:

  1. Post permalinks are not HTML escaped outside of the tag, at least in my theme. Attributes and link elements do not seem affected.
  2. The post editor title box is displaying a double-encoded post title.

So after typing a test post title, I found places where my input was returned not encoded, once encoded, and also twice encoded. Further testing is needed.

This ticket was mentioned in Slack in #docs by sam. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by sam. View the logs.


4 years ago

#35 @wonderboymusic
4 years ago

In 33148:

Revert [32851] and [32850] for now, tl;dr encoding issues.

See #17780.

@miqrogroove
4 years ago

Fixes post editor bug

#36 @miqrogroove
4 years ago

  • Keywords needs-patch removed

New patch includes bugfix for editing post titles.

#37 @miqrogroove
4 years ago

  • Keywords has-patch added

Tested using post title input Title encoding & test < blah > my &copy; your &amp; Checked admin and non-admin accounts. Found patched trunk behavior identical to 4.2.2.

#38 @samuelsidler
4 years ago

I think we should punt this from 4.3 and get it in early in the 4.4 cycle. No 4.3 features / tickets are dependent on this ticket and we clearly didn't have enough test coverage when it was committed the first time. We should add more comprehensive tests and commit it at the very start of the 4.4 cycle to ensure there are no other, similar issues to comment:27.

#39 @miqrogroove
4 years ago

I'd like to see the patch in beta 3. If you really think this is going to cause some damage then the right thing to do here may be to deprecate esc_attr() and related functions and replace them all with corrected functions. And that wouldn't be ready for one or two versions for sure.

Last edited 4 years ago by miqrogroove (previous) (diff)

#40 @azaozz
4 years ago

Looking back at the changes to the post_title field: seems this is a long existing bug that was masked by the "unusual" behaviour of _wp_specialchars(). The post_title is escaped first with sanitize_post_field() then with htmlspecialchars() and finally with esc_attr().

As far as I see this is the only place where esc_attr() and htmlspecialchars() are nested. Not sure if fixing this at the beginning of a cycle will be any different than fixing it now. In both cases plugins that have copied that particular code from core (and don't follow WordPress development) will break. Chances are that most plugin authors will test their plugins in RC or perhaps when they receive the "What's new in 4.3" email :)

#41 in reply to: ↑ 28 @miqrogroove
4 years ago

Replying to chriscct7:

This would seem to indicate we're missing a unit test for creating a post with many special characters in the title (one for & in particular), and then pulling it out and seeing if it matches.

I would like to humbly suggest that this is beyond the scope of this ticket. New ticket #32990.

#42 @miqrogroove
4 years ago

  • Keywords needs-unit-tests removed

#43 @wonderboymusic
4 years ago

  • Owner changed from miqrogroove to wonderboymusic
  • Status changed from reopened to assigned

Gonna review this

#44 @wonderboymusic
4 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 33271:

After [33148]:
Don't nest esc_attr() and htmlspecialchars() when escaping the post title on the edit post screen.

Unrevert parts of [32851] and [32850].

Adds/alters unit tests.

Props miqrogroove.
Fixes #17780.

This ticket was mentioned in Slack in #core by dd32. View the logs.


4 years ago

This ticket was mentioned in Slack in #core-editor by sam. View the logs.


4 years ago

Note: See TracTickets for help on using tickets.