#17780 closed defect (bug) (fixed)
Use PHP native double encoding prevention in htmlspecialchars()
Reported by: | nbachiyski | Owned by: | wonderboymusic |
---|---|---|---|
Milestone: | 4.3 | Priority: | high |
Severity: | major | Version: | |
Component: | Formatting | Keywords: | has-patch |
Focuses: | Cc: |
Description
Since PHP 5.2.3 the htmlspecialchars()
function has an optional $double_encode
parameter, which we could use. This can save us a few expensive kses/html decoding calls.
We need to make sure it works the same way as our implementation.
Attachments (9)
Change History (55)
#3
@
9 years ago
- Keywords needs-unit-tests added
- Priority changed from low to normal
- Severity changed from minor to normal
- Version set to 3.2.1
#4
@
9 years ago
Added some basic unit tests. Take note of some quirks in the existing non-double-encoding mode: KSES normalizes the number of digits in numeric references and is currently not HTML5 compatible.
#5
@
9 years ago
- Keywords needs-patch dev-feedback removed
- Milestone changed from Future Release to 4.3
After patching, there were only two differences in test results:
- Number of digits is not normalized in numeric references.
'
fails the double-encoding test in PHP for some reason.
IMO, neither of these is a deal breaker and we should consider making the change.
#6
@
9 years ago
- Keywords has-patch added; needs-unit-tests removed
- Owner set to miqrogroove
- Status changed from new to accepted
#7
@
9 years ago
Should add more tests to determine behaviors when function input is already double-encoded.
#9
@
9 years ago
After patching, there are three differences in test results:
- Number of digits is not normalized in numeric references.
'
fails the double-encoding test in PHP for some reason.- Entities that were double-encoded at input will not get decoded for output.
IMO, these all remain acceptable differences. The original call to specialchars_decode appears to be a mistake in [10297] and I am looking for any additional information about why that was committed in the first place.
#10
follow-up:
↓ 11
@
9 years ago
I see where the decoding bug was introduced now. Here is an explanation:
- In [10297] the strategy to prevent double encoding was to decode specialchars before the call to encode specialchars. This was a harmless, but ultimately futile algorithm because it wouldn't do anything.
- In [10298] a placeholder strategy was added to accomplish actual avoidance of double encoding. It appears the author failed to remove the decode command from the patch, resulting in unnecessary decoding prior to the placeholder insertion.
- In #12284 although the bug was mentioned there, I didn't dig this far to find out what was the original problem.
I'm open to other opinions, but it looks like the reference decoding by this function is entirely unintentional.
#11
in reply to:
↑ 10
;
follow-ups:
↓ 14
↓ 27
@
9 years ago
Replying to miqrogroove:
Unfortunately I don't remember anything about these patches, it's been more than 6 years :)
Seems like at the time (and maybe still) _wp_specialchars()
is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.
#14
in reply to:
↑ 11
@
9 years ago
Replying to azaozz:
Seems like at the time (and maybe still)
_wp_specialchars()
is getting double encoded strings and is expected to return single encoded.
That would be a significant bug in itself and I am very skeptical of it.
This ticket was mentioned in Slack in #core by miqrogroove. View the logs.
9 years ago
#18
follow-up:
↓ 19
@
9 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Test results from @netweb showed that prior to PHP 5.4, the $double_encode parameter does not perform entity validation.
#19
in reply to:
↑ 18
@
9 years ago
Replying to miqrogroove:
Test results from @netweb showed that prior to PHP 5.4, the $double_encode parameter does not perform entity validation.
Here's the Travis-CI build: https://travis-ci.org/aaronjorbin/develop.wordpress/builds/67442185
#20
follow-up:
↓ 21
@
9 years ago
- Keywords needs-testing added; has-patch removed
- Version 3.2.1 deleted
What's the deal with '
?
#21
in reply to:
↑ 20
@
9 years ago
Replying to johnbillion:
Officially it is defined in XML and XHTML, but not in HTML: https://en.wikipedia.org/?title=List_of_XML_and_HTML_character_entity_references#Entities_representing_special_characters_in_XHTML
#22
@
9 years ago
- Keywords has-patch commit added; needs-testing removed
Test results of miqro-17780-compat.patch:
- Travis-CI Build -> https://travis-ci.org/ntwb/wordpress/builds/67456349
- PHP 5.2 Job -> https://travis-ci.org/ntwb/wordpress/jobs/67456351
- PHP 5.3 Job -> https://travis-ci.org/ntwb/wordpress/jobs/67456352
All tests passed :)
#23
@
9 years ago
Tests pass via latest patch -> https://travis-ci.org/ntwb/wordpress/builds/67458355
#24
@
9 years ago
Just a note about the '
encoding. I tested ENT_XHTML
flag which is not well documented, and found that it gives different output when ENT_QUOTES
is also set. It converts each '
into '
which is not necessarily more desirable than converting '
into '
. But it is something we can use later if needed.
This ticket was mentioned in Slack in #core-editor by azaozz. View the logs.
9 years ago
#27
in reply to:
↑ 11
;
follow-ups:
↓ 28
↓ 30
@
9 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Seems like at the time (and maybe still)
_wp_specialchars()
is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.
This broke post titles when there is a &
(ampersand) in them. To reproduce:
- Create new post with an ampersand in the title.
- Save draft several times.
#28
in reply to:
↑ 27
;
follow-up:
↓ 41
@
9 years ago
- Keywords needs-patch needs-unit-tests added; has-patch commit removed
- Type changed from enhancement to defect (bug)
Replying to azaozz:
Seems like at the time (and maybe still)
_wp_specialchars()
is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.
This broke post titles when there is a
&
(ampersand) in them. To reproduce:
- Create new post with an ampersand in the title.
- Save draft several times.
This would seem to indicate we're missing a unit test for creating a post with many special characters in the title (one for & in particular), and then pulling it out and seeing if it matches.
This ticket was mentioned in Slack in #core by ocean90. View the logs.
9 years ago
#30
in reply to:
↑ 27
@
9 years ago
Replying to azaozz:
Seems like at the time (and maybe still)
_wp_specialchars()
is getting double encoded strings and is expected to return single encoded. It is "undocumented behaviour" but not sure if we can remove it.
This broke post titles when there is a
&
(ampersand) in them. To reproduce:
- Create new post with an ampersand in the title.
- Save draft several times.
For what it's worth: WordPress 4.2.2 doesn't double encode the subject, but 4.3 beta does.
It not only affects the ampersand &
, it affects "
, <
, >
too.
#32
@
9 years ago
Guys, please post your test results. I am on the road so getting vague messages about ampersands is frustrating.
I did a quick test of my own using an admin account and found at least two bugs:
- Post permalinks are not HTML escaped outside of the tag, at least in my theme. Attributes and link elements do not seem affected.
- The post editor title box is displaying a double-encoded post title.
So after typing a test post title, I found places where my input was returned not encoded, once encoded, and also twice encoded. Further testing is needed.
This ticket was mentioned in Slack in #docs by sam. View the logs.
9 years ago
This ticket was mentioned in Slack in #core by sam. View the logs.
9 years ago
#37
@
9 years ago
- Keywords has-patch added
Tested using post title input Title encoding & test < blah > my © your &
Checked admin and non-admin accounts. Found patched trunk behavior identical to 4.2.2.
#38
@
9 years ago
I think we should punt this from 4.3 and get it in early in the 4.4 cycle. No 4.3 features / tickets are dependent on this ticket and we clearly didn't have enough test coverage when it was committed the first time. We should add more comprehensive tests and commit it at the very start of the 4.4 cycle to ensure there are no other, similar issues to comment:27.
#39
@
9 years ago
I'd like to see the patch in beta 3. If you really think this is going to cause some damage then the right thing to do here may be to deprecate esc_attr() and related functions and replace them all with corrected functions. And that wouldn't be ready for one or two versions for sure.
#40
@
9 years ago
Looking back at the changes to the post_title field: seems this is a long existing bug that was masked by the "unusual" behaviour of _wp_specialchars()
. The post_title is escaped first with sanitize_post_field()
then with htmlspecialchars()
and finally with esc_attr()
.
As far as I see this is the only place where esc_attr()
and htmlspecialchars()
are nested. Not sure if fixing this at the beginning of a cycle will be any different than fixing it now. In both cases plugins that have copied that particular code from core (and don't follow WordPress development) will break. Chances are that most plugin authors will test their plugins in RC or perhaps when they receive the "What's new in 4.3" email :)
#41
in reply to:
↑ 28
@
9 years ago
Replying to chriscct7:
This would seem to indicate we're missing a unit test for creating a post with many special characters in the title (one for & in particular), and then pulling it out and seeing if it matches.
I would like to humbly suggest that this is beyond the scope of this ticket. New ticket #32990.
#43
@
9 years ago
- Owner changed from miqrogroove to wonderboymusic
- Status changed from reopened to assigned
Gonna review this
This may be more than a 'minor' enhancement if it works with minimal fuss.