WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 5 years ago

#12284 closed defect (bug) (fixed)

I/O Sanity Failures With Invalid HTML Entity References

Reported by: miqrogroove Owned by: ryan
Milestone: 3.0 Priority: highest omg bbq
Severity: blocker Version:
Component: Security Keywords: has-patch
Focuses: Cc:

Description

Background

While testing moderation and sanitize functions for blog comments in #11833 and related tickets, I discovered this inline comment:

# Change back the allowed entities in our entity whitelist

There is actually no whitelist in the existing kses function. After discussing this on the security mailing list with slow progress, permission was given by IRC to make this public on Trac for speedy attention and resolution.

Vulnerability

Anonymous users can break comment feed validation by injecting invalid character entity references.

Authors can break front page and primary feed validation by injecting invalid character entity references.

These are self-mitigating risks in and of themselves. However...

While trying to patch this bug, I also discovered that the html_esc function in WordPress decodes phrases in the form of &phrase; That bug may have further security implications, and was resolved by calling the patched kses function from inside the html_esc function.

Attachments (10)

miqro-html-injections-for-wordpress.patch (5.1 KB) - added by miqrogroove 5 years ago.
This file fully patches kses, and resolves one known issue in html_esc()
miqro-html-injections-for-wordpress.2.patch (5.5 KB) - added by miqrogroove 5 years ago.
kses style consistency, props hakre
miqro-html-esc-testing.txt (4.7 KB) - added by miqrogroove 5 years ago.
esc_html() Test results observed before and after patching
12284.patch (6.8 KB) - added by hakre 5 years ago.
Fix (IF); Opt (Speed) and Imp (Coding Standards)
12284-extra.patch (1.5 KB) - added by miqrogroove 5 years ago.
Only zero-strip the valid entities, and correct inconsistent logic.
entity-tests.html (1.2 KB) - added by hakre 5 years ago.
A simple Entity Testcase realted to tests we did today
12284-extra.2.2.patch (2.1 KB) - added by miqrogroove 5 years ago.
Adds XHTML output filtering. Adds 3-digit padding for decimal entity references.
12284-extra.2.patch (2.1 KB) - added by miqrogroove 5 years ago.
Adds XHTML output filtering. Adds 3-digit padding for decimal entity references.
12284-extra.3.patch (1.5 KB) - added by miqrogroove 5 years ago.
Adds 3-digit padding for decimal entity references.
12284.2.patch (424 bytes) - added by hakre 5 years ago.
Patch preserves a change I had here in my testing bed. If I remember correctly that value is in the defition but not in the code. A calling function does actually have it.

Download all attachments as: .zip

Change History (34)

@miqrogroove5 years ago

This file fully patches kses, and resolves one known issue in html_esc()

@miqrogroove5 years ago

kses style consistency, props hakre

@miqrogroove5 years ago

esc_html() Test results observed before and after patching

comment:1 @hakre5 years ago

some additional feedback from my side: save an import into the local variable table by using the superglobal return ( ( ! in_array($i, $GLOBALS['allowedentitynames']) ) ? "&$i;" : "&$i;" ); and remove the global command on the first line in wp_kses_named_entities.

What do you think?

comment:2 @hakre5 years ago

return ( in_array( $i, $GLOBALS['allowedentitynames'] ) ? "&{$i};" : "&{$i};" ); - spare of the ! operator, increased readability.

comment:3 @miqrogroove5 years ago

save an import into the local variable table

Trivial / less readable.

spare of the ! operator

Trivial / wouldn't match style of existing callbacks.

I'll leave it up to the committer.

@hakre5 years ago

Fix (IF); Opt (Speed) and Imp (Coding Standards)

comment:4 @hakre5 years ago

I fixed one if clause (please double check where the global array belongs to, it does not have that many lines any longer, I compacted it which might help to better read / write).

And then some measurements about the improvements, the second run is in my patch:

10 runs, 10.000 iterations per run.

miqro-html-injections-for-wordpress.2.patch implementation:

Run #0: 0.721942186356 seconds
Run #1: 0.715867996216 seconds
Run #2: 0.713424921036 seconds
Run #3: 0.720436096191 seconds
Run #4: 0.718170881271 seconds
Run #5: 0.711031198502 seconds
Run #6: 0.719617843628 seconds
Run #7: 0.718825101852 seconds
Run #8: 0.714194059372 seconds
Run #9: 0.711913108826 seconds

optimised:

Run #0: 0.478354930878 seconds
Run #1: 0.485837936401 seconds
Run #2: 0.479050159454 seconds
Run #3: 0.485183000565 seconds
Run #4: 0.478661060333 seconds
Run #5: 0.478514194489 seconds
Run #6: 0.477350234985 seconds
Run #7: 0.484051942825 seconds
Run #8: 0.479024887085 seconds
Run #9: 0.484555006027 seconds

comment:5 @miqrogroove5 years ago

10 runs, 10.000 iterations per run.

i18n clarification: "Ten thousand iterations per run."

comment:6 @hakre5 years ago

true, should have been written 10,000 iterations per run.

comment:7 @ryan5 years ago

I haven't tested yet, but miqro-html-injections-for-wordpress.2.patch looks good to me.

comment:8 @automattor5 years ago

(In [13358]) Whitelist entities. Props miqrogroove. see #12284

comment:9 @nacin5 years ago

(In [13363]) Coil the kses entities whitelist. See #12284

comment:10 @hakre5 years ago

Trunk with patch applied (r13358) does break the following wordpress specialchars/esc_html testcase: Ignore Existing Entities.

2) Test_WP_Specialchars::test_ignores_existing_entities
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-& £  &
+& £  &

Input data used: '& £  &'

I modified the patch a bit to reflect the changes and that's whats left for the delta. Looks like  is missed here somehow.

comment:11 @miqrogroove5 years ago

hakre, &#22 is not a valid entity reference. If WordPress needs to support that for some (invalid) reason, then we will need to modify the regexp for numeric entity references.

comment:12 @miqrogroove5 years ago

Seen in wp_special_chars_decode:

$double_preg = array( '/&#0*34;/'  => '"', '/&#x0*22;/i' => '"' );

hakre's example is probably an invalid test for that substitution.

comment:13 @miqrogroove5 years ago

Okay, so aside from  being invalid, if we were to choose a valid one such as " we would find that the kses regexp is stripping zeroes. We could change that if necessary.

comment:14 @miqrogroove5 years ago

Added a patch to correct numeric entity logic in kses. Tested min and max values for decimal and hex notation, with and without leading zeroes. Leading zeroes are preserved. Pass/fail for numeric entities matches behavior of validator.w3.org, thanks mostly to function valid_unicode() already in kses.

@miqrogroove5 years ago

Only zero-strip the valid entities, and correct inconsistent logic.

comment:15 @miqrogroove5 years ago

The only oddball we're left with is the " type of replacement in wp_special_chars_decode(), which runs before normalize_entities. /facepalm Worst case scenario, we can hit the decimal version with

str_pad(ltrim($i,'0'), 3, '0', STR_PAD_LEFT);

comment:16 @miqrogroove5 years ago

" is invalid in XHTML due to case sensitivity. Is that part a lost cause?

comment:17 @miqrogroove5 years ago

Previous point was moot. kses already normalizes 'X' to 'x'.

@hakre5 years ago

A simple Entity Testcase realted to tests we did today

comment:18 @hakre5 years ago

Some notes of today:

HTML 2.0: RFC1866; 13. The HTML Coded Character Set <http://www.ietf.org/rfc/rfc1866.txt>
HTML 4.0: HTML 4.01 Specification; 24; Character entity references in HTML 4 <http://www.w3.org/TR/REC-html40/sgml/entities.html>

Unused by HTML 2.0:
0-8, 11-12, 14-31, 127-159

Unused by HTML 4.0:
65534-65535 [0xFFFE-0xFFFF]

Some Numbers and their length:
65534 (5) [highest valid double-byte codepoint]
1114111 (7) [17 Unicode Planes (in use today)] 
2147483648 (10) [ISO 10646 UCS-4 31-bit encoding]
4294967296 (10) [64bit Unsigned]

Some more research is summed up in this report about zero-padding numeric entities.

comment:19 @miqrogroove5 years ago

I think we missed something. The comment RSS is getting sanitized by esc_html(). The comment XHTML is not.

@miqrogroove5 years ago

Adds XHTML output filtering. Adds 3-digit padding for decimal entity references.

@miqrogroove5 years ago

Adds XHTML output filtering. Adds 3-digit padding for decimal entity references.

comment:20 @miqrogroove5 years ago

That's better :)

comment:21 @miqrogroove5 years ago

Okay disregard the XHTML part. I forgot to logout while testing.

@miqrogroove5 years ago

Adds 3-digit padding for decimal entity references.

@hakre5 years ago

Patch preserves a change I had here in my testing bed. If I remember correctly that value is in the defition but not in the code. A calling function does actually have it.

comment:22 @hakre5 years ago

A note related to the ternary operators that can be found in the callback functions as well:

Ternary operators are fine, but always have them test if the statement is true, not false. Otherwise it just gets confusing.

Found in the coding standards.

comment:23 @ryan5 years ago

(In [13648]) Fix numeric entity logic in kses. Props miqrogroove. see #12284

comment:24 @miqrogroove5 years ago

  • Resolution set to fixed
  • Status changed from new to closed

Fixed by [13358] and [13648]

Note: See TracTickets for help on using tickets.