Opened 15 years ago
Closed 15 years ago
#12284 closed defect (bug) (fixed)
I/O Sanity Failures With Invalid HTML Entity References
Reported by: | miqrogroove | Owned by: | ryan |
---|---|---|---|
Milestone: | 3.0 | Priority: | highest omg bbq |
Severity: | blocker | Version: | |
Component: | Security | Keywords: | has-patch |
Focuses: | Cc: |
Description
Background
While testing moderation and sanitize functions for blog comments in #11833 and related tickets, I discovered this inline comment:
# Change back the allowed entities in our entity whitelist
There is actually no whitelist in the existing kses function. After discussing this on the security mailing list with slow progress, permission was given by IRC to make this public on Trac for speedy attention and resolution.
Vulnerability
Anonymous users can break comment feed validation by injecting invalid character entity references.
Authors can break front page and primary feed validation by injecting invalid character entity references.
These are self-mitigating risks in and of themselves. However...
While trying to patch this bug, I also discovered that the html_esc function in WordPress decodes phrases in the form of &phrase;
That bug may have further security implications, and was resolved by calling the patched kses function from inside the html_esc function.
Attachments (10)
Change History (34)
#1
@
15 years ago
some additional feedback from my side: save an import into the local variable table by using the superglobal return ( ( ! in_array($i, $GLOBALS['allowedentitynames']) ) ? "&$i;" : "&$i;" );
and remove the global command on the first line in wp_kses_named_entities.
What do you think?
#2
@
15 years ago
return ( in_array( $i, $GLOBALS['allowedentitynames'] ) ? "&{$i};" : "&{$i};" );
- spare of the ! operator, increased readability.
#3
@
15 years ago
save an import into the local variable table
Trivial / less readable.
spare of the ! operator
Trivial / wouldn't match style of existing callbacks.
I'll leave it up to the committer.
#4
@
15 years ago
I fixed one if clause (please double check where the global array belongs to, it does not have that many lines any longer, I compacted it which might help to better read / write).
And then some measurements about the improvements, the second run is in my patch:
10 runs, 10.000 iterations per run. miqro-html-injections-for-wordpress.2.patch implementation: Run #0: 0.721942186356 seconds Run #1: 0.715867996216 seconds Run #2: 0.713424921036 seconds Run #3: 0.720436096191 seconds Run #4: 0.718170881271 seconds Run #5: 0.711031198502 seconds Run #6: 0.719617843628 seconds Run #7: 0.718825101852 seconds Run #8: 0.714194059372 seconds Run #9: 0.711913108826 seconds optimised: Run #0: 0.478354930878 seconds Run #1: 0.485837936401 seconds Run #2: 0.479050159454 seconds Run #3: 0.485183000565 seconds Run #4: 0.478661060333 seconds Run #5: 0.478514194489 seconds Run #6: 0.477350234985 seconds Run #7: 0.484051942825 seconds Run #8: 0.479024887085 seconds Run #9: 0.484555006027 seconds
#5
@
15 years ago
10 runs, 10.000 iterations per run.
i18n clarification: "Ten thousand iterations per run."
#7
@
15 years ago
I haven't tested yet, but miqro-html-injections-for-wordpress.2.patch looks good to me.
#10
@
15 years ago
Trunk with patch applied (r13358) does break the following wordpress specialchars/esc_html testcase: Ignore Existing Entities.
2) Test_WP_Specialchars::test_ignores_existing_entities Failed asserting that two strings are equal. --- Expected +++ Actual @@ @@ -& £  & +& £  &
Input data used: '& £  &'
I modified the patch a bit to reflect the changes and that's whats left for the delta. Looks like 
is missed here somehow.
#11
@
15 years ago
hakre,  is not a valid entity reference. If WordPress needs to support that for some (invalid) reason, then we will need to modify the regexp for numeric entity references.
#12
@
15 years ago
Seen in wp_special_chars_decode:
$double_preg = array( '/�*34;/' => '"', '/�*22;/i' => '"' );
hakre's example is probably an invalid test for that substitution.
#13
@
15 years ago
Okay, so aside from  being invalid, if we were to choose a valid one such as " we would find that the kses regexp is stripping zeroes. We could change that if necessary.
#14
@
15 years ago
Added a patch to correct numeric entity logic in kses. Tested min and max values for decimal and hex notation, with and without leading zeroes. Leading zeroes are preserved. Pass/fail for numeric entities matches behavior of validator.w3.org, thanks mostly to function valid_unicode() already in kses.
#15
@
15 years ago
The only oddball we're left with is the " type of replacement in wp_special_chars_decode(), which runs before normalize_entities. /facepalm Worst case scenario, we can hit the decimal version with
str_pad(ltrim($i,'0'), 3, '0', STR_PAD_LEFT);
#18
@
15 years ago
Some notes of today:
HTML 2.0: RFC1866; 13. The HTML Coded Character Set <http://www.ietf.org/rfc/rfc1866.txt> HTML 4.0: HTML 4.01 Specification; 24; Character entity references in HTML 4 <http://www.w3.org/TR/REC-html40/sgml/entities.html> Unused by HTML 2.0: 0-8, 11-12, 14-31, 127-159 Unused by HTML 4.0: 65534-65535 [0xFFFE-0xFFFF] Some Numbers and their length: 65534 (5) [highest valid double-byte codepoint] 1114111 (7) [17 Unicode Planes (in use today)] 2147483648 (10) [ISO 10646 UCS-4 31-bit encoding] 4294967296 (10) [64bit Unsigned]
Some more research is summed up in this report about zero-padding numeric entities.
#19
@
15 years ago
I think we missed something. The comment RSS is getting sanitized by esc_html(). The comment XHTML is not.
@
15 years ago
Patch preserves a change I had here in my testing bed. If I remember correctly that value is in the defition but not in the code. A calling function does actually have it.
#22
@
15 years ago
A note related to the ternary operators that can be found in the callback functions as well:
Ternary operators are fine, but always have them test if the statement is true, not false. Otherwise it just gets confusing.
Found in the coding standards.
This file fully patches kses, and resolves one known issue in html_esc()