Make WordPress Core

Opened 7 years ago

Closed 7 years ago

#4409 closed defect (bug) (fixed)

KSES removes text after a non-tag less than sign

Reported by: mdawaffe Owned by: mdawaffe
Milestone: 2.3 Priority: high
Severity: critical Version: 2.2
Component: General Keywords: has-patch commit
Focuses: Cc:


Write a comment or a post with the following content while logged out or logged in as a user without the unfiltered_html cap.

This is a < less than sign.

The output will be the following.

This is a

Attachments (3)

4409.diff (1.1 KB) - added by mdawaffe 7 years ago.
4409b.diff (2.3 KB) - added by mdawaffe 7 years ago.
A filter
4409c.diff (2.3 KB) - added by mdawaffe 7 years ago.
better than b

Download all attachments as: .zip

Change History (19)

mdawaffe7 years ago


comment:1 follow-up: mdawaffe7 years ago

4409.diff: a possible solution.

  1. Tweaks a kses regex.
  2. Converts
    This is a < less than sign.
    This is a &lt; less than sign.
  3. Converts
    foo > br
    foo <br>
    (and similar for any allowed tag). This is KSES' original behavior.

This will need some serious testing to ensure it doesn't open any security holes.

comment:2 mdawaffe7 years ago

I mean

foo < br

comment:3 foolswisdom7 years ago

  • Priority changed from normal to high
  • Severity changed from normal to critical

comment:4 in reply to: ↑ 1 westi7 years ago

Replying to mdawaffe:

4409.diff: a possible solution.

  1. Tweaks a kses regex.
  2. Converts

This will need some serious testing to ensure it doesn't open any security holes.

Is it worth taking an alternative approach to this and adding a new filter to post/comment content before the kses filter which converts lone < and > to &gt; and &lt; so as to not deviate from the stand kses code and preserve the current level of security?

comment:5 mdawaffe7 years ago

Westi, Fine by me. KSES is already breaking the text up in a convenient way for looking for lone less than signs, is all.

comment:6 markjaquith7 years ago

If you can do it outside of KSES without too much fuss or processing overhead, then we should go that route.

Note for posterity: HTML Purifier doesn't handle this any better than KSES, even though it does offer XHTML well-formedness and validity plus XSS filtering all in one package.

comment:7 AmbushCommander7 years ago

Hi, this is the lead developer for HTML Purifier. The upcoming, newest version of HTML Purifier does in fact handle this case gracefully by changing the unescaped < into a literal. For your case, however, with one simple regex:

$html = preg_replace('/<([A-Za-z0-9])/', '&lt;$1', $html);

No mucking around kses necessary. This, however, will turn < br> into &lt; br&gt;

comment:8 follow-up: AmbushCommander7 years ago

Oops, I didn't wrap the code properly. It's really:

$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);

mdawaffe7 years ago

A filter

comment:9 in reply to: ↑ 8 mdawaffe7 years ago

Replying to AmbushCommander:

$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);

I don't think that regex is robust enough. "bob<sue" or "<3" would still get caught. Kids say the darndest things.


  1. Add pre_kses filter to kses (right where it says we should), but rearrange the order slightly (in a way that does not effect kses' efficacy at all).
  2. Add regex to that filter to find and wp_specialchars()ize any lone less than signs.

Kses is not modified in any problematic way. Any strings that might have gotten stripped before but now aren't are run through both wp_specialchars and kses, so I don't believe there are any security issues.

mdawaffe7 years ago

better than b

comment:10 mdawaffe7 years ago


  1. Use strpos() instead. substr_count() was needed for some debugging.

comment:11 westi7 years ago


Not tested but patch looks good to me

comment:12 mdawaffe7 years ago

  • Keywords has-patch commit added
  • Owner changed from anonymous to mdawaffe
  • Status changed from new to assigned

comment:13 markjaquith7 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [5783]) Entitize lone less-than characters. Props mdawaffe. fixes #4409

comment:14 mdawaffe7 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Oops. The filter was supposed to have extra args.

$string = apply_filters( 'pre_kses', $string, $allowed_html, $allowed_protocols );


comment:15 markjaquith7 years ago

(In [5787]) Pass extra args to pre_kses hook. Props mdawaffe. see #4409

comment:16 Nazgul7 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.