Make WordPress Core

Opened 18 years ago

Closed 18 years ago

#4409 closed defect (bug) (fixed)

KSES removes text after a non-tag less than sign

Reported by: mdawaffe's profile mdawaffe Owned by: mdawaffe's profile mdawaffe
Milestone: 2.3 Priority: high
Severity: critical Version: 2.2
Component: General Keywords: has-patch commit
Focuses: Cc:

Description

Write a comment or a post with the following content while logged out or logged in as a user without the unfiltered_html cap.

This is a < less than sign.

The output will be the following.

This is a

Attachments (3)

4409.diff (1.1 KB) - added by mdawaffe 18 years ago.
possibility
4409b.diff (2.3 KB) - added by mdawaffe 18 years ago.
A filter
4409c.diff (2.3 KB) - added by mdawaffe 18 years ago.
better than b

Download all attachments as: .zip

Change History (19)

@mdawaffe
18 years ago

possibility

#1 follow-up: @mdawaffe
18 years ago

4409.diff: a possible solution.

  1. Tweaks a kses regex.
  2. Converts
    This is a < less than sign.
    
    to
    This is a &lt; less than sign.
    
  3. Converts
    foo > br
    
    to
    foo <br>
    
    (and similar for any allowed tag). This is KSES' original behavior.

This will need some serious testing to ensure it doesn't open any security holes.

#2 @mdawaffe
18 years ago

I mean

foo < br

#3 @foolswisdom
18 years ago

  • Priority changed from normal to high
  • Severity changed from normal to critical

#4 in reply to: ↑ 1 @westi
18 years ago

Replying to mdawaffe:

4409.diff: a possible solution.

  1. Tweaks a kses regex.
  2. Converts

This will need some serious testing to ensure it doesn't open any security holes.

Is it worth taking an alternative approach to this and adding a new filter to post/comment content before the kses filter which converts lone < and > to &gt; and &lt; so as to not deviate from the stand kses code and preserve the current level of security?

#5 @mdawaffe
18 years ago

Westi, Fine by me. KSES is already breaking the text up in a convenient way for looking for lone less than signs, is all.

#6 @markjaquith
18 years ago

If you can do it outside of KSES without too much fuss or processing overhead, then we should go that route.

Note for posterity: HTML Purifier doesn't handle this any better than KSES, even though it does offer XHTML well-formedness and validity plus XSS filtering all in one package.

#7 @AmbushCommander
18 years ago

Hi, this is the lead developer for HTML Purifier. The upcoming, newest version of HTML Purifier does in fact handle this case gracefully by changing the unescaped < into a literal. For your case, however, with one simple regex:

$html = preg_replace('/<([A-Za-z0-9])/', '&lt;$1', $html);

No mucking around kses necessary. This, however, will turn < br> into &lt; br&gt;

#8 follow-up: @AmbushCommander
18 years ago

Oops, I didn't wrap the code properly. It's really:

$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);

@mdawaffe
18 years ago

A filter

#9 in reply to: ↑ 8 @mdawaffe
18 years ago

Replying to AmbushCommander:

$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);

I don't think that regex is robust enough. "bob<sue" or "<3" would still get caught. Kids say the darndest things.

4409b.diff

  1. Add pre_kses filter to kses (right where it says we should), but rearrange the order slightly (in a way that does not effect kses' efficacy at all).
  2. Add regex to that filter to find and wp_specialchars()ize any lone less than signs.

Kses is not modified in any problematic way. Any strings that might have gotten stripped before but now aren't are run through both wp_specialchars and kses, so I don't believe there are any security issues.

@mdawaffe
18 years ago

better than b

#10 @mdawaffe
18 years ago

4409c.diff

  1. Use strpos() instead. substr_count() was needed for some debugging.

#11 @westi
18 years ago

+1

Not tested but patch looks good to me

#12 @mdawaffe
18 years ago

  • Keywords has-patch commit added
  • Owner changed from anonymous to mdawaffe
  • Status changed from new to assigned

#13 @markjaquith
18 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [5783]) Entitize lone less-than characters. Props mdawaffe. fixes #4409

#14 @mdawaffe
18 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Oops. The filter was supposed to have extra args.

$string = apply_filters( 'pre_kses', $string, $allowed_html, $allowed_protocols );

Sorry.

#15 @markjaquith
18 years ago

(In [5787]) Pass extra args to pre_kses hook. Props mdawaffe. see #4409

#16 @Nazgul
18 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.