Opened 18 years ago
Closed 18 years ago
#4409 closed defect (bug) (fixed)
KSES removes text after a non-tag less than sign
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 2.3 | Priority: | high |
| Severity: | critical | Version: | 2.2 |
| Component: | General | Keywords: | has-patch commit |
| Focuses: | | Cc: | |
Description
Write a comment or a post with the following content while logged out, or while logged in as a user without the unfiltered_html capability:
This is a < less than sign.
The saved output is truncated to:
This is a
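The truncation can be reproduced with a small model of how kses tokenizes content. This is a hypothetical simplification, not kses itself. It assumes that a '<' with no closing '>' is treated as a tag running to the end of the string, and that a tag whose element name is not on the allowlist is dropped entirely. The function model_kses_split() and its allowlist are made up for illustration. Attributes, comments, and protocol checks are ignored.

```php
<?php
// Hypothetical, simplified model of kses' split step (NOT the real kses code).
// A "tag" is '<' followed by anything up to '>' OR the end of the string,
// which is why a lone '<' swallows the rest of the text.
function model_kses_split(string $text, array $allowed): string
{
    return preg_replace_callback(
        '%<[^>]*(?:>|$)|>%',
        function (array $m) use ($allowed): string {
            $chunk = $m[0];
            if ($chunk === '>') {
                return '&gt;'; // a bare '>' is escaped
            }
            // Parse the chunk as a tag: optional slash, element name, rest.
            if (!preg_match('%^<\s*(/\s*)?([a-zA-Z0-9]+)([^>]*)>?$%', $chunk, $parts)) {
                return ''; // malformed tag: dropped
            }
            $elem = strtolower($parts[2]);
            if (!in_array($elem, $allowed, true)) {
                return ''; // disallowed element: dropped, including the text it swallowed
            }
            // Rebuild a clean tag (this model discards attributes).
            return '<' . trim($parts[1]) . $elem . '>';
        },
        $text
    );
}

echo model_kses_split('This is a < less than sign.', ['br', 'b']), "\n"; // prints "This is a "
echo model_kses_split('foo < br', ['br', 'b']), "\n";                    // prints "foo <br>"
```

Under these assumptions, "less" is read as the element name of a tag that spans the rest of the sentence. That is why everything after the '<' disappears, while "foo < br" is rebuilt as "<br>".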
Attachments (3)
Change History (19)
#1 · follow-up: ↓ 4 · 18 years ago
4409.diff: a possible solution.
- Tweaks a kses regex.
- Converts "This is a < less than sign." to "This is a &lt; less than sign."
- Converts "foo < br" to "foo <br>" (and similarly for any allowed tag). This is KSES' original behavior.
This will need some serious testing to ensure it doesn't open any security holes.
#4 · in reply to: ↑ 1 · 18 years ago
Replying to mdawaffe:
This will need some serious testing to ensure it doesn't open any security holes.
Is it worth taking an alternative approach and adding a new filter to post/comment content, run before the kses filter, that converts lone < and > to &lt; and &gt;? That way we would not deviate from the standard kses code and would preserve the current level of security.
#5 · 18 years ago
Westi, fine by me. It's just that KSES already breaks the text up in a way that's convenient for finding lone less-than signs.
#6 · 18 years ago
If you can do it outside of KSES without too much fuss or processing overhead, then we should go that route.
Note for posterity: HTML Purifier doesn't handle this any better than KSES, even though it does offer XHTML well-formedness and validity plus XSS filtering all in one package.
#7 · 18 years ago
Hi, this is the lead developer of HTML Purifier. The upcoming version of HTML Purifier does in fact handle this case gracefully, by converting the unescaped < into the literal &lt;. For your case, however, one simple regex is enough:
$html = preg_replace('/<([A-Za-z0-9])/', '&lt;$1', $html);
No mucking around in kses necessary. This will, however, turn "< br>" into "&lt; br>".
#8 · follow-up: ↓ 9 · 18 years ago
Oops, I didn't wrap the code properly. It's really:
$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);
#9 · in reply to: ↑ 8 · 18 years ago
Replying to AmbushCommander:
$html = preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);
I don't think that regex is robust enough. "bob<sue" or "<3" would still be caught by kses and stripped. Kids say the darndest things.
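The gap is easy to see by running the corrected pattern from comment #8 on a few inputs. The wrapper name escape_lone_lt() is hypothetical; the regex is the one from #8, with the replacement written as '&lt;$1'. Note that, as written, the pattern also matches "</", so closing tags get escaped too.

```php
<?php
// Comment #8's regex: escape '<' only when the next character is NOT a
// letter or digit. The wrapper function name is hypothetical.
function escape_lone_lt(string $html): string
{
    return preg_replace('/<([^A-Za-z0-9])/', '&lt;$1', $html);
}

$samples = ['This is a < less than sign.', 'bob<sue', 'I <3 kses', '<b>bold</b>'];
foreach ($samples as $s) {
    // Inputs left unchanged still reach kses with a tag-like '<'.
    echo $s, '  =>  ', escape_lone_lt($s), "\n";
}
```

Only the first sample is escaped as intended. "bob<sue" and "I <3 kses" pass through untouched, so kses still sees a tag-like '<' and strips the text after it, and "<b>bold</b>" comes out as "<b>bold&lt;/b>".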
4409b.diff:
- Adds a pre_kses filter to kses (right where it says we should), but rearranges the order slightly (in a way that does not affect kses' efficacy at all).
- Adds a regex to that filter to find any lone less-than signs and run them through wp_specialchars().
Kses is not modified in any problematic way. Any strings that would have been stripped before but now aren't are run through both wp_specialchars() and kses, so I don't believe this introduces any security issues.
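The ticket doesn't show the regex used in 4409b.diff. As a hypothetical sketch of that kind of pre-kses filter, assume "lone" means any '<' that does not start something shaped like an opening tag, a closing tag, or an HTML comment. The function name is made up, and the sketch writes &lt; directly instead of calling wp_specialchars(), so it runs without WordPress.

```php
<?php
// Hypothetical pre-kses filter (NOT the regex from 4409b.diff).
// A '<' is kept only if it starts something shaped like <tag ...>, </tag>,
// or <!--; every other '<' is escaped to &lt; before kses sees it.
function escape_lone_lt_before_kses(string $html): string
{
    return preg_replace('%<(?!/?[A-Za-z][^<>]*>|!--)%', '&lt;', $html);
}

$samples = [
    'This is a < less than sign.',
    'bob<sue',
    'I <3 kses',
    '<b>bold</b> and <br />',
];
foreach ($samples as $s) {
    echo escape_lone_lt_before_kses($s), "\n";
}
```

This handles the "bob<sue" and "<3" cases, but it is only a shape check: "a<b and c>d" still looks like a tag and is left for kses to judge, and "foo < br" is now escaped rather than turned into "<br>".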
#12 · 18 years ago
- Keywords has-patch commit added
- Owner changed from anonymous to mdawaffe
- Status changed from new to assigned
possibility