Opened 15 years ago
Closed 15 years ago
#11830 closed defect (bug) (fixed)
Link Moderation is Broken
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | 3.0 | Priority: | high |
Severity: | major | Version: | 2.7 |
Component: | Comments | Keywords: | has-patch tested featured |
Focuses: | Cc: |
Description
It's unclear if the patch for #8627 worked on the 2.7 branch. I was only able to test 2.8.4 and 2.9.1 for now. Moderation of links in comments is broken in both branches due to quote slashing. It never works unless I include a bare URL like www.google.com
Attachments (2)
Change History (27)
#2
@
15 years ago
- Summary changed from Link Moderation Broken by #8627? to Link Moderation is Broken
#3
@
15 years ago
I think it would be just as effective to use stripslash, but I will wait for feedback on the first patch.
#4
@
15 years ago
there are issues in the regexp. see #11833, for starters. this garbage, in particular:
< a <a href = http://foo:bar@foo.com#?# <a href = http://foo.com?\"' >test</a> test6 test5
results in the following after getting stripslashed with filters applied:
< a <a href = http://foo:bar@foo.com#?# <a>test</a> test6 test5 <p><<br /> a<br /> <a<br /> href =<br /> <a href="http://foo:bar@foo.com#?#" rel="nofollow">http://foo:bar@foo.com#?#</a></p> <p><a>test</a></p> <p>test6</p> <p>test5</p>
... so imo, we really ought to harden the beasty a little bit.
as an aside, I'd like to suggest:
- we start by stripslashing the mess
- we remove_filter whatever code annoys us (I'm looking at you, smileys)
- we apply filters
- and *then* we regexp the mess with something robust
#5
@
15 years ago
err, I'm sorry, that actually is without filters, and then with filters. but I still think the suggestion has its value. ;-)
#6
@
15 years ago
Denis that seems like a good separate bug. Have you found any problem with the moderation patch itself? :P I mean, if your tests are being link-moderated correctly then that would mean the patch works for this ticket.
#7
@
15 years ago
I still haven't broken it, no. But I'll try a bit harder tomorrow. It's getting a bit too late for me to think about creative ways to break kses.
#8
@
15 years ago
I just installed and tested a copy of version 2.7.1. Link moderation is also broken in that branch now.
#9
@
15 years ago
- Version changed from 2.7.1 to 2.7
Setting version to 2.7 since the original patch never worked.
#10
@
15 years ago
- Keywords tested added; needs-testing removed
Patch has now been tested on versions 2.7.1, 2.8.4, and 2.9.1, for cases with bare URLs, XHTML A elements, and no URLs.
#11
@
15 years ago
- Keywords bug-hunt added
Denis mentioned he was looking for architectural problems. I think trying to spam filter ambiguously slashed inputs qualifies for that. :)
#15
@
15 years ago
After talking with miqrogroove, I've committed a simplified regex. We just need to account for anchors that have an href attribute, and anything more and we're grasping at straws.
It looks like the problems pointed out here are not directly related. Another ticket?
#16
follow-up:
↓ 17
@
15 years ago
I liked a[^>]+
more than a [^>]*
Let's be sure to check what's going on with unfiltered_html.
#17
in reply to:
↑ 16
@
15 years ago
Replying to miqrogroove:
I liked
a[^>]+
more thana [^>]*
Both were improvements in that they require something after the tag name. The latter simply makes sure it's a space right after the tag name.
The former would hypothetically match, say, an <abbr> tag with an href attribute. Not that you'd see that (other than xhtml2).
#18
@
15 years ago
Okay, unfiltered_html users don't run check_comment() at all, so the space is not a concern. kses always provides a space AFAIK.
Problem solved.
Did you still want to keep this open? Any thoughts on #11684?
#20
@
15 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
with the change, we're no longer catching:
<a href="foo">
the attached patch fixes that.
#21
@
15 years ago
based on the discussion above, kses *should* sanitize the spaces, but I think we ought to make sure the regexp works even if kses doesn't sanitize the thingy.
#23
@
15 years ago
Denis's patch here does fix the use-case provided. (However only \b is needed,
b is not)
I'd like to see the testdata that this function has been run against however (before/after), its hard to tell what changes are needed here to retain backwards compat.
Take this for example:
<a href = http://foo:bar@foo.com#?# <a href = http://foo.com?\"' >test</a>
thats detected as one link by the current code. However, if we change it to be non-greedy: '/<a\b[^>]*?href/i'
(addition of ? after *) it detects it as 2 links.
Allow slashes before quotes in link moderation (and require space after element name)