Opened 3 years ago

Closed 3 years ago

#11830 closed defect (bug) (fixed)

Link Moderation is Broken

Reported by: miqrogroove Owned by:
Priority: high Milestone: 3.0
Component: Comments Version: 2.7
Severity: major Keywords: has-patch tested featured
Cc:

Description

It's unclear if the patch for #8627 worked on the 2.7 branch. I was only able to test 2.8.4 and 2.9.1 for now. Moderation of links in comments is broken in both branches due to quote slashing. It never works unless I include a bare URL like www.google.com

Attachments (2)

link-moderation.patch (763 bytes) - added by miqrogroove 3 years ago.
Allow slashes before quotes in link moderation (and require space after element name)
11830.diff (740 bytes) - added by Denis-de-Bernardy 3 years ago.

Download all attachments as: .zip

Change History (27)

Allow slashes before quotes in link moderation (and require space after element name)

  • Keywords has-patch needs-testing added
  • Summary changed from Link Moderation Broken by #8627? to Link Moderation is Broken

I think it would be just as effective to use stripslash, but I will wait for feedback on the first patch.

there are issues in the regexp. see #11833, for starters. this garbage, in particular:

<
a	
<a	
	href	=	
	http://foo:bar@foo.com#?#

<a
	href	=	http://foo.com?\"'

>test</a>

test6

test5

results in the following after getting stripslashed with filters applied:

<
a	
<a	
	href	=	
	http://foo:bar@foo.com#?#

<a>test</a>

test6

test5

<p><<br />
a<br />
<a<br />
	href	=<br />
	<a  href="http://foo:bar@foo.com#?#" rel="nofollow">http://foo:bar@foo.com#?#</a></p>
<p><a>test</a></p>
<p>test6</p>
<p>test5</p>

... so imo, we really ought to harden the beasty a little bit.

as an aside, I'd like to suggest:

  1. we start by stripslashing the mess
  2. we remove_filter whatever code annoys us (I'm looking at you, smileys)
  3. we apply filters
  4. and *then* we regexp the mess with something robust

err, I'm sorry, that actually is without filters, and then with filters. but I still think the suggestion has its value. ;-)

Denis that seems like a good separate bug. Have you found any problem with the moderation patch itself? :P I mean, if your tests are being link-moderated correctly then that would mean the patch works for this ticket.

I still haven't broken it, no. But I'll try a bit harder tomorrow. It's getting a bit too late for me to think about creative ways to break kses.

I just installed and tested a copy of version 2.7.1. Link moderation is also broken in that branch now.

  • Version changed from 2.7.1 to 2.7

Setting version to 2.7 since the original patch never worked.

  • Keywords tested added; needs-testing removed

Patch has now been tested on versions 2.7.1, 2.8.4, and 2.9.1, for cases with bare URLs, XHTML A elements, and no URLs.

  • Keywords bug-hunt added

Denis mentioned he was looking for architectural problems. I think trying to spam filter ambiguously slashed inputs qualifies for that. :)

  • Keywords featured added; bug-hunt removed
  • Milestone changed from 2.9.2 to 3.0

(In [13150]) Adjust regex for counting links in a comment when checking if it needs to be held for moderation. Also fix notice in wp_new_comment(). see #11830

After talking with miqrogroove, I've committed a simplified regex. We just need to account for anchors that have an href attribute, and anything more and we're grasping at straws.

It looks like the problems pointed out here are not directly related. Another ticket?

comment:16 follow-up: ↓ 17   miqrogroove3 years ago

I liked a[^>]+ more than a [^>]*

Let's be sure to check what's going on with unfiltered_html.

comment:17 in reply to: ↑ 16   nacin3 years ago

Replying to miqrogroove:

I liked a[^>]+ more than a [^>]*

Both were improvements in that they require something after the tag name. The latter simply makes sure it's a space right after the tag name.

The former would hypothetically match, say, an <abbr> tag with an href attribute. Not that you'd see that (other than xhtml2).

Okay, unfiltered_html users don't run check_comment() at all, so the space is not a concern. kses always provides a space AFAIK.

Problem solved.

Did you still want to keep this open? Any thoughts on #11684?

  • Resolution set to fixed
  • Status changed from new to closed
  • Resolution fixed deleted
  • Status changed from closed to reopened

with the change, we're no longer catching:

<a
href="foo">

the attached patch fixes that.

based on the discussion above, kses *should* sanitize the spaces, but I think we ought to make sure the regexp works even if kses doesn't sanitize the thingy.

(In [13801]) Fix notice without breaking comment notifications. See r13150, see #11830

Denis's patch here does fix the use-case provided. (However only \b is needed,
b is not)

I'd like to see the testdata that this function has been run against however (before/after), its hard to tell what changes are needed here to retain backwards compat.

Take this for example:

<a	
	href	=	
	http://foo:bar@foo.com#?#

<a
	href	=	http://foo.com?\"'

>test</a>

thats detected as one link by the current code. However, if we change it to be non-greedy: '/<a\b[^>]*?href/i' (addition of ? after *) it detects it as 2 links.

Looks like this last part is also mentioned in #11833. Alternative would be [<>] instead of [>]. There's a few ways to skin this cat.

This did exist in the old regex, I lifted [>] from there. Is this safe to close as fixed, and handle the rest in the other ticket?

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.