WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 4 years ago

#11830 closed defect (bug) (fixed)

Link Moderation is Broken

Reported by: miqrogroove Owned by:
Milestone: 3.0 Priority: high
Severity: major Version: 2.7
Component: Comments Keywords: has-patch tested featured
Focuses: Cc:

Description

It's unclear if the patch for #8627 worked on the 2.7 branch. I was only able to test 2.8.4 and 2.9.1 for now. Moderation of links in comments is broken in both branches due to quote slashing. It never works unless I include a bare URL like www.google.com

Attachments (2)

link-moderation.patch (763 bytes) - added by miqrogroove 4 years ago.
Allow slashes before quotes in link moderation (and require space after element name)
11830.diff (740 bytes) - added by Denis-de-Bernardy 4 years ago.

Download all attachments as: .zip

Change History (27)

miqrogroove4 years ago

Allow slashes before quotes in link moderation (and require space after element name)

comment:1 miqrogroove4 years ago

  • Keywords has-patch needs-testing added

comment:2 miqrogroove4 years ago

  • Summary changed from Link Moderation Broken by #8627? to Link Moderation is Broken

comment:3 miqrogroove4 years ago

I think it would be just as effective to use stripslash, but I will wait for feedback on the first patch.

comment:4 Denis-de-Bernardy4 years ago

there are issues in the regexp. see #11833, for starters. this garbage, in particular:

<
a	
<a	
	href	=	
	http://foo:bar@foo.com#?#

<a
	href	=	http://foo.com?\"'

>test</a>

test6

test5

results in the following after getting stripslashed with filters applied:

<
a	
<a	
	href	=	
	http://foo:bar@foo.com#?#

<a>test</a>

test6

test5

<p><<br />
a<br />
<a<br />
	href	=<br />
	<a  href="http://foo:bar@foo.com#?#" rel="nofollow">http://foo:bar@foo.com#?#</a></p>
<p><a>test</a></p>
<p>test6</p>
<p>test5</p>

... so imo, we really ought to harden the beasty a little bit.

as an aside, I'd like to suggest:

  1. we start by stripslashing the mess
  2. we remove_filter whatever code annoys us (I'm looking at you, smileys)
  3. we apply filters
  4. and *then* we regexp the mess with something robust

comment:5 Denis-de-Bernardy4 years ago

err, I'm sorry, that actually is without filters, and then with filters. but I still think the suggestion has its value. ;-)

comment:6 miqrogroove4 years ago

Denis that seems like a good separate bug. Have you found any problem with the moderation patch itself? :P I mean, if your tests are being link-moderated correctly then that would mean the patch works for this ticket.

comment:7 Denis-de-Bernardy4 years ago

I still haven't broken it, no. But I'll try a bit harder tomorrow. It's getting a bit too late for me to think about creative ways to break kses.

comment:8 miqrogroove4 years ago

I just installed and tested a copy of version 2.7.1. Link moderation is also broken in that branch now.

comment:9 miqrogroove4 years ago

  • Version changed from 2.7.1 to 2.7

Setting version to 2.7 since the original patch never worked.

comment:10 miqrogroove4 years ago

  • Keywords tested added; needs-testing removed

Patch has now been tested on versions 2.7.1, 2.8.4, and 2.9.1, for cases with bare URLs, XHTML A elements, and no URLs.

comment:11 miqrogroove4 years ago

  • Keywords bug-hunt added

Denis mentioned he was looking for architectural problems. I think trying to spam filter ambiguously slashed inputs qualifies for that. :)

comment:12 Denis-de-Bernardy4 years ago

  • Keywords featured added; bug-hunt removed

comment:13 nacin4 years ago

  • Milestone changed from 2.9.2 to 3.0

comment:14 nacin4 years ago

(In [13150]) Adjust regex for counting links in a comment when checking if it needs to be held for moderation. Also fix notice in wp_new_comment(). see #11830

comment:15 nacin4 years ago

After talking with miqrogroove, I've committed a simplified regex. We just need to account for anchors that have an href attribute, and anything more and we're grasping at straws.

It looks like the problems pointed out here are not directly related. Another ticket?

comment:16 follow-up: miqrogroove4 years ago

I liked a[^>]+ more than a [^>]*

Let's be sure to check what's going on with unfiltered_html.

comment:17 in reply to: ↑ 16 nacin4 years ago

Replying to miqrogroove:

I liked a[^>]+ more than a [^>]*

Both were improvements in that they require something after the tag name. The latter simply makes sure it's a space right after the tag name.

The former would hypothetically match, say, an <abbr> tag with an href attribute. Not that you'd see that (other than xhtml2).

comment:18 miqrogroove4 years ago

Okay, unfiltered_html users don't run check_comment() at all, so the space is not a concern. kses always provides a space AFAIK.

Problem solved.

Did you still want to keep this open? Any thoughts on #11684?

comment:19 nacin4 years ago

  • Resolution set to fixed
  • Status changed from new to closed

comment:20 Denis-de-Bernardy4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

with the change, we're no longer catching:

<a
href="foo">

the attached patch fixes that.

Denis-de-Bernardy4 years ago

comment:21 Denis-de-Bernardy4 years ago

based on the discussion above, kses *should* sanitize the spaces, but I think we ought to make sure the regexp works even if kses doesn't sanitize the thingy.

comment:22 nacin4 years ago

(In [13801]) Fix notice without breaking comment notifications. See r13150, see #11830

comment:23 dd324 years ago

Denis's patch here does fix the use-case provided. (However only \b is needed,
b is not)

I'd like to see the testdata that this function has been run against however (before/after), its hard to tell what changes are needed here to retain backwards compat.

Take this for example:

<a	
	href	=	
	http://foo:bar@foo.com#?#

<a
	href	=	http://foo.com?\"'

>test</a>

thats detected as one link by the current code. However, if we change it to be non-greedy: '/<a\b[^>]*?href/i' (addition of ? after *) it detects it as 2 links.

comment:24 nacin4 years ago

Looks like this last part is also mentioned in #11833. Alternative would be [<>] instead of [>]. There's a few ways to skin this cat.

This did exist in the old regex, I lifted [>] from there. Is this safe to close as fixed, and handle the rest in the other ticket?

comment:25 nacin4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.