Make WordPress Core

Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#11684 closed feature request (fixed)

Comment Moderation Should Count $url

Reported by: miqrogroove's profile miqrogroove Owned by:
Milestone: 3.0 Priority: normal
Severity: major Version:
Component: Comments Keywords: has-patch
Focuses: Cc:

Description

One of the biggest failings of the WordPress moderation system is that the comment_max_links option has never worked properly. In older versions, it did not work at all. In 2.8, it seems to only count the links in $comment correctly, so all the spams with links in $url don't get moderated.

I've had to hack every version of comment.php to fix this, and it would be quite nice if it just worked out of the box.

Attachments (4)

comment-moderation.patch (954 bytes) - added by miqrogroove 15 years ago.
comment-moderation.2.patch (918 bytes) - added by miqrogroove 15 years ago.
comment-moderation.3.patch (904 bytes) - added by scribu 15 years ago.
Use the 'i' (case-insensitive) regex modifier
11684.diff (931 bytes) - added by Denis-de-Bernardy 15 years ago.
untested

Download all attachments as: .zip

Change History (31)

@scribu
15 years ago

Use the 'i' (case-insensitive) regex modifier

#1 @scribu
15 years ago

  • Keywords has-patch added

#2 @Denis-de-Bernardy
15 years ago

+1, but it's actually wrong to expect the anchor tag to be formatted that way. all browsers accept the following garbage, for instance:

<a
href
=
http://foo.com
>bar</a>

and I vaguely remember one or two coping with white space between the lesser than and the a:

<
a href="foo">bar</a>

so, a better regexp might be:

$links = preg_match_all("/<\s*a[^>]*\shref\s*=[^>]*>/", apply_filters('comment_text',$comment));

@Denis-de-Bernardy
15 years ago

untested

#3 follow-up: @miqrogroove
15 years ago

I'm not completely liking the regexp change. Would it be reasonable to normalize links before moderation checking? Does the comment_text filter already do that? In other words, things like "www.mysite.com" are being converted by WordPress, and that's where other conversions could/should also happen.

#4 @miqrogroove
15 years ago

add_filter('comment_text', 'wptexturize');
add_filter('comment_text', 'convert_chars');
add_filter('comment_text', 'make_clickable', 9);
add_filter('comment_text', 'force_balance_tags', 25);
add_filter('comment_text', 'convert_smilies', 20);
add_filter('comment_text', 'wpautop', 30);

In other words, let's take a look at what's going on in make_clickable.

#5 @miqrogroove
15 years ago

Denis, when I put your example link into a comment, this is what I get back:

<a href="http://foo.com" rel="nofollow">bar</a>

Seems to be a non-issue.

+1 Scribu's version unless there are other concerns.

#6 in reply to: ↑ description @filosofo
15 years ago

Replying to miqrogroove:

One of the biggest failings of the WordPress moderation system is that the comment_max_links option has never worked properly. In older versions, it did not work at all. In 2.8, it seems to only count the links in $comment correctly, so all the spams with links in $url don't get moderated.

I've had to hack every version of comment.php to fix this, and it would be quite nice if it just worked out of the box.

I don't understand why it would be desirable to penalize users for filling out the URL field. Filling out the URL field is suggested by its very presence in the comment form, and a significant about of ham comments do; in other words, I don't think its use correlates well with being spam. In contrast, a high number of links in the comment text itself does seem to correlate with the comment's being spam. (But either way this is a clumsy method of catching spam.)

How about something like this:

if ( apply_filters('check_comment_max_links', 
   ( 
      get_option('comment_max_links') && 
      count([array of links]) >=  get_option('comment_max_links') 
   ), 
   [array of links], 
   [comment text], 
   [commenter's url],  
   [commenter's name or whatever]) 
) {
   return false;
}

And then you can get a callback to do all kinds of fun stuff if you want.

#7 in reply to: ↑ 3 @filosofo
15 years ago

Replying to miqrogroove:

I'm not completely liking the regexp change. Would it be reasonable to normalize links before moderation checking? Does the comment_text filter already do that? In other words, things like "www.mysite.com" are being converted by WordPress, and that's where other conversions could/should also happen.

Yes, www.mysite.com is being converted into an actual link by make_clickable, so really preg_match_all could just count instances of <a href=, for example, in the comment text.

#8 follow-up: @miqrogroove
15 years ago

I don't think its use correlates well with being spam.

This thread isn't about spam sir ;) It's about moderating comments that have links in them. The existing feature, as advertised, is supposed to hold comments that contain N links in them, and IMO it does not do that. You could certainly argue that this should be a separate option and/or filterable, but the existing feature still wont do what it says it does.

so really preg_match_all could just count instances of <a href=

I'm still trying to figure out how/why Denis's example link gets normalized. The flow of control in there is incredibly hairy.

#9 in reply to: ↑ 8 @filosofo
15 years ago

Replying to miqrogroove:

This thread isn't about spam sir ;) It's about moderating comments that have links in them. The existing feature, as advertised, is supposed to hold comments that contain N links in them, and IMO it does not do that. You could certainly argue that this should be a separate option and/or filterable, but the existing feature still wont do what it says it does.

First of all, the option in question explicitly says that its purpose is for spam-catching:

Hold a comment in the queue if it contains X or more links. 
(A common characteristic of comment spam is a large number of hyperlinks.)

Second, you haven't addressed my point that the URL is not part of the comment. It's meta-data about the comment, not "contained" in the comment itself. So it does what it claims to do, assuming the effectiveness of the regex in matching links within the comment text.

#10 follow-up: @miqrogroove
15 years ago

you haven't addressed my point that the URL is not part of the comment.

The option below this one directly contradicts that idea. It even goes out of its way to say that the content field is included as part of the comment moderation filter. You have to admit that's confusing.

Denis, turns out your test case is handled by kses_init_filters(), which hooks pre_comment_content. So I think we're safe to ignore funky linefeed stuff.

#11 in reply to: ↑ 10 @filosofo
15 years ago

Replying to miqrogroove:

The option below this one directly contradicts that idea. It even goes out of its way to say that the content field is included as part of the comment moderation filter. You have to admit that's confusing.

Sure. So how about changing the label to say something like

Hold a comment in the queue if its content contains X or more links. 

and adding a filter so someone like you can take into account URL if he wants?

#12 @miqrogroove
15 years ago

and adding a filter so someone like you can take into account URL if he wants?

It's a good idea but I'd say it's inadequate. Unless there's an option to moderate all URLs, I'm still hacking on code that doesn't do anything for me.

This request now depends on #11830.

#13 @nacin
15 years ago

Based on the wording as it stands in options-discussion.php, I think including $url in the count could be confusing. We also have a very low tolerance of 2 as a default, which means a single link and a comment author URL suddenly snags a comment.

I agree that the comment author URL is likely to contain spam just like the comment text, so I do want to account for it in some way. You said you found a filter inadequate here?

#14 @miqrogroove
15 years ago

filter inadequate here?

Yes, 2 is far too many links for my taste. Blog comments are not a spam platform or a mechanism for people to come and promote their irrelevant content. I set the threshhold at 1 and if someone wants to post 1 link in $url then I will happily moderate it as I see fit. Otherwise, visitors are free to supply their name and post their (unmoderated) comments without links in them. I have to hack the system to make that work though.

#15 @nacin
15 years ago

I'm saying we could so something like this.

	// Check # of hyperlinks
	if ( get_option( 'comment_max_links' ) ) {
		$num_links = preg_match_all( '/<a [^>]*href/i', apply_filters( 'comment_text', $comment ), $out );
		if ( apply_filters( 'comment_max_links_include_url', false ) && url ) // Count URL
			$num_links++;
		if ( $num_links >= get_option( 'comment_max_links' ) )
			return false;
	}

The alternative is to omit the filter entirely (or s/false/true/), and have it count the $url by default.

I agree we should count $url in theory, but since we haven't had it so long and the default is only 2, it would make a big difference to add it. Maybe a string change in options-discussion.php. Another option is a "Include comment author URL" checkbox but that would just complicate an already complicated form.

That all said, link moderation was partially broken since 2.7 (#11830). Might as well come up with a new strategy now.

#16 @miqrogroove
15 years ago

Random idea: if ( 0 == get_option( 'comment_max_links' ) && url ) Count URL

#17 @miqrogroove
15 years ago

I will digress from comment 12. The benefit of a filter is that a plugin only needs to be written once, whereas a hack has to be done on every upgrade. I'd prefer to tick a check box, but I'll take what I can get too.

#18 @miqrogroove
15 years ago

how about:

apply_filters( 'comment_max_links_website', $num_links, $url );

#19 @nacin
15 years ago

I can go for that, but it seems restrictive (but not by design, unlike the other one). After all, you can filter on get_option('comment_max_links') if you wanted to.

I'm going to bring this up with a few others. I wouldn't be vehemently opposed to a "Include comment author URL" checkbox, either.

#20 @Denis-de-Bernardy
15 years ago

is it me, or a fix (or related fix) got committed for this a few days or weeks back?

#21 @dd32
15 years ago

is it me, or a fix (or related fix) got committed for this a few days or weeks back?

You'd be thinking of [13150] i think

#23 follow-up: @xibe
15 years ago

Should this ticket be closed, then?

#24 in reply to: ↑ 23 @nacin
15 years ago

Replying to xibe:

Should this ticket be closed, then?

We spoke about this in IRC a week or two ago. I'm personally unopposed to a checkbox, but adding new options are generally frowned upon.

At the very least, before this gets punted, I will add a filter to allow this to occur (probably not by default).

#25 @nacin
15 years ago

[13513] provides a filter to allow including comment author's URL in comment moderation link count.

#26 @nacin
15 years ago

  • Resolution set to fixed
  • Status changed from new to closed

There's been no agreement on this beyond the filter. Closing as fixed.

#27 @hakre
15 years ago

If that count mechanism is seen as fail there is no way to re-count with and own implementation because a filter for that is missing.

I opened a new ticket for that: #14681

Note: See TracTickets for help on using tickets.