Make WordPress Core

Opened 7 months ago

Closed 4 months ago

#61827 closed defect (bug) (fixed)

wp_check_comment_disallowed_list() can't be used to match unprocessed HTML

Reported by: cfinke's profile cfinke Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: 6.7 Priority: normal
Severity: normal Version: 6.6.1
Component: Comments Keywords: has-patch needs-unit-tests commit dev-reviewed
Focuses: Cc:

Description

If I add the entry href=http to the Disallowed Comment Keys list, it will not send a comment to the Trash even if the original comment matches.

For example, the comment:

<a href=http://example.com/>example</a>

will be put in Pending, since before it is checked by wp_check_comment_disallowed_list(), quotes and attributes are added to it, turning it into:

<a href="http://example.com/" rel="nofollow ugc">example</a>

This means that you couldn't even put this in the Disallowed Comment Keys list and have it work:

<a href="http://example.com/">

since it will be <a href="http://example.com/" rel="nofollow ugc"> by the time it's passed to wp_check_comment_disallowed_list().

Since this behavior appears to be due to the combination of the effects of wp_filter_kses, balanceTags, and wp_rel_ugc rather than a single source, a reasonable solution would seem to be to check the original unmodified comment data against wp_check_comment_disallowed_list() in addition to checking the final filtered comment data.

props to @kbrownkd for discovering the root of this issue.

Attachments (3)

61827.diff (3.7 KB) - added by SergeyBiryukov 4 months ago.
61827.2.diff (4.9 KB) - added by david.binda 4 months ago.
61827.3.diff (5.0 KB) - added by SergeyBiryukov 4 months ago.

Download all attachments as: .zip

Change History (26)

#1 @SergeyBiryukov
7 months ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to 6.7

#2 @mi5t4n
7 months ago

During my testing, I discovered that the wp_check_comment_disallowed_list() method contains a line even when the unaltered comment data is supplied to it, the HTML tags are stripped. So, the above inputs will still not work even if we passed the unaltered comment data.

<?php
File: src/wp-includes/comment.php
1360: 
1361:   // Ensure HTML tags are not being used to bypass the list of disallowed characters and words.
1362:   $comment_without_html = wp_strip_all_tags( $comment );

Any suggestions on how to resolve this?

Last edited 7 months ago by mi5t4n (previous) (diff)

This ticket was mentioned in PR #7155 on WordPress/wordpress-develop by thompsonsj.


7 months ago
#3

  • Keywords has-patch has-unit-tests added; needs-patch removed

When handling a comment submission, run wp_allow_comment on comment data before it is filtered with wp_filter_comment. If the approved status is trash, preserve this status. If not, run wp_allow_comment after comment data is filtered.

This change makes the disallowed list more powerful by looking for matches on unfiltered HTML as well as filtered HTML.

Trac ticket: https://core.trac.wordpress.org/ticket/61827

#4 @thompsonsj
7 months ago

@cfinke @mi5t4n wp_check_comment_disallowed_list() receives filtered comment data. As a result, I checked further up the function calls to find the most appropriate place to run a 'pre-check' on unfiltered comment data.

wp_check_comment_disallowed_list() is called by wp_allow_comment which is run after comment data is filtered inside the wp_new_comment function.

I've suggested a change in https://github.com/WordPress/wordpress-develop/pull/7155 that also runs wp_allow_comment on comment data before it is filtered. If a trash approved status is returned, the comment will keep that status. If not, wp_allow_comment is run on filtered comment data as is the case at the moment.

Unit tests added.

#5 @devspace
6 months ago

This issue arises because the wp_check_comment_disallowed_list() function, which is responsible for checking comments against the list of disallowed keys, processes the HTML content of the comment after it has been filtered. During the filtering, HTML elements are sanitized and attributes like rel="nofollow ugc" are added to links. This modification prevents the original disallowed key from being matched correctly.

As you've identified, the disallowed key, such as href=http, won't trigger the expected behavior because by the time the comment reaches the check, the link attributes have been transformed into something like <a href="http://example.com/" rel="nofollow ugc">.

#6 @devspace
6 months ago

Create a Custom Filter: You could create a custom function that checks the raw comment data before passing it through the normal WordPress filters. Here's an example of how you might implement this:

<?php
add_filter('pre_comment_content', 'custom_check_disallowed_keys', 9, 1);

function custom_check_disallowed_keys($comment_content) {
    $disallowed_keys = array('href=http'); // Add other disallowed keys here
    
    foreach ($disallowed_keys as $key) {
        if (strpos($comment_content, $key) !== false) {
            wp_die('Your comment contains disallowed content.');
        }
    }
    return $comment_content;
}

#7 @SergeyBiryukov
6 months ago

  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing

This ticket was mentioned in Slack in #core by chaion07. View the logs.


4 months ago

#9 @chaion07
4 months ago

Thanks @cfinke for reporting this. We reviewed this Ticket during a recent bug-scrub session. Here's a summary of the feedback received:

  1. The PR is currently pending review by @SergeyBiryukov
  2. Calling the wp_allow_comment twice is not ideal from a performance point of view
  3. We need to check if we can achieve the same result in other ways

Thanks.

Props to @engahmeds3ed

Cheers!

#10 @SergeyBiryukov
4 months ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 59267:

Comments: Validate new comments before and after comment data is filtered.

This ensures that a Disallowed Comment Keys match will consistently send the comment to the Trash, by checking both the original unmodified comment data and the final filtered comment data.

If the first check has already resulted in a trash or spam status, the second check is skipped as redundant.

Follow-up to [2894], [3851], [48121], [48575].

Props cfinke, kbrownkd, thompsonsj, mi5t4n, devspace, chaion07, engahmeds3ed, SergeyBiryukov.
Fixes #61827.

@SergeyBiryukov commented on PR #7155:


4 months ago
#11

Thanks for the PR! Merged in r59267.

#12 follow-up: @david.binda
4 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Sorry for re-opening, however, while testing the r59267 I've noticed a possible regression caused by the changeset.

As the wp_allow_comment is now triggered twice, the check_comment_flood action is also triggered twice, which might lead to false-positive identification of comment flood in case there is some custom callback hooked to it, which is not expecting identical data seeing twice.

#13 @desrosj
4 months ago

Thanks @davidbinda! @SergeyBiryukov are you able to take a look at this?

#14 in reply to: ↑ 12 ; follow-up: @SergeyBiryukov
4 months ago

Replying to david.binda:

As the wp_allow_comment is now triggered twice, the check_comment_flood action is also triggered twice, which might lead to false-positive identification of comment flood in case there is some custom callback hooked to it, which is not expecting identical data seeing twice.

Good catch, thanks! Indeed, it's not ideal that wp_allow_comment() is called twice.

I think we can just call wp_check_comment_disallowed_list() directly for the second check instead, e.g.:

if ( wp_check_comment_disallowed_list(
	$commentdata['comment_author'],
	$commentdata['comment_author_email'],
	$commentdata['comment_author_url'],
	$commentdata['comment_content'],
	$commentdata['comment_author_IP'],
	$commentdata['comment_agent']
) ) {
	$commentdata['comment_approved'] = EMPTY_TRASH_DAYS ? 'trash' : 'spam';
}
Last edited 4 months ago by SergeyBiryukov (previous) (diff)

#15 in reply to: ↑ 14 @SergeyBiryukov
4 months ago

Replying to SergeyBiryukov:

I think we can just call wp_check_comment_disallowed_list() directly for the second check instead

Apparently it's a bit more complicated than that, we would need to either replicate this whole block from wp_allow_comment() or move it to a new function that would be called from both places.

#16 @dhrumilk
4 months ago

  • Keywords needs-unit-tests added; has-unit-tests removed

#17 follow-up: @david.binda
4 months ago

I know that my previous comment was about firing the check_comment_flood action twice and thus causing a backward compatibility issue, but now I wonder whether it’s okay for the newly introduced function, the wp_check_comment_data to not have the pre_comment_approved filter incorporated, which means it would run twice :)

The 61827.diff allows filtering the return value of the wp_check_comment_data call from inside the wp_allow_comment, but not the second call on filtered data, which, IMHO, means the default WordPress behaviour will always be used, ignoring a result of any custom callback to the pre_comment_approved filter (except for trash and spam values).

Running the PHPUnit tests with the proposed 61827.diff there are some failures. Moving the pre_comment_approved filter to the wp_check_comment_data function fixes the failures except for one.

IMHO, we should move the pre_comment_approved filter from wp_allow_comment to wp_check_comment_data function.

The remaining failure in Tests_XMLRPC_wp_newComment::test_new_comment_duplicated seems to be due to the WP_Error return check from the wp_allow_comment which is now happening only after the newly introduced wp_check_comment_data function, which does not seem to be returning a WP_Error and is overriding the duplicate comment situation.

I’m attaching a diff addressing my concerns mentioned above. To sum up the changes in between the 61827.diff and the new diff:

  1. move pre_comment_approved filter to wp_check_comment_data
  2. check if wp_allow_comment returns a WP_Error and if so, return it early from the wp_new_comment before we even get to the wp_check_comment_data function call.

@david.binda
4 months ago

This ticket was mentioned in Slack in #core by stoyangeorgiev. View the logs.


4 months ago

#19 in reply to: ↑ 17 @SergeyBiryukov
4 months ago

Replying to david.binda:

I’m attaching a diff addressing my concerns mentioned above.

61827.2.diff looks great to me, thank you!

In 61827.3.diff, I've only adjusted the @return tag for wp_check_comment_data() to include WP_Error as a possible return value.

#20 @SergeyBiryukov
4 months ago

In 59319:

Comments: Use a more precise check for disallowed keys on filtered comment data.

The previous approach of running wp_allow_comment() twice could have unintended consequences, e.g. the check_comment_flood action was also triggered twice, which might lead to false-positive identification of comment flood in case there is some custom callback hooked to it, which is not expecting identical data seeing twice.

This commit introduces a new function, wp_check_comment_data(), to specifically check for disallowed content before and after comment data is filtered.

Follow-up to [59267].

Props david.binda, SergeyBiryukov.
See #61827.

#21 @SergeyBiryukov
4 months ago

  • Keywords commit dev-feedback added

Marking [59319] for backporting to the 6.7 branch after a second committer's review.

#22 @davidbaumwald
4 months ago

  • Keywords dev-reviewed added; dev-feedback removed

Looks good to merge to the 6.7 branch.

#23 @SergeyBiryukov
4 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 59322:

Comments: Use a more precise check for disallowed keys on filtered comment data.

The previous approach of running wp_allow_comment() twice could have unintended consequences, e.g. the check_comment_flood action was also triggered twice, which might lead to false-positive identification of comment flood in case there is some custom callback hooked to it, which is not expecting identical data seeing twice.

This commit introduces a new function, wp_check_comment_data(), to specifically check for disallowed content before and after comment data is filtered.

Follow-up to [59267].

Reviewed by davidbaumwald.
Merges [59319] to the 6.7 branch.

Props david.binda, SergeyBiryukov.
Fixes #61827.

Note: See TracTickets for help on using tickets.