Make WordPress Core

Opened 3 years ago

Last modified 12 months ago

#54488 assigned defect (bug)

wp_filter_nohtml_kses does not remove HTML comments

Reported by: leewillis77's profile leewillis77 Owned by: audrasjb's profile audrasjb
Milestone: Future Release Priority: normal
Severity: normal Version: 2.1
Component: Formatting Keywords: has-patch has-unit-tests 2nd-opinion changes-requested
Focuses: docs Cc:

Description

The documentation states that wp_filter_nohtml_kses()

"Strips all HTML from a text string."

However, in reality, HTML comments are preserved. This seems to be an explicit choice (wp_kses_split2() - L1083 of wp-includes/kses.php but seems at odds with the documentation, and also with the expectations of a function named "nohtml".

Expected behaviour

wp> wp_filter_nohtml_kses('<!-- comment -->This is not a comment');
=> string(21) "This is not a comment"

Actual behaviour

wp> wp_filter_nohtml_kses('<!-- comment -->This is not a comment');
=> string(37) "<!-- comment -->This is not a comment"

Attachments (2)

54488-unit-tests.diff (3.0 KB) - added by johnregan3 2 years ago.
Unit Tests based on PR #2275
54488.updated.diff (5.1 KB) - added by johnregan3 2 years ago.
Refreshed patch for merging, originally by konradyoast

Download all attachments as: .zip

Change History (34)

#1 @audrasjb
3 years ago

  • Component changed from Security to Formatting
  • Focuses docs added
  • Milestone changed from Awaiting Review to 6.0
  • Owner set to audrasjb
  • Status changed from new to assigned
  • Version changed from trunk to 2.1

Thank you for opening this ticket @leewillis77.
Self assigning the ticket. I'll move it to milestone 6.0.
I'm also removing the trunk version since it was introduced in 2.1 and not in WP 5.9 :)

This ticket was mentioned in PR #2275 on WordPress/wordpress-develop by konradyoast.


3 years ago
#2

  • Keywords has-patch added

This ticket was mentioned in Slack in #core by costdev. View the logs.


2 years ago

#4 @costdev
2 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from 6.0 to 6.1

After discussing this ticket in the bug scrub, I'm moving this to the 6.1 milestone and adding the needs-unit-tests keyword.

This ticket was mentioned in Slack in #core-test by geisthanen. View the logs.


2 years ago

@johnregan3
2 years ago

Unit Tests based on PR #2275

#6 @johnregan3
2 years ago

To be clear, [2275 refers to the PR on Github](https://github.com/WordPress/wordpress-develop/pull/2275).

johnregan3 commented on PR #2275:


2 years ago
#7

I've added some unit tests based on the code in this PR -> 5488-unit-tests.diff

See also https://core.trac.wordpress.org/ticket/54488

@johnregan3
2 years ago

Refreshed patch for merging, originally by konradyoast

#8 @johnregan3
2 years ago

  • Keywords needs-unit-tests removed

I've refreshed the original patch and removed the needs-unit-tests keyword

This ticket was mentioned in PR #3370 on WordPress/wordpress-develop by audrasjb.


2 years ago
#9

  • Keywords has-unit-tests added

#10 @audrasjb
2 years ago

Thanks for the patch @johnregan3! The above PR put together this patch and the unit tests patch.

#11 @audrasjb
2 years ago

  • Milestone changed from 6.1 to 6.2

With WP 6.1 RC 1 scheduled today (Oct 11, 2022), there is not much time left to address this ticket. Let's move it to the next milestone as it still needs testing.

This ticket was mentioned in Slack in #core by costdev. View the logs.


20 months ago

#13 @audrasjb
20 months ago

  • Keywords needs-testing added

Testing instructions:

<?php
echo wp_filter_nohtml_kses('<!-- This is a comment -->This is not a comment');

Expected result: This is not a comment
Actual behaviour before applying the patch: <!-- This is a comment -->This is not a comment

Last edited 20 months ago by audrasjb (previous) (diff)

#14 @Dharm1025
20 months ago

  • Keywords needs-testing removed

Test Report

This report validates that the indicated patch addresses the issue.

Patch tested: https://github.com/WordPress/wordpress-develop/pull/3370

Environment

  • OS: macOS Ventura 13.0
  • Web Server: nginx/1.23.3
  • PHP: 7.4.33
  • WordPress: 6.1-beta2-54337-src
  • Browser: Version 109.0.5414.119 (Official Build) (arm64)
  • Theme: Twenty Twenty-Three
  • Active Plugins: -

Test Results

✅ Works as expected with a patch.

Hi @audrasjb, Thanks for the testing instructions. I have tested the patch as per testing instructions and it works as expected.

Before Patch:

<?php 

echo wp_filter_nohtml_kses('<!-- This is a comment --><p>This is not a comment</p>');
// Output: <!-- This is a comment -->This is not a comment


echo wp_filter_nohtml_kses('<!--This is a comment--><p>This is not a comment</p>');
// Output: <!--This is a comment-->This is not a comment


echo wp_filter_nohtml_kses('<p>This is not a comment<!-- This is a comment --></p>');
// Output: This is not a comment<!-- This is a comment -->


echo wp_filter_nohtml_kses('<p>This is not a comment <!-- <a href="https://wordpress.org">WordPress.org</a> --> </p>');
// Output: This is not a comment &lt;!-- WordPress.org --&gt; 


echo wp_filter_nohtml_kses('<p>This is not a comment <!-- This is a comment</p> -->');
// Output: This is not a comment &lt;!-- This is a comment --&gt;


echo wp_filter_nohtml_kses('<!--
                                This is a comment
                                this is a second line
                                this is a third line
                                -->This is not a comment');
// Output: <!--
//                              This is a comment
//                              this is a second line
//                              this is a third line
//                              -->This is not a comment

After Patch:

<?PHP

echo wp_filter_nohtml_kses('<!-- This is a comment --><p>This is not a comment</p>');
// Output: This is not a comment


echo wp_filter_nohtml_kses('<!--This is a comment--><p>This is not a comment</p>');
// Output: This is not a comment


echo wp_filter_nohtml_kses('<p>This is not a comment<!-- This is a comment --></p>');
// Output: This is not a comment


echo wp_filter_nohtml_kses('<p>This is not a comment <!-- <a href="https://wordpress.org">WordPress.org</a> --> </p>');
// Output: This is not a comment &lt;!-- WordPress.org --&gt; 


echo wp_filter_nohtml_kses('<p>This is not a comment <!-- This is a comment</p> -->');
// Output: This is not a comment &lt;!-- This is a comment --&gt;


echo wp_filter_nohtml_kses('<!--
                                This is a comment
                                this is a second line
                                this is a third line
                                -->This is not a comment');
// Output: This is not a comment

Thanks

Last edited 20 months ago by Dharm1025 (previous) (diff)

This ticket was mentioned in Slack in #core by costdev. View the logs.


20 months ago

#16 in reply to: ↑ description ; follow-up: @SergeyBiryukov
20 months ago

Thanks for the PR!

The new $allowed_comments parameter seems a bit confusing to me. Compared with $allowed_html and $allowed_protocols, it sounds like it would allow different types of comments, but that is not the case.

Taking a step back and looking at the ticket description:

The documentation states that wp_filter_nohtml_kses()

"Strips all HTML from a text string."

However, in reality, HTML comments are preserved. This seems to be an explicit choice (wp_kses_split2() - L1083 of wp-includes/kses.php but seems at odds with the documentation, and also with the expectations of a function named "nohtml".

Should the documentation perhaps be adjusted instead to note that comments are preserved and this is an explicit choice? I believe that can still be done for 6.2.

Last edited 20 months ago by SergeyBiryukov (previous) (diff)

#17 in reply to: ↑ 16 @azaozz
20 months ago

  • Keywords 2nd-opinion added

Replying to SergeyBiryukov:

The new $allowed_comments parameter seems a bit confusing to me.

Same here.

Should the documentation perhaps be adjusted instead to note that comments are preserved and this is an explicit choice? I believe that can still be done for 6.2.

+1. Also perhaps mention that there is a wp_strip_all_tags() that is (most likely) quite faster, see: https://developer.wordpress.org/reference/functions/wp_strip_all_tags/.

#18 follow-up: @leewillis77
20 months ago

I can understand how the proposed name ($allowed_comments) may be confusing in the context of other similarly named. Would $preserve_comments be more appropriate?

@azaozz is wp_strip_all_tags() functionally equivalent to a (fixed) wp_filter_nohtml_kses() as modified here? I'm guessing not, but if not, what are the differences?

Or, if the two are functionally equivalent then I'd suggest we need a new ticket to deprecate one of them in favour of the preferred option?

#19 in reply to: ↑ 18 @azaozz
20 months ago

Replying to leewillis77:

is wp_strip_all_tags() functionally equivalent to a (fixed) wp_filter_nohtml_kses() as modified here?

Yes, think so. wp_strip_all_tags() uses PHP's strip_tags() and removed the content from script and style tags.

Seems wp_filter_nohtml_kses() was introduced in 2006 (17 years ago!) as a shortcut to wp_filter_kses() with no allowed tags. That should sanitize the passed HTML, and remove any disallowed tags (KSES is an HTML sanitization library). Guessing HTML comments were not considered to be dangerous in KSES.

deprecate one of them in favour of the preferred option

Yep, thinking wp_filter_nohtml_kses() can probably be deprecated. I see a fair amount of plugins use it. They probably copied the use from previous versions of WP. My guess is they continue to use it because it accepts direct input (with slashes). However this function is no longer used in core.

#20 @leewillis77
20 months ago

OK, so the two functions are different:

wp> wp_filter_nohtml_kses($input);
=> string(45) "body { color: red; }<!-- Comment -->Some text"
wp> wp_strip_all_tags($input);
=> string(9) "Some text"

Even with the changes to the behaviour of wp_filter_nohtml_kses proposed here the two would still produce different results as the kses version does not remove the content "in" style/script tags?

#21 follow-up: @peterwilsoncc
20 months ago

One functional difference I've noticed with kses is that it's possible to filter the allowed HTML for the strip context:

case 'strip':
	/** This filter is documented in wp-includes/kses.php */
	return apply_filters( 'wp_kses_allowed_html', array(), $context );

-- source code

Were the kses approach to be deprecated in favour of wp_strip_all_html() then I'd suggest modifying wp_kses() to use the function if the allowed HTML is an empty array.

Reading the comments above I see two changes in behaviour:

  • comments are also removed
  • the content of script and style tags would be removed

I think these changes are acceptable provided they're included in the field guide/miscellaneous dev notes for the appropriate release. In my view it better matches expectations.

#22 @peterwilsoncc
20 months ago

This is one of those tickets that falls neatly halfway between a bug and enhancement. Given WordPress 6.2 is in the late stages of beta, it might be worth waiting for 6.3 to action this.

It looks like there is a little to discuss too.

cc @costdev and @mukesh27 as triage leads for 6.2.

#23 @costdev
20 months ago

  • Milestone changed from 6.2 to 6.3

Thanks for the ping @peterwilsoncc!

Agreed, there's no need for us to rush this so close to RC when the Version is 2.1, the ticket was created several cycles ago, and ideas still need feedback and consensus.

The ticket has moved forward this cycle - let's get it to resolution in 6.3.

#24 in reply to: ↑ 21 @azaozz
20 months ago

Replying to peterwilsoncc:

One functional difference I've noticed with kses is that it's possible to filter the allowed HTML for the strip context

Yes, also a plugin may return an empty array from any of the instances of the wp_kses_allowed_html filter.

Were the kses approach to be deprecated in favour of wp_strip_all_tags() then I'd suggest modifying wp_kses() to use the function if the allowed HTML is an empty array.

+1. Then perhaps there will be no need to deprecate wp_filter_nohtml_kses(). Only change the docs to explain that it is a shortcut for wp_kses() without any allowed tags and the latter uses wp_strip_all_tags() in that case.

there's no need for us to rush this so close to RC

Yes, seems this needs a bit more to get it right.

This ticket was mentioned in Slack in #core by chaion07. View the logs.


16 months ago

#26 @oglekler
15 months ago

  • Keywords needs-refresh added
  • Milestone changed from 6.3 to 6.4

At a glance, the solution looks resonable, but the patch cannot be applied and needs to be refreshed. Due to this and lack of activity lately, I am moving this ticket into the 6.4 milestone.

This ticket was mentioned in Slack in #core by oglekler. View the logs.


13 months ago

#28 @oglekler
13 months ago

  • Keywords changes-requested added; needs-refresh removed

This ticket was discussed during a bug scrub.

@audrasjb can you please address the latest comments? It looks like there is still a little work to do. Thank you!

Add props @mukesh27

#29 @nicolefurlan
12 months ago

@audrasjb with RC1 for 6.4 ~2 weeks away, what do you think about this ticket? Could you weigh in on the latest comments and determine whether a fix will be included in 6.4?

#30 @nicolefurlan
12 months ago

Per #comment:24:

Yes, seems this needs a bit more to get it right.

It seems that this ticket will likely not make it into the 6.4 release cycle. I think we should move this to Future Release. (cc @audrasjb)

This ticket was mentioned in Slack in #core by oglekler. View the logs.


12 months ago

#32 @oglekler
12 months ago

  • Milestone changed from 6.4 to Future Release

It looks like an obvious thing to do, but because we had no activity lately, I am moving this into Future release When the patch will be ready for commit, it can be moved back into the next available milestone.

Note: See TracTickets for help on using tickets.