WordPress.org

Make WordPress Core

Opened 2 years ago

Closed 4 months ago

#44988 closed defect (bug) (duplicate)

The sanitize_html_class() is deceptive / "buggy"

Reported by: ChiefAlchemist Owned by:
Milestone: Priority: normal
Severity: normal Version: 4.9.6
Component: Formatting Keywords:
Focuses: Cc:

Description

Best I can tell, sanitize_html_class() will return a class name that begins with an invalid character (e.g., an integer). AFAIK a class can't begin with an integer ( as well as a handful of other special / odd characters).

Mind you, I understand sanitation is not validation :) That said, the function description says "If this results in an empty string, then the function will return the alternative value supplied."

If you interpret that as "if your class is invalid..." (as I did), then at the very least some level of validation is implied.

Long to short, if the function returns a useless class, there's (almost) no point in having a special sanitizer. Sure, it might sanitize - I can hear the sec team barking - but the result isn't any more useful than using a "normal" sanitize function.

Attachments (1)

44988-broken.patch (2.2 KB) - added by ayeshrajans 2 years ago.
This breaks tests: https://travis-ci.org/Ayesh/wordpress-develop/builds/434524930

Download all attachments as: .zip

Change History (9)

#1 @mukesh27
2 years ago

  • Component changed from General to Formatting

#2 @ayeshrajans
2 years ago

The W3 spec does say that class names cannot start with a digit, single or double hyphens.
https://www.w3.org/TR/CSS21/syndata.html#characters

We can either remove the invalid characters or replace them. I'm leaning towards replacing them with an underscore because we cal still salvage an invalid class name from it (12345 would become '_2345' instead of '').

$sanitized = preg_replace( array(
                '/^[0-9]/',
                '/^(-[0-9])|^(--)/',
        ), array(
                '_',
                '__',
        ), $sanitized);

Backwards compatibility should not be issue because invalid classes are dropped by browsers anyway: https://jsfiddle.net/4om9bqhp/

I will attach a patch in next reply, but it raises some more issues...

#3 @ayeshrajans
2 years ago

The patch above (which fixes the class name according to W3 spec) breaks the tests including \Tests_Post_GetPostClass::test_with_utf8_category_slugs. This is because \get_post_class function uses this function to specifically get numeric values sanitized although the return values break W3 spec.

We could of course fix the failing tests, but this will clearly break BC. One way to fix this would be to introduce an additional boolean parameter to the function that only enforces stricter validations only when explicitly required so. Ugly solution for a minimal issue.

#4 follow-up: @lgedeon
5 months ago

If another core function is using sanitize_html_class() to get something that is not valid for an html class, then that function is in error and should be fixed.

If a significant number of plugins or themes outside core are relying on sanitize_html_class() being wrong, however, then we might just have to leave this alone.

If we can fix this though, I would prefer that the function add an underscore to the front rather than replace a character, if the only place the character is invalid is that it is in front.

#5 in reply to: ↑ 4 ; follow-up: @ChiefAlchemist
5 months ago

Replying to lgedeon:

If a significant number of plugins or themes outside core are relying on sanitize_html_class() being wrong, however, then we might just have to leave this alone.

Can you give an example of this? Ultimately, the issue here is, that a class is passing "validation" that is not valid. How would someone go about using something that's "against the law"? And if so, is that a good thing?

That said, an arg (bool?) could be added to the function such that the default is the current but the opposite would be able to actually do what they function says it does.

Now we have:

sanitize_html_class( $my_string)

Next, we could have:

sanitize_html_class( $my_string, true)

Where true is the new & improved full-powered function, and the default of false bypasses the new fix. This should maintain backward compatibility and also allow the function to do what it promised to do. The extra arg is friction but it's better than such an obvious bug remaining in core.

#6 in reply to: ↑ 5 @lgedeon
4 months ago

Replying to ChiefAlchemist:

Replying to lgedeon:

If a significant number of plugins or themes outside core are relying on sanitize_html_class() being wrong, however, then we might just have to leave this alone.

Can you give an example of this?

Sorry, I can't give an example of that. I don't actually know if that is the case. If we really want to know, I have heard of people doing a search of all themes and plugins in the public repository to see how common a use case is.

My point was simply that this is an obvious error, but that fixing it might break a lot of other people's code. So, we would need to check first before making this change.

I don't like the idea of adding a parameter, though. We shouldn't have to add a parameter to make it do the right thing.

Maybe we can put in a deprecation notice for when the function is used with an invalid (but currently allowed) character. The deprecation notice could say something like, "The value supplied will be handled differently in the next version of WordPress. Starting with version X.X this function will return "_zzzzz-zzz" for the given input value.

#7 @davidwebca
4 months ago

Hello! I came here searching about the escaping mecanism for classes added through the menu items in the admin. I found myself surprised to see colons being removed altogether. With the prevalence and popularity of tailwindcss.com, we should maybe reconsider what is allowed as a special character in the sanitize_html_class and it could be part of the same discussion you guys have been having for quite a while. Here's an additional link about the specificity of what is "allowed" in CSS. https://mathiasbynens.be/notes/css-escapes

#8 @peterwilsoncc
4 months ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

I missed this been opened a couple of years ago, ticket #33924 exists previously and hit the roadblock mentioned above: that modifying how the classes are sanitized could cause backward compatibility breakages.

A similar suggestion was made at the time, using a second parameter to toggle between permissive and strict escaping.

I'm going to close this as a duplicate so discussion can continue on #33924 and focus on a single ticket. I'm sorry it's taken some time for this to happen.

Note: See TracTickets for help on using tickets.