WordPress.org

Make WordPress Core

Opened 5 months ago

Last modified 5 months ago

#43147 new enhancement

Introduce `esc_html_comment` and translation related functions

Reported by: jipmoors Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: trunk
Component: Formatting Keywords: has-patch has-unit-tests 2nd-opinion
Focuses: Cc:

Description

Triggered by the following PR on Yoast SEO #8687 I saw the need for esc_html_comment functionality in WordPress.

This patch is based on the current API for formatting and escaping functions. The implementation is based on the specifications found on the HTML specifications page.

I've added unit tests to cover the situations that I could come up with.

Attachments (3)

Introduce_esc_html_comment_functions.patch (9.0 KB) - added by jipmoors 5 months ago.
Introduce_esc_html_comment_functions-2.patch (9.9 KB) - added by jipmoors 5 months ago.
Fixed spec implementation to disallow double dashes anywhere
Introduce_esc_html_comment_functions-3.patch (10.3 KB) - added by jipmoors 5 months ago.
Add an extra loop to make sure hidden illegal characters are removed

Download all attachments as: .zip

Change History (11)

#1 @jipmoors
5 months ago

  • Keywords has-patch has-unit-tests added

#2 @jdgrimes
5 months ago

The spec says that a comment cannot "contain two consecutive U+002D HYPHEN-MINUS characters (--)". However, the patch seems to only disallow -- at the beginning of the comment text, not anywhere within it.

@jipmoors
5 months ago

Fixed spec implementation to disallow double dashes anywhere

#3 @jipmoors
5 months ago

@jdgrimes thanks for noticing, have created an updated patch with proper implementation.

#4 @dd32
5 months ago

  • Keywords 2nd-opinion added

I don't personally think this is needed myself.

The whole premise of the need for it is not being able to trust the data you're outputting, which if it's not hard coded, sounds like it should be being sanitised rather than escaped. WordPress also trusts translations inherently, meaning you should fix a translation that contains rogue HTML rather than simply attempting to escape the bad translations.

#5 follow-up: @schlessera
5 months ago

@dd32 I personally never trust the translations, because they are outside of a developer's control.

When you say "WordPress also trusts translations inherently", do you have something definitive you can point me to? I tried to read up on it, but all I found was (mostly implicit) recommendations to escape anything that will be rendered into HTML.

Also, "fix a translation that contains rogue HTML" is not technically possible in a lot of cases, as translations can just be uploaded to sites as additional (potentially rogue) files.

I personally see the value in this, especially since we decided to use HTML comments as our content data structure.

Last edited 5 months ago by schlessera (previous) (diff)

@jipmoors
5 months ago

Add an extra loop to make sure hidden illegal characters are removed

#6 @jipmoors
5 months ago

There is one more argument to consider, we can escape all of the ways to communicate with data: json, SQL, HTML, JavaScript. HTML comments are part of this domain, if not escaped properly they will cause side-effects and unwanted behaviour.

#7 in reply to: ↑ 5 @dd32
5 months ago

Replying to schlessera:

@dd32 I personally never trust the translations, because they are outside of a developer's control.

While that's your preference, WordPress itself trusts it's translations from translate.wordpress.org, and suggest that if you're running other translations you need to trust the source of those strings. If you're unable to trust them, you've got potentially larger issues than random extra html tags in said strings, and you should probably review them before shipping them out.

When you say "WordPress also trusts translations inherently", do you have something definitive you can point me to? I tried to read up on it, but all I found was (mostly implicit) recommendations to escape anything that will be rendered into HTML.

https://core.trac.wordpress.org/ticket/30724 is probably the best reference I have right now. Unfortunately most references, such as the codex, have been updated to suggest escaping everything or running it through kses, even though it's not recommended or suggested as required by core. Unfortunately a certain PHPCS ruleset suggested escaping everything at some point and went against cores implicit "we trust translation strings".

#8 @schlessera
5 months ago

@dd32 Thanks for the link to the ticket. That discussion should be transcribed into the handbook somewhere, I think.

I can understand this stance for Core, but I don't think this would apply for plugins or themes. If I'm not mistaken, one plugin can easily install language files that provide translations for other plugins/themes. These are just random additional files that no-one will care about.

Note: See TracTickets for help on using tickets.