Make WordPress Core

Opened 9 years ago

Last modified 3 months ago

#33073 new enhancement

Some strings need "no HTML entities" translator comments

Reported by: xibe's profile xibe Owned by:
Milestone: 6.6 Priority: normal
Severity: normal Version:
Component: I18N Keywords: good-first-bug has-patch needs-refresh
Focuses: Cc:

Description

This is a renewal of #10005, which was apparently my first patch to WP 6 years ago, but never got accepted. #10005 was closed as fixed, but it truth it was not (see my last comment there).

So, I'm opening a new ticket about this... and updating the description and patch from 6 years :)

Here goes!

Several strings need to have an indication warning translators against the inclusion of HTML entities within their translation, because of where the strings are user (RSS feeds, e-mails, JavaScript...).

Case in point: in French, there should be a non-breaking space ( ) before any double sign (; : ! ?). So, translators would turn "URL: %s" into "Adresse web : %s". But since it is used in e-mails, it is displayed as-is. And for strings that are used in feeds, it breaks them...

Suggestions comment: "Do not add HTML entities ( , etc): used in [context]".

Patch fixes all strings that I could find.

I do hope it is a simple enough patch, and not too close to RC, that it could make it into 4.3.

Thanks!

Attachments (1)

33073.diff (18.4 KB) - added by xibe 9 years ago.
Patch for upgrade.php, feed-atom-comments.php, feed-rss2-comments.php, pluggable.php

Download all attachments as: .zip

Change History (35)

@xibe
9 years ago

Patch for upgrade.php, feed-atom-comments.php, feed-rss2-comments.php, pluggable.php

#1 @xibe
9 years ago

  • Keywords has-patch added

#2 @atomicjack
9 years ago

  • Keywords needs-testing added

This ticket was mentioned in Slack in #polyglots by xavier. View the logs.


9 years ago

#4 @fxbenard
9 years ago

+1 Making it for 4.3 will be a nice treat for all French translators (new comers - old elephants)

#5 follow-up: @xibe
9 years ago

Note: for some strings, I chose to expand the code block.

For instance:

if ( EMPTY_TRASH_DAYS )
	$notify_message .= sprintf( __('Trash it: %s'), admin_url("comment.php?action=trash&c=$comment_id") ) . "\r\n";
else
	$notify_message .= sprintf( __('Delete it: %s'), admin_url("comment.php?action=delete&c=$comment_id") ) . "\r\n";

...is turned into:

if ( EMPTY_TRASH_DAYS ) {
	/* translators: Do not use HTML entities ( , etc.): this string is used in e-mails. */
	$notify_message .= sprintf( __('Trash it: %s'), admin_url("comment.php?action=trash&c=$comment_id") ) . "\r\n";
}
else {
	/* translators: Do not use HTML entities ( , etc.): this string is used in e-mails. */
	$notify_message .= sprintf( __('Delete it: %s'), admin_url("comment.php?action=delete&c=$comment_id") ) . "\r\n";
}

The line I add is a comment, so I'm unsure if it's necessary to add curly braces to mark the code block, but I want to play it safe.

#6 @SergeyBiryukov
9 years ago

#10005 was marked as a duplicate.

#7 in reply to: ↑ 5 ; follow-up: @SergeyBiryukov
9 years ago

Replying to xibe:

The line I add is a comment, so I'm unsure if it's necessary to add curly braces to mark the code block, but I want to play it safe.

Good call, this usage of braces was added to the coding standards some time ago.

#8 @zodiac1978
9 years ago

+1

This a often a question on the i18n-table at contributor days.

Could/Should we use entities or not? - Depends on the context.

This would be good solution to add translator comments!

#9 in reply to: ↑ 7 ; follow-up: @xibe
9 years ago

Good call, this usage of braces was added to the coding standards some time ago.

So I see! I'm guessing the PSR2 Fixer has not been run on the whole codebase (with the braces option)? But that's for another ticket...

Version 0, edited 9 years ago by xibe (next)

#10 follow-up: @xibe
9 years ago

A note by @nbachiyski from #10005:

We have computers on our side and we can do better than telling the translators not to use entities. In all cases we can properly escape entities if they aren't allowed. It can wait until next version, though.

Still, nothing happened in 6 years, so I'm submitting my patch idea. If anything better comes up, I'd be happy to forget it :)

#11 in reply to: ↑ 10 ; follow-up: @azaozz
9 years ago

Replying to xibe:

Would it work if we run html_entity_decode() when using these strings? This is the case for all strings that are outputted for use in JS and as far as I know there aren't any problems with entities there.

#12 in reply to: ↑ 9 @netweb
9 years ago

Replying to xibe:

So I see! I'm guessing the PSR2 Fixer has not been run on the whole codebase (with the braces option)? (I know we do not use PSR-2, but the fixer could be useful in this case). But that's for another ticket...

See https://make.wordpress.org/core/2011/03/23/code-refactoring/, but the gist of that doc is "Code refactoring should not be done just because we can."

#13 in reply to: ↑ 11 @xibe
9 years ago

  • Keywords has-patch needs-testing removed

Replying to azaozz:

Would it work if we run html_entity_decode() when using these strings? This is the case for all strings that are outputted for use in JS and as far as I know there aren't any problems with entities there.

Good call! Actually, I see that the feed-atom-comments.php and feed-rss2-comments.php strings are already enclosed in ent2ncr() -- and the feeds don't break with "By: %s" translated to "Par : %s", so it seems we're good here.

For the mail related strings, I guess html_entity_decode() is indeed the better way, since it converts the entities to their applicable characters, whereas ent2ncr() "only" switches named entities to their numbered counterparts.

If so, I can take care of the patch.

#14 @johnbillion
9 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release
  • Version 4.3 deleted

This ticket was mentioned in Slack in #core-i18n by xavier. View the logs.


8 years ago

#16 @Presskopp
2 years ago

We just had this issue/discussion in the german polyglots team, so we can see it's still a valid point to be patched. I'm not sure if we need comments to tell the translator to not use entities or if we'd need a software side patch to convert entities when they appear. However, I'd appreciate if this could go forward somewhere.

Last edited 2 years ago by Presskopp (previous) (diff)

#17 @La Geek
2 years ago

We all give the best to receive a professional output of our translations. This should be above all also in the interest of WordPress.org. But we cannot achieve this if we are not shown when an entity is not transformed.

Please display a warning and prevent saving the translation in such cases.

This is a renewal of #10005, which was apparently my first patch to WP 6 years ago

and now additional 7 further years in total 13 years and still not supported? Really?

#18 @galanos
2 years ago

Seconding this. As I’ve learned, HTML entities sometimes must be inserted into translated strings. Making translators check the context first—HTML, JavaScript, e-mail, tattoo template, or whatnot—inevitably leads to errors and is non-inclusive by further raising the threshold of technical knowledge required of translators. Thank you!

This ticket was mentioned in Slack in #core by presskopp. View the logs.


13 months ago

#20 @SergeyBiryukov
13 months ago

  • Milestone set to 6.3

#21 @oglekler
12 months ago

Hi @johnbillion,
can you have a look at the patch, it most likely needs a refresh, but you added needs-patch after patch was already there, so, I am wondering if you had in mind another solution instead of the one which was provided by patch.
Thank you 🙏

This ticket was mentioned in Slack in #core by oglekler. View the logs.


12 months ago

#23 @johnbillion
11 months ago

  • Keywords has-patch needs-refresh added; needs-patch removed

This ticket was mentioned in Slack in #core by mukeshpanchal27. View the logs.


11 months ago

#25 @audrasjb
11 months ago

  • Milestone changed from 6.3 to 6.4

Given this enhancement still needs refresh and as WP 6.3 beta 1 is going to be released in a couple days, let's move this to milestone 6.4.

#26 @oglekler
9 months ago

  • Keywords good-first-bug added

What needs to be done:

  1. Refreshing the patch because at least paths to files are changed.
  2. Checking that it covers all pleases when this note should be added. It is possible that we have more of them.

Because it looks like a straight forward task, I am marking it as good first bug.

This ticket was mentioned in Slack in #core by oglekler. View the logs.


9 months ago

#28 follow-up: @costdev
8 months ago

@oglekler As comment:18 mentions, asking translators to check the context may still lead to human error. comment:13 points to html_entity_decode() as a viable code-based solution. I believe needs-patch is correct because the previous patch's approach no longer seems to be the preferred one, and a new patch is needed.

#29 @oglekler
8 months ago

  • Keywords needs-patch added; has-patch needs-refresh removed

Thank you @costdev, good points 🙏

#30 @oglekler
8 months ago

  • Milestone changed from 6.4 to 6.5

We need a new patch and there is no time before Beta 1 to finish and test it, so, I am moving this ticket into the 6.5 milestone.

This ticket was mentioned in PR #5457 on WordPress/wordpress-develop by benjaminknox.


7 months ago
#31

  • Keywords has-patch added; needs-patch removed

When translations are being worked on, translators may use html entities in their translations. In scenarios where translation doesn't happen in a browser, these can show as the raw set of characters for the entity, for instance &bnsp;. We can fix this by using the PHP function html_entity_decode inline in these scenarios to decode html entities that have potentially been used in translation input.

Trac ticket: https://core.trac.wordpress.org/ticket/33073

#32 @benjaminknox
7 months ago

I put up this PR https://github.com/WordPress/wordpress-develop/pull/5457 based on the attached patch by @xibe and the suggestion to use html_decode_entity function.

Do you guys think we need tests written for this?

Last edited 7 months ago by benjaminknox (previous) (diff)

#33 @oglekler
5 months ago

@costdev is it safe to add HTML into strings, or do we need additionally to check for allowed tags? Can you please take a look.

#34 in reply to: ↑ 28 @swissspidy
3 months ago

  • Keywords needs-refresh added
  • Milestone changed from 6.5 to 6.6

Asking translators to check the context may still lead to human error. comment:13 points to html_entity_decode() as a viable code-based solution. I believe needs-patch is correct because the previous patch's approach no longer seems to be the preferred one, and a new patch is needed.

IMHO we need both:

  1. Translator comments so that translators do not use HTML entities (documentation)
  2. Usage of html_entity_decode to fix any remaining HTML entities (hardening)
Note: See TracTickets for help on using tickets.