Make WordPress Core

Opened 2 years ago

Closed 2 weeks ago

#56435 closed enhancement (worksforme)

Alleviate translation workload

Reported by: anrghg
Priority: normal
Severity: normal
Component: I18N
Keywords: 2nd-opinion close

Description

Looking at https://translate.wordpress.org/projects/wp/dev/af/default/, I’m alarmed by the unnecessary workload and strain placed on:

  1. Translators of WordPress Core;
  2. Translators of plugins.

I’d suggest taking urgent corrective action on two issues, and improving practice to prevent them in the future:

  1. Gettext strings in WordPress Core are very messy and don’t follow all the rules set out at https://developer.wordpress.org/plugins/internationalization/how-to-internationalize-your-plugin/#best-practices-for-writing-strings:
  • Avoid unusual markup and unusual control characters – do not include tags that surround your text
  • Do not put unnecessary HTML markup into the translated string

There are 150+ messages with HTML in them.

Examples:

#: wp-includes/js/dist/block-library.js:31347
msgid "Commenter avatars come from <a>Gravatar</a>"
msgstr ""

#. translators: %s: Comment author link.
#: wp-includes/js/dist/block-library.js:31319
msgid "%s <span>says:</span>"
msgstr ""

#. translators: %s: URL to media library.
#: wp-includes/widgets/class-wp-widget-media.php:501
msgid "That file cannot be found. Check your <a href=\"%s\">media library</a> and make sure it was not deleted."
msgstr ""

I’d prefer using two placeholders like so (uneven spacing is intentional):

#. translators: 1, 2: start and end link tags.
msgid "That file cannot be found. Check your %1$s media library%2$s and make sure it was not deleted."
  • Try to use the same words and same symbols so not multiple strings needs to be translated

Example:

#: wp-activate.php:183 wp-includes/post-template.php:1728
#: wp-admin/includes/meta-boxes.php:203
msgid "Password:"
msgstr "Wagwoord:"

#: wp-includes/general-template.php:518 wp-login.php:1407
#: wp-admin/includes/class-wp-posts-list-table.php:1712
#: wp-admin/includes/file.php:2351 wp-admin/install.php:137
#: wp-admin/install.php:427 wp-admin/options-writing.php:167
#: wp-admin/setup-config.php:225 wp-admin/user-new.php:564
msgid "Password"
msgstr "Wagwoord"
  2. I’d suggest withdrawing the recommendation “If there are strings in your plugin that are also used in WordPress core (e.g. ‘Settings’), you should still add your own text domain to them, otherwise they’ll become untranslated if the core string changes (which happens).” at https://developer.wordpress.org/plugins/internationalization/how-to-internationalize-your-plugin/#add-text-domain-to-strings.

While trying to streamline a plugin’s Gettext strings, I’m looking into the first WordPress Core PO file, which I’m using as a reference for the Portable Object Message Catalog of WordPress Core.

In plugins, as many strings as possible should be identical to those in WordPress Core, to benefit from existing translations. There is a caveat to this (see the link above), but those strings are very unlikely to change.

Change History (5)

#1 @swissspidy
2 years ago

  • Component changed from General to I18N
  • Severity changed from major to normal
  • Version trunk deleted

#2 follow-up: @johnbillion
2 years ago

  • Keywords 2nd-opinion added
  • Type changed from defect (bug) to enhancement

Regarding using two placeholders instead of opening and closing HTML tags:

  • What's the benefit of doing this? It appears to be no more or less easy for a translator to understand, it's just different. A translator still needs to understand the concept of an opening and closing HTML tag in order to produce a correct translation using the placeholders.
  • Ironically making this change will increase the workload for translators, as we'll get 150 new strings that need to be translated.
  • I wouldn't be surprised if there are some translations which adjust or remove the HTML to improve formatting for a given language, which would no longer be possible with these placeholders (without triggering a translation error on w.org).
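The last point can be illustrated with a minimal sketch of a placeholder consistency check. It assumes a validator that compares the printf-style placeholders in the source and in the translation, ignoring their order; the function name `placeholders_match` and its regex are hypothetical and are not taken from translate.wordpress.org. With the two-placeholder version of the media-library string, a translation that drops the link also drops `%1$s`/`%2$s` and gets flagged.

```python
import re
from collections import Counter

# Hypothetical pattern for printf-style placeholders: %s, %d and the
# positional forms %1$s, %2$d. Not a complete printf grammar.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sd]")


def placeholders_match(msgid: str, msgstr: str) -> bool:
    """True if the translation uses the same placeholders as the source.

    Order is ignored, since positional placeholders exist so that
    translators can move them; only the count of each placeholder matters.
    """
    return Counter(PLACEHOLDER.findall(msgid)) == Counter(PLACEHOLDER.findall(msgstr))


source = ("That file cannot be found. Check your %1$s media library%2$s "
          "and make sure it was not deleted.")

# Keeps both placeholders (moved around): passes.
print(placeholders_match(source, "%1$s Media library%2$s: that file cannot be found."))  # True
# Drops the link along with its placeholders: flagged.
print(placeholders_match(source, "That file cannot be found in the media library."))  # False
```

Ignoring order is the key design choice here: a check that required the same sequence would defeat the purpose of positional placeholders.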

> Try to use the same words and same symbols so not multiple strings needs to be translated

This is good advice, and improvements to reduce the number of similar strings have been ongoing for years; see for example https://core.trac.wordpress.org/query?component=I18N&summary=~similar .

It's worth noting that "Password:" and "Password" are not the same string and they serve different purposes. Removing the colon to reduce the number of translations would result in a less appropriate phrase being shown to users, and it's not possible to hardcode the colon outside of the translation as this needs to be localisable too.
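To make the colon argument concrete, here is a toy sketch that uses a hand-written French catalogue in place of real gettext lookups (the `FR` dictionary and the `translate` helper are hypothetical, and the strings are illustrative rather than taken from any WordPress locale). French typography puts a non-breaking space before a colon, so gluing a hard-coded ":" onto the translation of "Password" yields a different, incorrect string.

```python
# Hypothetical French catalogue (msgid -> msgstr). French typography puts a
# non-breaking space (U+00A0) before a colon.
FR = {
    "Password": "Mot de passe",
    "Password:": "Mot de passe\u00a0:",
}


def translate(msgid: str) -> str:
    """Toy stand-in for a gettext lookup; untranslated msgids fall back to themselves."""
    return FR.get(msgid, msgid)


hardcoded = translate("Password") + ":"  # colon appended in code
localised = translate("Password:")       # colon is part of the translatable string

print(repr(hardcoded))  # 'Mot de passe:'
print(repr(localised))  # 'Mot de passe\xa0:'
```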

If you've got some more examples of specific strings that are similar to others, please feel free to open individual tickets for them. Thanks!

#3 in reply to: ↑ 2 @anrghg
2 years ago

Replying to johnbillion:

> Regarding using two placeholders instead of opening and closing HTML tags:

> • What's the benefit of doing this? It appears to be no more or less easy for a translator to understand, it's just different. A translator still needs to understand the concept of an opening and closing HTML tag in order to produce a correct translation using the placeholders.

I’m pleased to learn that it’s actually better. I was only taking WordPress’ advice literally and was stunned to see all this HTML markup amid the translatable strings.

As I understood it, the benefit was that translators are expected to handle placeholders but not HTML, so developers must find ways to avoid any HTML within Gettext strings.

> • Ironically making this change will increase the workload for translators, as we'll get 150 new strings that need to be translated.

My call for “corrective action” was too hasty; apologies. Besides, consistency will require WordPress to keep HTML in the strings.

> • I wouldn't be surprised if there are some translations which adjust or remove the HTML to improve formatting for a given language, which would no longer be possible with these placeholders (without triggering a translation error on w.org).

WordPress’ advice seems to be based on the concern that translators might break code; so it turns out translators need access to HTML? Without an example I can’t figure out why, so I’m likely to stick with the original idea.

Why would translators need to remove links? HTML already handles locale-dependent tweaks: in Hebrew, for example, which lacks italics, <em> is rendered as bold rather than italic, because it marks emphasis, unlike <i>.

I don’t know how it works in Poedit, but when editing the PO file as plain text, a string without HTML doesn’t require copy-pasting the msgid: the translation can be typed right away.

>> Try to use the same words and same symbols so not multiple strings needs to be translated

> This is good advice, and improvements to reduce the number of similar strings have been ongoing for years; see for example https://core.trac.wordpress.org/query?component=I18N&summary=~similar .

Like all the bullet points, it’s WordPress’ own advice. I’ve made that mistake too, and am now streamlining. I’d like a handy version of the Portable Object Message Catalog of WordPress Core; if all developers had such a list at hand, variations would be minimal.

> It's worth noting that "Password:" and "Password" are not the same string and they serve different purposes. Removing the colon to reduce the number of translations would result in a less appropriate phrase being shown to users, and it's not possible to hardcode the colon outside of the translation as this needs to be localisable too.

When there is a colon, yes. In wp-includes/post-template.php:1728 it’s the label of a text box below “please enter your password below:”, which itself ends with a colon; in wp-admin/includes/meta-boxes.php:203 it’s an input field label too. The remaining instance, wp-activate.php:183, would need a table instead of the two paragraphs below “Your account is now active!”; only then would the colon be avoidable.

> If you've got some more examples of specific strings that are similar to others, please feel free to open individual tickets for them. Thanks!

Thank you. Streamlining existing strings at this point would require discarding some of the translations, but would alleviate the workload in every locale added from now on. Is there a tool that compares strings across msgids? I searched for strings with a colon that are likely to also occur without one. When I started, I didn’t know about the tool at https://stuff.mit.edu/afs/sipb/project/gtk/gtk_v2.0/doc/gettext/gettext_6.html and its merge_backup feature. So there seems to be no tool or feature that checks Gettext strings against each other and warns when punctuation causes duplicates.
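A rough sketch of the kind of check described here: it groups msgids that become identical once trailing punctuation and surrounding whitespace are removed. This is not an existing gettext feature; `find_punctuation_duplicates`, `normalise` and the punctuation set are hypothetical, and the parser only handles single-line `msgid` entries (multi-line strings, `msgctxt` and plural forms are ignored).

```python
import re
from collections import defaultdict

# Matches single-line msgid entries only; multi-line strings, msgctxt and
# msgid_plural are deliberately ignored in this sketch.
MSGID = re.compile(r'^msgid "(.*)"$', re.MULTILINE)

# Trailing punctuation that often produces near-duplicate strings (assumed set).
TRAILING = ":.…!?"


def normalise(msgid: str) -> str:
    """Strip surrounding whitespace and trailing punctuation."""
    return msgid.strip().rstrip(TRAILING).strip()


def find_punctuation_duplicates(po_text: str) -> list[list[str]]:
    """Return groups of distinct msgids that differ only by trailing punctuation."""
    groups: dict[str, set[str]] = defaultdict(set)
    for msgid in MSGID.findall(po_text):
        if msgid:  # skip the empty header msgid
            groups[normalise(msgid)].add(msgid)
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = '''
msgid ""
msgstr ""

msgid "Password:"
msgstr "Wagwoord:"

msgid "Password"
msgstr "Wagwoord"

msgid "Settings"
msgstr ""
'''

print(find_punctuation_duplicates(sample))  # [['Password', 'Password:']]
```

Each flagged group is only a candidate: variants like these may still serve different purposes, so a human has to decide whether merging them is appropriate.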

#4 @swissspidy
9 months ago

  • Keywords close added

#5 @swissspidy
2 weeks ago

  • Milestone Awaiting Review deleted
  • Resolution set to worksforme
  • Status changed from new to closed

> If you've got some more examples of specific strings that are similar to others, please feel free to open individual tickets for them. Thanks!

+1. Individual, concrete tickets are best.

> Thank you. Streamlining existing strings at this point would require discarding some of the translations, but would alleviate the workload in every locale added from now on.

FWIW, if strings change only slightly, their existing translations are already marked as fuzzy and can be quickly fixed by translators.
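The fuzzy mechanism can be approximated as follows, using Python's `difflib` similarity ratio as a stand-in for the matching heuristic; GNU `msgmerge` uses its own algorithm, so this only illustrates the idea. `carry_over` and the 0.8 cutoff are hypothetical. If "Password:" were changed to "Password", the old translation would be reused but flagged fuzzy, leaving the translator only to delete the colon.

```python
import difflib

# Old catalogue (msgid -> msgstr) before the source string was edited.
OLD = {"Password:": "Wagwoord:"}


def carry_over(new_msgid: str, old: dict[str, str], cutoff: float = 0.8) -> tuple[str, bool]:
    """Return (msgstr, is_fuzzy) for a new msgid.

    An exact match is reused as-is. Otherwise the most similar old msgid
    whose difflib ratio is at least `cutoff` supplies a translation that
    is flagged fuzzy for review. No match yields an empty msgstr.
    """
    if new_msgid in old:
        return old[new_msgid], False
    close = difflib.get_close_matches(new_msgid, list(old), n=1, cutoff=cutoff)
    if close:
        return old[close[0]], True
    return "", False


print(carry_over("Password", OLD))  # ('Wagwoord:', True)
print(carry_over("Settings", OLD))  # ('', False)
```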
