Opened 8 months ago

Last modified 6 months ago

#42725 new enhancement

Allow gender specific translations

Reported by: yoavf
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: trunk
Component: I18N
Keywords: has-patch dev-feedback
Focuses:
Cc:

Description (last modified by yoavf)

For years, the WordPress translator community has had to resort to painful compromises for languages with grammatical gender, where women are often discriminated against by default.

From strings like Lead Developer to simply Author or Editor, some languages will always render these in the masculine form, regardless of the user's self-identification.

While modern English grammar is exceptionally capable of being gender neutral, many other languages do not share this trait. Forcing all languages to adopt a gender-neutral grammar, even when they're not capable of it, diminishes the appeal of WordPress to non-English speaking users, especially women - because in almost all languages, pseudo gender-neutral grammar just uses the male form.

This ticket is a tracking ticket for the various tasks needed to allow for gender-specific translations.

How gender specific translations will work with gettext

  • We will modify some of the existing translation functions (in a backward compatible way) to accept an optional user gender value.
  • When this happens, the POT generation tools will create 3 different strings, differentiated by a specific context.
  • On output, the correct translation will be loaded based on the value of the gender property.
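
As a rough illustration of the three steps above, here is a minimal sketch of a context-keyed lookup. It is written in Python for brevity and is not WordPress code: the context names (`gender:female` and so on), the `translate()` helper, and the catalog layout are all assumptions made for this example. The ticket does not specify which contexts the POT tools would generate.

```python
# Hypothetical sketch of gender-aware lookup using gettext-style contexts.
# None of these names come from WordPress; they are illustrative only.

# Assumed mapping from a user's gender value to a translation context.
GENDER_CONTEXTS = {
    "female": "gender:female",
    "male": "gender:male",
    "neutral": "gender:neutral",
}

# A catalog keyed by (context, msgid), the way gettext distinguishes
# identical source strings that carry different contexts.
CATALOG = {
    ("gender:female", "Author"): "Autorin",
    ("gender:male", "Author"): "Autor",
}


def translate(msgid, catalog, gender=None):
    """Return the translation of msgid, preferring the gendered entry."""
    if gender is not None:
        context = GENDER_CONTEXTS.get(gender)
        hit = catalog.get((context, msgid))
        if hit is not None:
            return hit
    # Fall back to the entry without context, then to the source string.
    return catalog.get((None, msgid), msgid)


print(translate("Author", CATALOG, gender="female"))  # Autorin
print(translate("Author", CATALOG, gender="male"))    # Autor
print(translate("Author", CATALOG))                   # Author
```

Because the gender argument is optional and the lookup falls back to the plain string, existing calls that pass no gender would behave as before in this sketch.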

What needs to be done

  • Introduce a user profile field to store users' gender and a get_user_gender() function. See #42900
  • Add unit tests to current translation functions
  • Add an optional options parameter to __(), _x(), _n(), _nx() that will be used to pass the gender to the translation functions
  • Update documentation
  • Update GlotPress to group translations.

Notes:

  • This ticket originally included proof-of-concept patches. It has since been rewritten to reference other tickets to tackle the various tasks.
  • The details of the implementation were discussed during contributor day at WCUS 2017. Big thanks to @gregross, @johnbillion, @nullbyte for making this happen, and thanks to @nacin for his input.
  • Major props to @glueckpress for being a driving force in creating this with his WC Europe 2017 talk.

Attachments (3)

gender-to-user-profile.diff (5.0 KB) - added by yoavf 8 months ago.
User profile gender field, and get_current_user_gender()
add-new-gender-translation-function.diff (2.1 KB) - added by yoavf 8 months ago.
_g() and makepot support
42725.diff (2.1 KB) - added by danieltj 7 months ago.
Alternate implementation for the _g function

Change History (25)

#1 @yoavf
8 months ago

A relevant previous discussion specific to the credits page: #18003. My argument is that it's not about the specific contributors' preferences, but about translation and language needs. Having the wrong gender in a translated title representing a contributor is grammatically wrong, and non-inclusive.

#2 @yoavf
8 months ago

  • Description modified (diff)

#3 @yoavf
8 months ago

  • Description modified (diff)

#4 follow-up: @GregRoss
8 months ago

I'd suggest a different approach to the implementation: instead of using the gettext plural field, use gender-specific gettext files, one per gender.

For example, gender-female-fr_FR.po. This has several advantages:

  • No need to alter translation applications. Use GlotPress, Poedit or any other standard gettext tool, as there is no need to use the plural forms to store the gender data.
  • Unlimited types of gender. At best, using the plural fields in gettext will limit you to 6 types of gender, using a single file per gender allows you to support any number of genders you want.
  • Load only the gender data you need. Having separate files allows WP to load only those gender translations that are needed instead of loading all of them.
  • Simplification of the _g() function, as it basically becomes a pass through to __() with a slightly modified $domain string (perhaps 'gender-' . $gender . '-' . $domain).
  • Support for plurals, since the plural fields are no longer being used for gender information, they can instead be used for actual plurals, making _gn() a pass through to _n() much like above.
  • In fact pretty much all the translation functions can be mirrored in this way so the gender translation API would be on par with the language translation API.
  • Support for non-w.org plugins, as they will need to be able to create and load their own gender po files, using standard tools makes this easy for them.
  • No need for special context logic as standard context comments would work.

The above seems like a far more scalable and maintainable solution in the long run, and it needs no changes to the standard gettext files or locales.
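
To make the pass-through idea concrete, here is a minimal sketch of the per-gender domain approach. It is written in Python for brevity and is not WordPress code: `translate()` merely stands in for __(), `translate_gendered()` stands in for the proposed _g(), and the catalogs are invented sample data.

```python
# Hypothetical sketch: one catalog per gender, selected through the text domain.
# translate() stands in for __(); translate_gendered() for the proposed _g().

CATALOGS = {
    "default": {"Editor": "Éditeur"},
    "gender-female-default": {"Editor": "Éditrice"},
}


def translate(text, domain="default"):
    """Look up text in the catalog for domain; return text itself if missing."""
    return CATALOGS.get(domain, {}).get(text, text)


def translate_gendered(text, gender, domain="default"):
    """Pass through to translate() with a gender-prefixed domain."""
    return translate(text, f"gender-{gender}-{domain}")


print(translate("Editor"))                     # Éditeur
print(translate_gendered("Editor", "female"))  # Éditrice
```

Note that a plain pass-through like this returns the source string when the gendered catalog has no entry. Whether it should instead fall back to the ungendered domain is a separate design choice that this sketch leaves open.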

However, a few thoughts on the original proposal as well:

For simplicity purposes, _g() assumes English will remain gender neutral, and so only takes one string as input. I'm happy to reconsider and let it have an optional three string input.

That seems like making English a second-class citizen; gender support should be for everyone.

Looking at the code a bit, adding _g() to the extraction code probably doesn't work as I think you expect it to: when the extractor runs over a file, it would simply include these strings in the default language extract, which isn't what you want to happen. You want a separate file created with just the _g() strings in it, which would require a separate script or different logic in the extractor.

This ticket was mentioned in Slack in #polyglots by glueckpress.


8 months ago

@yoavf
8 months ago

User profile gender field, and get_current_user_gender()

@yoavf
8 months ago

_g() and makepot support

#6 @yoavf
8 months ago

  • Description modified (diff)

#7 in reply to: ↑ 4 @yoavf
8 months ago

Replying to GregRoss:

Looking at the code a bit, adding _g() to the extraction code probably doesn't work as I think you expect it to, as when the extract runs over a file it would simply include these strings in the default language extract, which isn't what you want to happen.

On the contrary - this is exactly what I want to happen. I've updated the patches above to fix some minor issues and added a section titled "How _g() works" to this ticket description. I think this should clear up any confusion on why I implemented things this way, and make it clear why the multiple PO files approach is not relevant :)

#8 @GregRoss
8 months ago

I think this should clear up any confusion on why I implemented things this way, and make it clear why the multiple PO files approach is not relevant :)

While that does clear up a few things, it doesn't really explain why the multiple PO system isn't relevant. The cons of the hack seem to be significant, with few pros to recommend it.

The multiple PO solution, by contrast, has significant advantages and seems to be the better long-term solution, with the only real con being some additional work associated with translate.w.org (creating new gender projects for each plugin/theme/core/etc.) and reworking the extractor a bit.

#9 @glueckpress
8 months ago

@GregRoss While I’m not qualified to comment on technical aspects, I’d like to weigh in on this statement:

the only real con being some additional work associated with translate.w.org

Fwiw, that’s an over-simplification. We already have a precedent for what it means when language files get multiplied. Ask translation teams who manage default and formal variants of their languages.

German makes for a proper example at a higher scale here:

  • There’s de_DE.po and de_DE_formal.po.
  • And then there’s de_CH and de_CH_formal (Swiss German) which is ~90% identical with de_DE, but still needs to be translated and reviewed separately.
  • So the German Polyglots team needs to keep 4 German language files in sync.
  • Consistent translation across variants is an expectation not only for core, but for every theme and plugin on the repos.
  • This basically leads to a workflow where a person would translate 1 string and copy it over to 1-3 other open browser tabs.
  • When the string gets reviewed by a Translation Editor and changes need to be made, the same 1-3 other locales have to be edited accordingly.

While German might be an edge case, it helps in understanding the non-technical (i.e. human-related) implications of an approach that would multiply language files:

Keeping multiple translation files harmonised not only means an exponential increase in work load for experienced translators, it also puts a higher threshold in the way of new volunteers.

Probably every Translation Editor on the Polyglots team can tell you about the challenges when it comes to onboarding new translators. Making these challenges even bigger by doubling up their workload would likely do a disservice to the adoption of the WordPress platform globally.

While every approach will increase workload for translators, because certain strings need more than 1 translation, Yoav’s proposal seems far more scalable for Polyglots teams.

Needless to say, I applaud this initiative of moving a chronically neglected topic forward! Getting a minimum viable solution into core is key, in my opinion, and I’m incredibly excited to see this being taken on and being discussed by the i18n developer community.

#10 @GregRoss
8 months ago

@glueckpress This change will impact the translation teams one way or the other; both approaches will add additional strings and work.

Unlike variants (like your German example), I would not expect these gender strings to have a high percentage of overlap between the genders (after all that's the whole point of having the different genders).

That means that whether we embed them in the current translation files or in separate PO files, the same work is going to have to go into managing them.

P.S. The issue you brought up in the German variants example is well known. You might want to take a look at this PR for GlotPress, https://github.com/GlotPress/GlotPress-WP/pull/747, which is targeted for the next release. It allows for a root->variant relationship between locales, so that the root translation is used for any strings left untranslated in the variant.

This ticket was mentioned in Slack in #polyglots by bastienho.


8 months ago

#12 @yoavf
8 months ago

  • Description modified (diff)

#13 follow-up: @grapplerulrich
7 months ago

Thank you @yoavf for creating the ticket. I think it is a good idea.

One small thing that I saw when looking at the code is that if ( ! function_exists( 'wp_get_current_user' ) ) should not be needed in get_current_user_gender() unless the function is not being loaded in certain scenarios. In that case it would be best to document that.

I am not trying to tear down your work or increase the scope of the change but make sure we think of the different angles.

The changes suggested here would not yet fix Lead Developer as the text is provided by an API.

It would be good to get a list of strings that would need to be changed so that we can see what other places need changes too. One place that comes to mind is the REST API. Do we need to pass a value to define the gender of the person?

For simplicity purposes, _g() assumes English will remain gender neutral, and so only takes one string as input. I'm happy to reconsider and let it have an optional three string input.

It depends on what type of text strings we use. What confused me is the explanation of the dropdown: "Female: She published a new post.". A plugin or theme developer may want to use he or she if the user can choose from the drop-down. If we keep English gender neutral, the setting will do nothing in English.

I found this PDF which covers gender pronouns. https://www.ccsu.edu/lgbt/files/PreferredGenderPronounsForFaculty.pdf

What we do need to think about is how people who are not informed about gender identification understand the option too.

#14 in reply to: ↑ 13 @yoavf
7 months ago

Replying to grapplerulrich:

I am not trying to tear down your work or increase the scope of the change but make sure we think of the different angles.

No worries - all input is appreciated. I'm going to rewrite this ticket into a tracking ticket. During WC US Contributor Day, a group of us was able to form a solid plan going forward, and we'll split the work into a few relevant, more self-contained tickets.

The changes suggested here would not yet fix Lead Developer as the text is provided by an API.

That's correct - we'll eventually need to update the API to provide the gender.

#15 @yoavf
7 months ago

  • Description modified (diff)
  • Summary changed from "Introduce gender compatible translation function, and gender user profile field" to "Allow gender specific translations"

#16 @yoavf
7 months ago

  • Description modified (diff)

@danieltj
7 months ago

Alternate implementation for the _g function

#17 @danieltj
7 months ago

  • Keywords dev-feedback added

I've added attachment 42725.diff, which is an alternative approach to the _g() function. It allows people to specify the gender-based words that can be used, and swaps out the words based on the user's preference.

It'll probably need some changes. It's not perfect, but I think it works as a good starting point. It gives people more flexibility to specify the actual word that needs swapping out.

#18 @GregRoss
7 months ago

@danieltj there was quite a bit of discussion around this at WCUS on Contributor Day, and the original plan to use _g() was changed (see the updated description of this ticket). Instead, we'll be extending the existing translation functions.

#19 @swissspidy
6 months ago

#36617 was marked as a duplicate.

This ticket was mentioned in Slack in #core-editor by swissspidy.


6 months ago

This ticket was mentioned in Slack in #core-js by aduth.


6 months ago

#22 @vslavik
6 months ago

Poedit developer here. Allow me to provide a somewhat outside perspective: that of a mainstream gettext user. I know makepot.php is used internally in WP, but there is a sizeable community of WordPress users that extract strings using GNU gettext's xgettext instead (either via Poedit or from the CLI), and I think it's worth noting how this proposal affects them:

In short, the standard gettext approach would be as mentioned in comment:8, which is fully compatible with the rest of the gettext world and in fact uses gettext as it is meant to be used. In comment:9 this is criticized as causing an explosion of translation files, but that doesn't have to be the case. Part of gettext (which, I'm aware, WordPress doesn't currently implement, and even chooses to use non-standard _formal suffixes instead) is cascading lookup of translations. Suppose the language is set to de_DE@formal: the runtime is then supposed to look for the translation in multiple files, in order, until the first hit: first de_DE@formal.mo, then de_DE.mo (for strings where formality doesn't matter) and lastly de.mo (for strings where the country doesn't matter). (I'm simplifying the actual file names WP looks for, for the sake of illustration.) So in a typical case, there could be a complementary de@male.mo and de@female.mo pair containing just the few differing strings.
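
The cascade described above can be sketched as follows. This is a simplified illustration in Python, not GNU gettext's actual lookup code and not anything WordPress implements. The helper names and the sample catalogs are made up for the example, and the candidate order follows the simplified description in this comment.

```python
# Hypothetical sketch of cascading catalog lookup by locale name.
# Real gettext tries more variants; this follows the simplified order above.

def candidate_locales(locale):
    """Expand e.g. 'de_DE@formal' into ['de_DE@formal', 'de_DE', 'de']."""
    candidates = [locale]
    base, _, modifier = locale.partition("@")
    if modifier:
        candidates.append(base)
    language, _, territory = base.partition("_")
    if territory:
        candidates.append(language)
    return candidates


def lookup(msgid, locale, catalogs):
    """Return the first translation found along the cascade, else msgid."""
    for name in candidate_locales(locale):
        translation = catalogs.get(name, {}).get(msgid)
        if translation is not None:
            return translation
    return msgid


# The gendered catalog holds only the strings that actually differ.
CATALOGS = {
    "de@female": {"Author": "Autorin"},
    "de": {"Author": "Autor", "Settings": "Einstellungen"},
}

print(candidate_locales("de_DE@formal"))          # ['de_DE@formal', 'de_DE', 'de']
print(lookup("Author", "de@female", CATALOGS))    # Autorin
print(lookup("Settings", "de@female", CATALOGS))  # Einstellungen
```

With a cascade like this, strings that do not vary by gender are translated once in the base catalog and never duplicated in the gendered ones.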

From the perspective of an xgettext user, the old _g() proposal is worse than the above ideal (which I suspect is unrealistic to achieve in the WP context) because no standard tool can parse it. But it's not terribly hard to extract with a bit of custom code.

What I understand to be the current proposal, with an optional options argument (presumably a dictionary of some sort?), is the hardest thing to handle with an external parser. Most extraction parsers, such as xgettext, don't parse the full (PHP) language, but only look for "interesting" fragments (sub-grammars, if you will). Something as syntactically complicated as a free-form(-ish) options argument is going to be a PITA for non-makepot.php tools.

Finally, regarding comment:7 about not wanting to make English second-class:

For simplicity purposes, _g() assumes English will remain gender neutral, and so only takes one string as input. I'm happy to reconsider and let it have an optional three string input.

That seems like making English a second-class citizen; gender support should be for everyone.

Let me just point out that all the proposals above already fully provide gender support for English with no need to change the gettext functions API to have more arguments (which would again make it very hard for tools to extract). You can have an English translation file too, and people often do, for non-translation purposes. So it's entirely possible to have gender-neutral strings in the source code, as is done now, while still providing gender-specific strings for English inside en(_US).mo.
