Make WordPress Core

Opened 11 years ago

Closed 9 years ago

Last modified 9 years ago

#28303 closed task (blessed) (fixed)

How to handle default/formal translations for automatic upgrades?

Reported by: zodiac1978 Owned by: ocean90
Milestone: 4.3 Priority: normal
Severity: normal Version:
Component: I18N Keywords: has-patch meta needs-testing
Focuses: Cc:

Description

If I want to have the formal translation of German for my site I have to overwrite the language files with the new formal po/mo files.

But on the next upgrade I think they will be overwritten again by the default version (not formal).

Andrew Nacin has mentioned the problem here: https://core.trac.wordpress.org/ticket/15677#comment:6

How about adding a new constant for wp-config.php to enable the possibility to configure this variant of the language?

I think this could need further changes in GlotPress or WordPress.org to make automatic upgrades for formal language variants possible.

Attachments (5)

28303.patch (4.9 KB) - added by ocean90 10 years ago.
28303.2.patch (5.8 KB) - added by ocean90 9 years ago.
Use wp_get_language_code_of_locale in remove_accents()
28303.3.patch (6.8 KB) - added by ocean90 9 years ago.
28303.4.patch (2.9 KB) - added by ocean90 9 years ago.
28303.5.patch (3.2 KB) - added by ocean90 9 years ago.


Change History (43)

#2 @zodiac1978
10 years ago

A workaround until this is fixed (hopefully in 4.1) is to turn off translation updates completely, because with 4.0 the files will be overwritten every 10 minutes by a cron job.
http://codex.wordpress.org/Configuring_Automatic_Background_Updates#Translation_Updates_via_Filter

Update: Here is a plugin for that:
https://gist.github.com/2ndkauboy/1907f5847b4e092a88ac


This ticket was mentioned in Slack in #polyglots by zodiac1978. View the logs.


10 years ago

#4 @johnbillion
10 years ago

#32050 was marked as a duplicate.

This ticket was mentioned in Slack in #polyglots by zodiac1978. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by ocean90. View the logs.


10 years ago

This ticket was mentioned in Slack in #polyglots by zodiac1978. View the logs.


10 years ago

This ticket was mentioned in Slack in #core-i18n by ocean90. View the logs.


10 years ago

This ticket was mentioned in Slack in #polyglots by ocean90. View the logs.


10 years ago

This ticket was mentioned in Slack in #core-i18n by ocean90. View the logs.


10 years ago

#11 @ocean90
10 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to 4.3

I'm inclined to add core support for this in 4.3. It requires some changes on the API side too, but these can be done after a release too, if necessary.

The current idea is to support locales like abc_DE_variant or abc_variant which can be captured via /(?:(.+)-)?([a-z]{2,3}(?:_[A-Z]{2})?(?:_[a-z]+)?).po/
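
As a rough illustration of the pattern above, here is a Python sketch (not the PHP code from any patch on this ticket) that applies it to a few hypothetical file names. Two things are assumptions: the dot before "po" is escaped and the whole name must match, whereas the expression as posted leaves the dot unescaped; and the helper name split_translation_filename is made up for this example.

```python
import re

# Pattern from the comment above, with the dot before "po" escaped (assumption).
# Group 1: optional text domain prefix. Group 2: locale, optionally with a variant.
LOCALE_FILE = re.compile(r"(?:(.+)-)?([a-z]{2,3}(?:_[A-Z]{2})?(?:_[a-z]+)?)\.po")


def split_translation_filename(filename):
    """Return (textdomain, locale) for a .po file name, or None if it does not match."""
    match = LOCALE_FILE.fullmatch(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical file names covering both locale shapes mentioned above.
for name in ("de_DE.po", "de_DE_formal.po", "ca_valencia.po", "my-plugin-de_DE_formal.po"):
    print(name, "->", split_translation_filename(name))
```

Because the prefix group is greedy, everything up to the last hyphen is treated as the text domain, so a hyphenated plugin slug stays in one piece.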

@ocean90
10 years ago

#12 @ocean90
9 years ago

  • Keywords has-patch added; needs-patch removed

28303.patch changes the regexp in wp_get_installed_translations() and introduces a new function wp_get_language_code_of_locale() to return the language code of a locale.

#13 @obenland
9 years ago

Beta is only two weeks out, let's keep this one moving.

@ocean90
9 years ago

Use wp_get_language_code_of_locale in remove_accents()

#14 @obenland
9 years ago

Patch looks reasonable. I found another instance of str_replace( '_', '-', get_locale() ) in /wp-admin/admin-header.php; should that be replaced too? Not sure if there are more.

@ocean90
9 years ago

@ocean90
9 years ago

#15 @ocean90
9 years ago

In 28303.3.patch I have renamed the function to wp_get_html_language_code() and removed all the sanitize_html_class() calls because sanitizing is already handled by wp_get_html_language_code(). It also reverts the change to remove_accents() and includes tests for wp_get_html_language_code().

28303.4.patch is an approach without wp_get_html_language_code(). The code has been moved to get_bloginfo() directly since this is the only place where the locale is used as a language code for the lang attribute. Includes tests for get_bloginfo( 'language' ).

@ocean90
9 years ago

#16 @ocean90
9 years ago

28303.5.patch removes the language code extraction completely. The reason for this is that for example pt-PT-ao1990 is a valid value for the lang attribute. It looks like formal and informal aren't valid sub tags yet, so we could make an exception for these locales. But since the lookup process includes a fallback pattern (BCP47, section 4.3) we actually don't need this.

Language tags are defined in BCP47. All valid sub tags are listed on this page or as a web app here.
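
The fallback mentioned above can be sketched roughly as follows. This is a simplified Python illustration of lookup by progressive truncation, as I understand it from RFC 4647 (one of the two documents that make up BCP 47), and not code from any patch on this ticket. The function names and the lists of supported tags are hypothetical.

```python
def locale_to_lang_tag(locale):
    """Turn a WordPress-style locale such as de_DE_formal into de-DE-formal."""
    return locale.replace("_", "-")


def lookup(tag, supported):
    """Return the best supported tag for `tag` by dropping subtags from the right.

    Comparison is case-insensitive; returns None when nothing matches.
    """
    by_lower = {s.lower(): s for s in supported}
    subtags = tag.split("-")
    while subtags:
        candidate = "-".join(subtags).lower()
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A trailing single-character subtag (for example "x") is dropped as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


tag = locale_to_lang_tag("de_DE_formal")
print(tag, "->", lookup(tag, ["en", "de", "de-DE"]))
```

Under this scheme a consumer that does not know the formal subtag still ends up at de-DE or de, which is why the full locale can be passed through unchanged.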

#17 follow-up: @simonwheatley
9 years ago

Is it possible to use a more compact format for the variant suffix? The _formal suffix on de_DE_formal more than doubles the length of the string, which can cause issues in some places, e.g. Babble where we suffix the entire locale to the post type name; see Babble issue 247.

#18 in reply to: ↑ 17 @ocean90
9 years ago

Replying to simonwheatley:

Is it possible to use a more compact format for the variant suffix?

There are official sub tags which are even longer than seven chars, for example 1694acad in fr-FR-1694acad. BCP47 includes a section about Length Considerations which says that there is no upper limit for language tags but "buffer sizes for language tags MUST allow for language tags of at least 35 characters."
So this sounds more like an implementation issue that needs to be fixed in the plugin, not something we have to consider if we want to support (official) variants of a language.

#19 @ocean90
9 years ago

In 33027:

l10n: Update wp_get_installed_translations() to support variants of a language.

  • A variant of a language has its own locale, for example the locale of the formal variant of German is de_DE_formal.
  • Update remove_accents() and some CSS rules to support de_DE_formal.
  • Add tests for get_bloginfo( 'language' ).
  • API changes will be deployed over the next few days.

see #28303.

#20 @ocean90
9 years ago

  • Keywords meta added
  • Type changed from enhancement to task (blessed)

Changing to task because of the pending API changes.

#21 @obenland
9 years ago

  • Owner set to ocean90
  • Status changed from new to assigned

#22 @benjaminpick
9 years ago

Do I need to change something in my wp-config/Lang settings to declare that I'm using the formal variant? This should be clarified in the release notes.

Many thanks for working on this!

#23 follow-up: @obenland
9 years ago

@ocean90, how are the API changes progressing?

#24 in reply to: ↑ 23 @ocean90
9 years ago

  • Keywords needs-testing added

Replying to obenland:

@ocean90, how are the API changes progressing?

Good, the API changes were deployed a few minutes ago.

Related: #meta1121

#25 @ocean90
9 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

Mission accomplished!

#26 @markoheijnen
9 years ago

I'm wondering if the current solution is the right one. We are currently hacking the locale to support formal/informal. Informal/formal is a writing style rather than a dialect. Personally I would rather see it as a separate field so that get_locale() is still the same.

You can already see the problem with the commit made. CSS had to be changed and plugins like Babble need to do more work. User content and plugins mostly care that it's German, so de_DE. The only reason I see for needing this is to get the right language package. Having the style as a separate option could then be a better solution, because there would be no chance of breakage.

The reason for this is that I would like to split the locale from the style, so that we know what the right GlotPress project will be.

#27 @ocean90
9 years ago

#33244 was marked as a duplicate.

This ticket was mentioned in Slack in #core by markoheijnen. View the logs.


9 years ago

#29 @markoheijnen
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

#30 follow-up: @ocean90
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Discussion can happen even if the ticket is closed.

We're talking about variants of languages, which does not only include dialects. I also strongly disagree that formal is just a writing style. I'm also confused by your statement that this is "hacking". Have you read the BCP47 which I had linked in the comments?

#31 follow-up: @markoheijnen
9 years ago

That's what I tried, but I reopened the ticket since discussing this after 4.3 has landed is pointless. Which is soon.

Currently the implementation lacks detection if de_DE_formal is de_DE. That is an issue and gives the right to reopen this ticket to rethink our current solution. It does break backwards compatibility even though it's purely styling. When writing content you will not write it in informal and formal but maybe that is different in Germany. So for a plugin like Babble de_DE would be enough in this case.

My opinion is that formal German is a writing style purely because it's still the same German as de_DE but more formal. Because of that I believe it is hacking locales. BCP47 is about the length of the locale.

This ticket was mentioned in Slack in #core by markoheijnen. View the logs.


9 years ago

#33 in reply to: ↑ 30 @markoheijnen
9 years ago

Replying to ocean90:

Discussion can happen even if the ticket is closed.

We're talking about variants of languages, which does not only include dialects. I also strongly disagree that formal is just a writing style. I'm also confused by your statement that this is "hacking". Have you read the BCP47 which I had linked in the comments?

It seems there is no way to get this discussed. So all the breakage that can happen is something users and developers need to take care of?

#34 in reply to: ↑ 31 ; follow-up: @ocean90
9 years ago

Replying to markoheijnen:

Currently the implementation lacks detection if de_DE_formal is de_DE.

Can you explain this? Why should de_DE_formal be de_DE?

It does break backwards compatibility even though it's purely styling. When writing content you will not write it in informal and formal but maybe that is different in Germany.

What do you mean by "styling"? I'm managing a lot of business sites where WordPress *and* the content is formal. But not sure why this should be specific to German.

So for a plugin like Babble de_DE would be enough in this case.

Maybe, but this needs to be handled by Babble. Babble doesn't need to support each language if the developers think that some can be ignored.

My opinion is that formal German is a writing style purely because it's still the same German as de_DE but more formal.

It's still a variant and that's what this ticket is about.

Because of that I believe it is hacking locales. BCP47 is about the length of the locale.

BCP 47 is about Tags for Identifying Languages and is referenced by the W3C for the lang attribute, see http://www.w3.org/TR/html5/dom.html#attr-lang.

This ticket isn't primarily about formal vs informal. It's about supporting ~70 variant subtags.

#35 in reply to: ↑ 34 ; follow-up: @markoheijnen
9 years ago

Replying to ocean90:

Replying to markoheijnen:

Currently the implementation lacks detection if de_DE_formal is de_DE.

Can you explain this? Why should de_DE_formal be de_DE?

Because that is the language. And then _formal says something about how it is written. Why not store it as two values?

It does break backwards compatibility even though it's purely styling. When writing content you will not write it in informal and formal but maybe that is different in Germany.

What do you mean by "styling"? I'm managing a lot of business sites where WordPress *and* the content is formal. But not sure why this should be specific to German.

By styling I mean CSS, like what you changed in [33027]. I have seen plugins doing things like that too.

So for a plugin like Babble de_DE would be enough in this case.

Maybe, but this needs to be handled by Babble. Babble doesn't need to support each language if the developers think that some can be ignored.

It's not about ignoring a language but that it's the "same". Someone will only write a post in de_DE and that can still be formal.

My opinion is that formal German is a writing style purely because it's still the same German as de_DE but more formal.

It's still a variant and that's what this ticket is about.

Because of that I believe it is hacking locales. BCP47 is about the length of the locale.

BCP 47 is about Tags for Identifying Languages and is referenced by the W3C for the lang attribute, see http://www.w3.org/TR/html5/dom.html#attr-lang.

This ticket isn't primarily about formal vs informal. It's about supporting ~70 variant subtags.

I get the possibilities of this ticket but still this ticket is about formal vs informal since that is what the title/description is saying.

#36 in reply to: ↑ 35 @Kau-Boy
9 years ago

Other open-source projects do it exactly the same way. I have used Limesurvey for many surveys, and when creating a new survey you are asked for the language. German is not the only language with a formal variant there.

Adding questions to a survey is always based on the "language". So switching from formal to informal German is not possible, because they are separate languages in Limesurvey.

BTW: They use GlotPress for their translations. Check out all the other formal ones here: https://translate.limesurvey.org/projects/limesurvey2


#37 @anonymized_13423376
9 years ago

Regarding the file naming scheme for variants such as formal German, @zodiac1978 asked me to post a link and a few lines on the matter. Bottom line: use e.g. de_DE@… instead of de_DE_formal. Below are a link and a few notes by Vaclav from Poedit, who brought the whole file name issue to my attention.

https://www.gnu.org/software/gettext/manual/html_node/Header-Entry.html

"""
Language

Fill in the language code of the language. This can be in one of three forms:

  • ‘ll’, an ISO 639 two-letter language code (lowercase). See Language Codes for the list of codes.
  • ‘ll_CC’, where ‘ll’ is an ISO 639 two-letter language code (lowercase) and ‘CC’ is an ISO 3166 two-letter country code (uppercase). The country code specification is not redundant: Some languages have dialects in different countries. For example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The country code serves to distinguish the dialects. See Language Codes and Country Codes for the lists of codes.
  • ‘ll_CC@variant’, where ‘ll’ is an ISO 639 two-letter language code (lowercase), ‘CC’ is an ISO 3166 two-letter country code (uppercase), and ‘variant’ is a variant designator. The variant designator (lowercase) can be a script designator, such as ‘latin’ or ‘cyrillic’.

The naming convention ‘ll_CC’ is also the way locales are named on systems based on GNU libc. But there are three important differences:

  • In this PO file field, but not in locale names, ‘ll_CC’ combinations denoting a language’s main dialect are abbreviated as ‘ll’. For example, ‘de’ is equivalent to ‘de_DE’ (German as spoken in Germany), and ‘pt’ to ‘pt_PT’ (Portuguese as spoken in Portugal) in this context.
  • In this PO file field, suffixes like ‘.encoding’ are not used.
  • In this PO file field, variant designators that are not relevant to message translation, such as ‘@euro’, are not used.

So, if your locale name is ‘de_DE.UTF-8’, the language specification in PO files is just ‘de’.
"""

The filenames used by convention use the same designation. WordPress has its own conventions in places, but generally follows this, see the plugin files' naming in the form of "slug-locale.mo".

The more common convention in the gettext world is to use $prefix/share/locale/$lang/LC_MESSAGES directory and the @ convention is used there too. See some of the gettext translations installed on my system:

/usr/local/share/locale/cs/LC_MESSAGES/glib20.mo
/usr/local/share/locale/cs/LC_MESSAGES/gnupg.mo

/usr/local/share/locale/sr/LC_MESSAGES/glib20.mo
/usr/local/share/locale/sr/LC_MESSAGES/sed.mo
/usr/local/share/locale/sr@latin/LC_MESSAGES/glib20.mo

/usr/local/share/locale/ca@valencia/LC_MESSAGES/glib20.mo

You can find the same on pretty much any modern Unix (Linux or *BSD) in their respective /usr/share/locale folders too.
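
To make the naming forms quoted above concrete, here is a small Python sketch that splits a gettext-style locale name into language, country and variant. It is only an illustration under assumptions: the pattern is my own reading of the quoted forms plus the directory names listed above (which also show a variant without a country code, as in sr@latin), and the names GettextLocale and parse_gettext_locale are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class GettextLocale(NamedTuple):
    language: str
    country: Optional[str]
    variant: Optional[str]


# ll, ll_CC or ll_CC@variant; an optional ".encoding" suffix is accepted and ignored.
_GETTEXT_LOCALE = re.compile(
    r"([a-z]{2,3})(?:_([A-Z]{2}))?(?:\.[A-Za-z0-9-]+)?(?:@([a-z0-9]+))?"
)


def parse_gettext_locale(name):
    """Return a GettextLocale for names like de_DE@formal, or None if the shape differs."""
    match = _GETTEXT_LOCALE.fullmatch(name)
    if match is None:
        return None
    return GettextLocale(*match.groups())


for name in ("de", "de_DE.UTF-8", "sr@latin", "ca@valencia", "de_DE_formal"):
    print(name, "->", parse_gettext_locale(name))
```

The last name is rejected on purpose: with an underscore-separated variant it does not fit the gettext forms, which is the mismatch this comment is pointing out.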

This ticket was mentioned in Slack in #core-i18n by simonwheatley. View the logs.


9 years ago
