Opened 4 years ago

Closed 4 years ago

#14972 closed enhancement (maybelater)

Proposal: Pool of common strings for core, themes, and plugins

Reported by: demetris
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.1
Component: I18N
Keywords: dev-feedback
Focuses:
Cc:

Description

This idea originates in a little experiment I did during the 2.9 dev cycle. I wanted to see if it was possible to make a WordPress theme that would be not simply internationalized but also localized out of the box.

The method was simple:

If a string I needed existed in core, I left the second parameter of the gettext function empty, in order to use the default translation.

The result was better than I expected:

When I finished, I had a theme with a total of 100 strings, 70 of which were already localized out of the box!
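
A minimal sketch of the method, not WordPress code: the `__()` defined here is a simplified stand-in for the core function, and the catalogs, the `mytheme` domain, and the German translation are assumptions made up for illustration.

```php
<?php
// Stand-in catalogs; in WordPress these would be filled from loaded .mo files.
// The 'mytheme' domain and the German translation are illustrative assumptions.
const CATALOGS = [
    'default' => ['Next posts' => 'Nächste Beiträge'], // core translations
    'mytheme' => [],                                   // theme ships no translations yet
];

// Simplified stand-in for WordPress's __(): look the string up in the given
// text domain and return it unchanged when that domain has no translation.
function __(string $text, string $domain = 'default'): string
{
    return CATALOGS[$domain][$text] ?? $text;
}

echo __('Next posts'), "\n";            // no domain given: core's translation is used
echo __('Next posts', 'mytheme'), "\n"; // theme domain: falls through to the English original
```

The first call is what the experiment relied on: a string that core already translates comes out localized without the theme shipping any translation of its own.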

I’ve thought a bit about the idea since and I believe it is worth pursuing because it will have significant benefits. However, before plugin/theme authors are willing to adopt the method, there are a couple of technical things we should do on our end. I’ll start with those, and then go on to enumerate the expected benefits.

  1. TECHNICAL PREREQUISITES

1.1. Central pool of common strings

Relying on core strings now is risky: Core strings can disappear or change at any time, leaving your plugin or theme partially localized.

A solution to this can be a file, say strings-common.php, that will contain core strings commonly needed by plugins or themes, and also strings not existing in core but commonly needed by plugins or themes. Strings in that file will not be removable, so that plugin/theme devs can use them safely.

Here is a small sample, to give you an idea of what kinds of strings I am talking about:

__('Next entries');
__('Next posts');
__('Permalink to %s');
__('Permanent link to %s');
__('Previous entries');
__('Previous posts');
__('Skip to content');
__('Skip to main content');
__('Valid CSS');
__('Valid CSS3');
__('Valid HTML');
__('Valid HTML5');
__('Valid XHTML');

1.2. POT-making script

As far as I know, there is no script now to make a plugin/theme POT that leaves out strings without text domain or with default as text domain. This would be essential for the idea to succeed. So, we would also need such a script.
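
A rough sketch of what such a script could do, under stated assumptions: `extract_strings_for_domain()` is a hypothetical helper, not part of any existing tool, and its regular expression only recognizes `__()` and `_e()` calls with plain single-quoted literals. A real extractor would need a proper tokenizer.

```php
<?php
// Hypothetical helper: collect the msgids of __() and _e() calls that belong
// to one text domain. Calls without a second argument count as 'default'.
// Limitation: only single-quoted literals without escaped quotes are matched.
function extract_strings_for_domain(string $source, string $domain): array
{
    $pattern = <<<'REGEX'
/\b(?:__|_e)\(\s*'([^']*)'\s*(?:,\s*'([^']*)'\s*)?\)/
REGEX;
    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    $strings = [];
    foreach ($matches as $match) {
        $found = $match[2] ?? 'default'; // no domain argument: default domain
        if ($found === $domain) {
            $strings[] = $match[1];
        }
    }
    return array_values(array_unique($strings));
}

// Sample theme code to scan (made up for this sketch).
$source = <<<'SRC'
echo __('Next posts');
echo __('Theme options', 'mysuperbtheme');
_e('Skip to content', 'default');
_e('Footer widgets', 'mysuperbtheme');
SRC;

// Emit POT-style entries only for the theme's own domain.
foreach (extract_strings_for_domain($source, 'mysuperbtheme') as $msgid) {
    printf("msgid \"%s\"\nmsgstr \"\"\n\n", $msgid);
}
```

Strings without a text domain, or with `default` as the domain, never reach the generated POT, which is the behaviour the proposal depends on.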

  2. BENEFITS

2.1. Reduced workload for translators

This method of L10n/i18n will free up time for translators. Hopefully, part of that time will then go to more translations or to better translations, or to a bit of both.

2.2. Standardization and QA

For example, now all of the following are used:

  • Changes saved
  • Changes saved.
  • Changes saved!


A strings-common.php file could standardize on one (the second one seems like the best candidate here) and thus encourage its usage.

Since a pool of common strings would be exposed to greater scrutiny, standardization and QA of this kind would also benefit core strings.
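
One way to find such competing variants is to group strings by a normalized form. This is only a sketch: `group_variants()` is a hypothetical helper, and the normalization rule (ignore case, surrounding whitespace, and a trailing period or exclamation mark) is an assumption about what counts as the same string.

```php
<?php
// Hypothetical helper: group strings that differ only in case, surrounding
// whitespace, or trailing '.' / '!', so competing variants become visible.
function group_variants(array $strings): array
{
    $groups = [];
    foreach ($strings as $string) {
        $key = strtolower(rtrim(trim($string), '.!'));
        $groups[$key][] = $string;
    }
    // Keep only the normalized forms that have more than one spelling.
    return array_filter($groups, function (array $variants): bool {
        return count($variants) > 1;
    });
}

$found = group_variants(['Changes saved', 'Changes saved.', 'Changes saved!', 'Skip to content']);
print_r($found);
```

Each group that survives the filter is a candidate for standardizing on a single form.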

2.3. Performance and optimization

If this idea is promoted and widely adopted, it would result in lower memory usage for localized WordPress setups (memory usage is more of an issue in localized setups than it is in unlocalized setups).

It would also mean smaller packages for internationalized and localized themes and plugins.

  3. NOTES, CLARIFICATIONS

This method of L10n/i18n will of course be completely optional. If theme/plugin authors don’t like it, they don’t have to adopt it. If they don’t like the translation of a common string, or if they want to use a common string within a context that changes its meaning, they can simply specify their own text domain for that common string.

I attach a file of common strings to which I add from time to time. It will give you a good idea of the extent of the strings that are commonly needed in core, plugins, and themes.

What do you think?

Attachments (1)

strings-common.php (5.9 KB) - added by demetris 4 years ago.
A collection of strings commonly used by core, plugins, and themes


Change History (26)

@demetris 4 years ago

A collection of strings commonly used by core, plugins, and themes

comment:1 @scribu 4 years ago

I attempted to do this myself, but the fact that the POT script ignores the textdomain put me off.

I would be happy if just that got fixed.

comment:2 follow-up: @scribu 4 years ago

PS: The only common strings that are needed by most plugins are those related to settings pages, specifically to admin notices.

comment:3 in reply to: ↑ 2 ; follow-up: @demetris 4 years ago

  • Cc dkikizas@… added

Argh! Sometimes Trac decides (following no pattern that I can discern) that it will not notify me of comments to tickets I’ve opened.

Replying to scribu:

PS: The only common strings that are needed by most plugins are those related to settings pages, specifically to admin notices.

I think even plugins, which don’t have all the more or less identical front-end strings found in themes, could use quite a few strings from a common pool. Here are a few candidates, off the top of my head:

__('About this plugin');
__('Disable');
__('Disabled');
__('Donate to this plugin');
__('Enable');
__('Enabled');
__('Leave empty for default');
__('Plugin homepage');
__('Reset options');

etc. etc.

comment:4 @filosofo 4 years ago

In theory, it's a good idea, but how helpful would it actually be in practice?

  • Since plugins and themes still must localize their non-common strings, how much is saved by localizing a few of their strings in the common pool? I would suspect most of the memory consumption happens in the overhead of loading the plugins' .mo files, which will happen anyways.
  • How many resources will be wasted by loading localized strings that end up not being used?
  • It seems like by its intent we would only add, not subtract, strings, so the common strings' memory-consumption footprint would grow over time.

To get a better idea of how helpful this would be, I would like to see the following data:

  • How much a single, e.g., 20-character string adds to the memory overhead.
  • What the percentage is of shared strings across plugins in the repository.

comment:5 @filosofo 4 years ago

What the percentage is of shared strings across plugins in the repository.

What I mean is, a list of the top strings ordered by descending number of occurrences across all plugins (or if you'd prefer the top 100 most popular).
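
A sketch of how such a ranking could be produced once the strings of each plugin have been extracted. `rank_shared_strings()` and the sample data are hypothetical; real numbers would have to come from the plugin repository.

```php
<?php
// Hypothetical helper. Input: plugin name => list of its translatable strings.
// Output: string => number of plugins that use it, most widely shared first.
function rank_shared_strings(array $pluginStrings): array
{
    $counts = [];
    foreach ($pluginStrings as $strings) {
        // Count each string once per plugin, however often the plugin repeats it.
        foreach (array_unique($strings) as $string) {
            $counts[$string] = ($counts[$string] ?? 0) + 1;
        }
    }
    arsort($counts);
    return $counts;
}

// Made-up sample data standing in for extracted plugin strings.
$ranked = rank_shared_strings([
    'plugin-a' => ['Enable', 'Disable', 'Reset options'],
    'plugin-b' => ['Enable', 'Reset options'],
    'plugin-c' => ['Enable', 'Leave empty for default'],
]);

// Show the two most widely shared strings.
print_r(array_slice($ranked, 0, 2, true));
```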

Roughly speaking, for this to be worthwhile the amount of memory saved would have to be greater than that used by the common strings, so you'd have to say that given an average X number of plugins installed, there will be Y distinct strings shared across the plugins.

So if there are 160 strings in the common file not used in core and an average of 20 active plugins, then to save memory, on average those plugins would need all to share 8 distinct strings. That seems improbable to me, but the only way to know for sure is to see some real numbers.
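
The arithmetic behind that figure, as a small sketch. It assumes, for simplicity, that every string costs the same amount of memory and that each plugin would otherwise load its own copy of every pool string it uses; `break_even_shared_strings()` is a hypothetical name.

```php
<?php
// Break-even estimate: the pool costs $poolOnlyStrings extra strings in memory,
// and every pool string a plugin reuses saves one copy. The pool pays for
// itself once each active plugin reuses at least this many of its strings.
function break_even_shared_strings(int $poolOnlyStrings, int $activePlugins): int
{
    return (int) ceil($poolOnlyStrings / $activePlugins);
}

echo break_even_shared_strings(160, 20), "\n"; // prints 8
```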

comment:6 @demetris 4 years ago

@filosofo:

Thank you for your comments!

The amount of strings I have in mind for strings-common.php is not that large. I was thinking about maybe 100–200 stable strings from core added to the common pool within the first few months, and at most 50–60 non-core strings added within the first year, depending on feedback from plugin/theme authors and translators, and on stats from the two repos.

(The text of my proposal or the file I attached may give the impression that I am talking about larger numbers, but the attached file includes only a few no-brainers, along with lots of candidates, some strong and some weak.)

Now, 100–200 core strings added to the proposed common pool may not be impressive, but, considering that I was able to make a theme in which 70 out of 100 strings were core strings, it would be a great help. If I knew that there were even 100 core strings designated for common use and not liable to change, I would start using this method right now.

About getting stats from the repos, I agree with you. They would be very useful. Even more so if we want to start adding non-core strings to a common pool.

I would do that myself, but my skills at text search and retrieval are below elementary.

comment:7 @sbressler 4 years ago

  • Cc sbressler@… added

comment:8 @hakre 4 years ago

Related: #14039 (Duplicate?)

comment:9 in reply to: ↑ 3 @Denis-de-Bernardy 4 years ago

Replying to demetris:

Argh! Sometimes Trac decides (following no pattern that I can discern) that it will not notify me of comments to tickets I’ve opened.

It won't subscribe you to tickets you open. Only to those you answer to. Just add a comment whenever you open a ticket to work around it.

comment:10 @demetris 4 years ago

@Denis: Thanks! That explains it.

As I promised in the last dev meetup, I wrote a post explaining the idea and the proposal:

http://op111.net/80

I hope to reach more authors and translators of plugins/themes in this way, so that we can get some more feedback.

comment:11 @nbachiyski 4 years ago

Somebody raised that issue recently and I've been thinking about it for some time. Here are some random thoughts:

  • Strings will change. If we keep a semi-official list, we will have to keep it growing forever.
  • Most of the strings can and should be part of template functions. I think that is the better way both to standardize things and to avoid struggling with string backwards-compatibility problems. We don't have any template tags for admin plugins, but a large percentage of plugin code is related to the admin UI.
  • On the idea of putting many strings in a big file (like strings-common.php). We have two choices of referencing the strings in the file:
    • Give them unique IDs (either in an array or as globals). Choosing string IDs is as hard as choosing variable names (and this is one of the two hardest problems in programming). Also, each string should be as near as possible to the code that uses it.
    • Include just a big pile of __() calls in the file, like it's done in the proposed patch. I am not a big fan because of the "strings will change" reason.
  • If we include strings without a domain in the plugin, there will be no way for the plugin translator to override them. Sometimes in a plugin Header means something totally different than in core. Such words are hard to translate anyway. For possible solutions of this problem see below.
  • Translation functions should fall back to core strings when they cannot find a plugin/theme string (with a domain).
  • In any case we will need better custom tools.
    • Custom strings extractor. This is easy and we should do it anyway, because relying on xgettext being present has failed us many times (on many servers).
    • The extractor can extract only strings that have a domain.
    • Or the extractor can pick only strings that are missing from the current stable release of WordPress and exclude common ones from the POT. This way translators will see only the strings that differ. In production the common ones will fall back to core strings (see the previous point).
  • After writing all this I realized that breaking string backwards compatibility is acceptable. Plugins and themes will still work, with only minor (and easily fixable) glitches.
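
The fallback idea from the list above can be sketched as follows. This is not how WordPress translation functions are implemented: `translate_with_fallback()`, the `myplugin` domain, and the catalog contents are assumptions for illustration only.

```php
<?php
// Stand-in catalogs; the domains and German translations are illustrative.
const CATALOGS = [
    'default'  => ['Settings' => 'Einstellungen', 'Header' => 'Kopfzeile'],
    'myplugin' => ['Header' => 'Kopfbereich'], // plugin overrides core's "Header"
];

// Hypothetical lookup with fallback to the default (core) domain.
function translate_with_fallback(string $text, string $domain): string
{
    return CATALOGS[$domain][$text]     // 1. the plugin's or theme's own translation
        ?? CATALOGS['default'][$text]   // 2. otherwise core's translation
        ?? $text;                       // 3. otherwise the untranslated original
}

echo translate_with_fallback('Header', 'myplugin'), "\n";   // plugin's own translation
echo translate_with_fallback('Settings', 'myplugin'), "\n"; // falls back to core
```

With this lookup order a plugin translator can still override an ambiguous word such as Header, while strings left untranslated in the plugin pick up core's translation.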

comment:12 @GruffPrys 4 years ago

  • Type changed from defect (bug) to enhancement

Hi guys - I'm new to contributing to open projects so be gentle ;)

Whilst Demetris' suggestion is quite ingenious in a number of ways, I believe it has a few drawbacks.

I'm more of a translator than a developer, so obviously I'd welcome any developments that lessen the workload and increase the consistency of the translation. However, it seems to me that this suggestion proposes to do so at the expense of complicating the developers' internationalization process and increasing both their workload and the expertise required from them.

In the proposed scenario, developers may have to become familiar with what is contained within any proposed strings-common.php file and what is not in order to decide when to use their text domain and when not to.

This would then require developers to learn the difference between, say 'search' (the verb) and 'search' as a mass noun or count noun, and the implications of these differences on the translation before they can tell if the form of the word that they're after is actually in strings-common.php or not. There's a string containing the phrase "Search Help" in WP 3.0.+. Is it fair to expect developers to understand that "Search Help" could be a noun phrase or a verb phrase (depending on context), and could require very different translations accordingly?

Personally, I think that is too much to ask of developers. The danger is that the proposed method wouldn't be adopted, or would be misused (i.e. we'd see the wrong translations of 'search', 'post' or 'vote' appear in plugins and themes). The beauty of the current method is that it is quite simple for developers to understand, whilst it leaves the linguistic decisions in the hands of linguists.

I'd argue that this simplicity and the developer/linguist split needs to be retained, and that the issues of the translator's workload and the consistency of translation could better be addressed by integrating a translation memory system and terminology management into GlotPress.

Each language could have its own translation memory and glossary of terms for each project, whilst suggestions from the memories and glossaries of other projects could also be displayed for that language. This would help with both the consistency of translation and the translator's work flow, whilst keeping translators in control of the linguistic aspects of the translation.

I realise that incorporating such systems would be a lot of development work (we've developed a web-based terminology dictionary development environment at work), but I think ultimately it would be the correct way to go about it.

comment:13 @demetris 4 years ago

Thanks for the thoughts, Nikolay.

Replying to nbachiyski:

Strings will change. If we keep a semi-official list, we will have to keep it growing forever.

The way I think about it, we won’t just be throwing stuff in the list. It will be a list carefully curated, and it will include only strings commonly used in web-publishing environments; not strings that we expect to be obsolete in the next two or three WordPress versions. For example, see my small sample list at http://op111.net/80. Also, strings would be reviewed before going into the common pool, so that we wouldn’t have to modify them later for grammar, punctuation, capitalization etc. etc.

Most of the strings can and should be part of template functions. I think that is the better way both to standardize things and to avoid struggling with string backwards-compatibility problems. We don't have any template tags for admin plugins, but a large percentage of plugin code is related to the admin UI.

This has been on my mind too, as core offers more and more functionality and strings to themes in the form of template functions (e.g., the new comment_form and login_form functions). But I could not find a way to make something of it in the context of my proposal. So I think of the two as independent for now.

On the idea of putting many strings in a big file (like strings-common.php). We have two choices of referencing the strings in the file:
Give them unique IDs (either in an array or as globals). Choosing string IDs is as hard as choosing variable names (and this is one of the two hardest problems in programming). Also, each string should be as near as possible to the code that uses it.
Include just a big pile of __() calls in the file, like it's done in the proposed patch. I am not a big fan because of the "strings will change" reason.

I like the big pile approach because it is simple: We only need one extra file and one conditional include. (If locale is not en_US.)

I also like it because it will present all strings designated for common use in a friendly, humanly readable form, to make them easily discoverable.

(By the way, the attached file is not a proposed patch. It’s just a file I keep open in a tab in my editor, to throw in strings that seem interesting candidates.)

If we include strings without a domain in the plugin, there will be no way for the plugin translator to override them. Sometimes in a plugin Header means something totally different than in core. Such words are hard to translate anyway. For possible solutions of this problem see below.

We don’t need to include strings that become ambiguous out of context. But, in any case, it seems to me that such strings are only a small percentage of commonly used strings.

If, for some reason, we want to include such a string, we could use _x(). Or am I wrong?
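
A sketch of how a context keeps two meanings of the same English string apart. The `_x()` below is a simplified stand-in for the WordPress function of that name, and the contexts and German translations are made up. The key layout mirrors the gettext convention of joining context and msgid with the `\x04` character.

```php
<?php
// Stand-in catalog keyed the way gettext stores contextual entries:
// "context\x04msgid". Contexts and translations are illustrative.
const CATALOG = [
    "verb\x04Search" => 'Suchen',
    "noun\x04Search" => 'Suche',
];

// Simplified stand-in for WordPress's _x(). This sketch has a single catalog,
// so the $domain argument is accepted but ignored.
function _x(string $text, string $context, string $domain = 'default'): string
{
    return CATALOG[$context . "\x04" . $text] ?? $text;
}

echo _x('Search', 'verb'), "\n"; // label of a button that performs a search
echo _x('Search', 'noun'), "\n"; // heading of a search section
```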

In any case we will need better custom tools.

If the pot-making script can be tweaked to accept a textdomain as an optional argument (so that, for example, it makes a POT only with strings of the mysuperbtheme domain), I would be very happy. Scribu too said that he would be happy if just that got added.

comment:14 follow-up: @Arlen22 4 years ago

Here are a few thoughts.

*Each file should have the locale ID on the end of it ("strings-common.en.php").
*I think we could purge unused strings from the list (after due discussion) on each line increment (2.9.2). Version (2.9.2) would probably be too fast and volatile.
*Code comments would take care of meaning problems. See my example below

<?php
__('Search', ''); # This is the Action ("Search for the keys.")

comment:15 in reply to: ↑ 14 ; follow-up: @Arlen22 4 years ago

Replying to Arlen22:
Here is the list in list format.

  • Each file should have the locale ID on the end of it ("strings-common.en.php").
  • I think we could purge unused strings from the list (after due discussion) on each line increment (2.9.2). Version (2.9.2) would probably be too fast and volatile.
  • Code comments would take care of meaning problems. See my example below

<?php
__('Search', ''); # This is the Action ("Search for the keys.")

comment:16 in reply to: ↑ 15 @Arlen22 4 years ago

Replying to Arlen22: And once again, 2.9.2 is line increment, 2.9.2 is version increment.

After this, I think I will use preview a bit more!

comment:17 follow-up: @westi 4 years ago

I'm wondering whether a better solution to this issue may be a GlotPress instance for plugin translation which auto-leverages strings which have been used elsewhere so they only need approving.

That way the plugins get the benefit of being able to re-use translations if they already exist while still being able to use a different translation if more appropriate.

If we provided this kind of tools support would this improve things for translators enough?
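
A sketch of that kind of reuse, assuming nothing about how GlotPress actually works: `suggest_from_memory()`, the entry layout, and the status names are all hypothetical.

```php
<?php
// Hypothetical helper: pre-fill untranslated entries from translations that
// already exist elsewhere, and mark them as suggestions awaiting approval.
function suggest_from_memory(array $entries, array $memory): array
{
    foreach ($entries as $msgid => $entry) {
        if ($entry['translation'] === '' && isset($memory[$msgid])) {
            $entries[$msgid] = [
                'translation' => $memory[$msgid],
                'status'      => 'needs-approval', // a translator still has the last word
            ];
        }
    }
    return $entries;
}

// Translations already approved elsewhere (illustrative German examples).
$memory = ['Categories' => 'Kategorien', 'Edit' => 'Bearbeiten'];

// A plugin's untranslated entries (made up for this sketch).
$plugin = [
    'Categories'   => ['translation' => '', 'status' => 'untranslated'],
    'Gallery size' => ['translation' => '', 'status' => 'untranslated'],
];

$result = suggest_from_memory($plugin, $memory);
print_r($result);
```

Because the reused translation is only a suggestion, the plugin's translator can still replace it when a different wording fits better.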

comment:18 in reply to: ↑ 17 ; follow-up: @scribu 4 years ago

Replying to westi:

I'm wondering whether a better solution to this issue may be a GlotPress instance for plugin translation which auto-leverages strings which have been used elsewhere so they only need approving.

That way the plugins get the benefit of being able to re-use translations if they already exist while still being able to use a different translation if more appropriate.

That would be awesome!

Even without that feature, I can't wait for GlotPress to become available for all plugins & themes.

comment:19 @Denis-de-Bernardy 4 years ago

Would it be possible, when generating pot and mo files, to automatically fill in a default translation for strings that are also present in core?

comment:20 @demetris 4 years ago

Replying to GruffPrys:

Hi guys - I'm new to contributing to open projects so be gentle ;)

Hello, GruffPrys, and welcome to the discussion!

Whilst Demetris' suggestion is quite ingenious in a number of ways, I believe it has a few drawbacks.

I'm more of a translator than a developer, so obviously I'd welcome any developments that lessen the workload and increase the consistency of the translation. However, it seems to me that this suggestion proposes to do so at the expense of complicating the developers' internationalization process and increasing both their workload and the expertise required from them.

In the proposed scenario, developers may have to become familiar with what is contained within any proposed strings-common.php file and what is not in order to decide when to use their text domain and when not to.

First, the way I think of it, this method would be optional. Completely optional. So, if a theme/plugin author does not want to use it, they can keep doing things the way they know. For them the method might as well not be there at all.

(We could force the proposed method on everyone by making makepot.php automatically ignore strings that are also found in strings-common.php. This is certainly feasible from a technical point of view, as Mark Jaquith said at the dev meetup of 7 Oct 2010 and as Nikolay hinted above. But I think it would not be a good approach, at least not for now.)
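
For illustration only, the filtering step described in the parenthesis could look like the sketch below. `strings_needing_translation()` is a hypothetical helper and not part of makepot.php; the sample strings are made up.

```php
<?php
// Hypothetical filter step: drop every extracted string that the common pool
// already covers, so the POT contains only plugin-specific strings.
function strings_needing_translation(array $extracted, array $commonPool): array
{
    return array_values(array_diff($extracted, $commonPool));
}

$commonPool = ['Next posts', 'Skip to content'];                                   // from strings-common.php
$extracted  = ['Skip to content', 'Gallery size', 'Next posts', 'Lightbox speed']; // from the plugin

print_r(strings_needing_translation($extracted, $commonPool));
```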

Second, I have an inkling that authors, at least theme authors, will like it.

If you were to choose a theme for translation, would you pick one with 100 strings or one with 50 strings, everything else being equal?

Plugin and theme authors who understand the value of i18n and l10n will also understand this. At least so I think.

This would then require developers to learn the difference between, say 'search' (the verb) and 'search' as a mass noun or count noun, and the implications of these differences on the translation before they can tell if the form of the word that they're after is actually in strings-common.php or not. There's a string containing the phrase "Search Help" in WP 3.0.+. Is it fair to expect developers to understand that "Search Help" could be a noun phrase or a verb phrase (depending on context), and could require very different translations accordingly?

First, as I say in my previous reply, we don’t have to include ambiguous strings. (Which are only a small percentage anyway.)

Second, if core has ambiguous strings like “Search Help”, maybe we should look at them independently of i18n. E.g., does “Search Help” mean “Search for Help”? Then maybe it should be “Search for Help”. Or does it mean “Help for searching”? Then maybe it should be “Help for searching”.

Personally, I think that is too much to ask of developers. The danger is that the proposed method wouldn't be adopted, or would be misused (i.e. we'd see the wrong translations of 'search', 'post' or 'vote' appear in plugins and themes). The beauty of the current method is that it is quite simple for developers to understand, whilst it leaves the linguistic decisions in the hands of linguists.

I'd argue that this simplicity and the developer/linguist split needs to be retained, and that the issues of the translator's workload and the consistency of translation could better be addressed by integrating a translation memory system and terminology management into GlotPress.

Each language could have its own translation memory and glossary of terms for each project, whilst suggestions from the memories and glossaries of other projects could also be displayed for that language. This would help with both the consistency of translation and the translator's work flow, whilst keeping translators in control of the linguistic aspects of the translation.

I realise that incorporating such systems would be a lot of development work (we've developed a web-based terminology dictionary development environment at work), but I think ultimately it would be the correct way to go about it.

In general I understand what you are saying, and I have been having similar thoughts. The question I have asked myself, in short, is:

Many plugin/theme authors seem to struggle even with the bare essentials of i18n, and now I propose to add another piece to the language puzzle?

But I think that i18n should not be treated any differently than other parts of WP. WP introduces various devices all the time that are meant to make life easier for developers and users, but not all of them are adopted as quickly as we would like. Sometimes because they don’t help backwards compatibility, sometimes because devs are happy with what they have, sometimes because we don’t advertise our new stuff properly, sometimes for other reasons.

It would be the same for the method I propose. What we can do, if we introduce a common pool or something to that effect, is: (a) Make the method easy to understand and use. (b) Advertise it well. (c) Explain it with examples.

comment:21 in reply to: ↑ 18 @demetris 4 years ago

Replying to scribu:

Replying to westi:

I'm wondering whether a better solution to this issue may be a GlotPress instance for plugin translation which auto-leverages strings which have been used elsewhere so they only need approving.

That way the plugins get the benefit of being able to re-use translations if they already exist while still being able to use a different translation if more appropriate.

That would be awesome!

Even without that feature, I can't wait for GlotPress to become available for all plugins & themes.

The tools we use for making PO and MO files already have something like that. You first build a translation memory, and then you use that memory to autotranslate strings that are already in it.

I use the translation memory myself, and it is certainly useful, but I guess its usefulness depends on the circumstances and on the strings. For short strings, for example, I find that, instead of helping me, it hinders me: In the time it takes me to verify that the suggested translation for, say, Categories, is the correct one, I would have typed the translation and moved to the next string.

In general, as a translator, I would like to not have to bother at all, not even for verifying an automatic suggestion, with stock strings. There are a lot of common things plugins and themes need to say, and there are not many variations for each of these strings:

  • Categories
  • Edit
  • Next post
  • Previous Post
  • Skip to main content
  • View all posts by %s
  • View all posts by this author

How many ways are there to say things like that? They are stock not only in WordPress but, some of them, in all publishing platforms and in the web in general.

comment:22 @scribu 4 years ago

Again we come back to the thorny problem of choosing which strings are common and which aren't. Just to illustrate the point, I've never used any of those strings in any of my plugins.

And, as others have suggested, really common strings should find their way into template tags. The most recent example is the new submit_button(): #15064

comment:23 follow-up: @Denis-de-Bernardy 4 years ago

@Demetris: still, it seems to me that instead of trying to standardize strings like this, a simpler approach would be to check, at the PHP level, whether there is a plugin translation and, if not, to try the WP translation; or, at the PO/MO level, to check whether there is a WP translation for a particular string and pre-fill it when generating the files.

The PHP approach would amount to falling back to the default text domain when the localization of the plugin or theme is not around.

Or maybe I'm not completely understanding this ticket. If so, just ignore the suggestion.

comment:24 in reply to: ↑ 23 @nbachiyski 4 years ago

Replying to Denis-de-Bernardy:

@Demetris: still, it seems to me that instead of trying to standardize strings like this, a simpler approach would be to check, at the PHP level, whether there is a plugin translation and, if not, to try the WP translation; or, at the PO/MO level, to check whether there is a WP translation for a particular string and pre-fill it when generating the files.

The PHP approach would amount to falling back to the default text domain when the localization of the plugin or theme is not around.

We will definitely do that, but the problem here is that translators won't know which strings are in core and which aren't, so that they know which to translate and which they can leave blank.

In general I am with westi, here. We'd better invest in tools that do this for us, instead of trying to move any burden to plugin developers. Internationalization has always been the boring thing you have to do after you write your magnificent plugin, it should be as simple as possible. Reading through a list of strings and trying to use generic ones is nowhere near simple.

comment:25 @dd32 4 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to maybelater
  • Status changed from new to closed