Make WordPress Core

Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#3687 closed enhancement (fixed)

translatable strings descriptions

Reported by: nbachiyski Owned by:
Milestone: 2.2 Priority: normal
Severity: normal Version:
Component: General Keywords: i18n has-patch
Focuses: Cc:

Description

The problems:

  • gettext doesn't support string disambiguation: one and the same word can be used in different contexts with different meanings, yet it gets only one translation
  • a description cannot be attached to a translatable string: consider the string %1$s %2$s. Unless the translator takes the time to dig into the code, it cannot be translated properly

A sample solution:

We could introduce a new function __withdesc(), which works like __(), but strips all the text after the first | character from the translation. That way we could provide descriptions to translators and prevent string ambiguities.

Here are some examples:

  • __withdesc('Editor|role') and __withdesc('Editor|rich-text textarea') - we had this problem some time ago
  • __withdesc('%1$s – %2$s|1: date, 2: time')
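
A minimal sketch of the stripping step proposed above, for illustration only. The names `strip_description()` and `withdesc_sketch()` and the `$catalog` array are hypothetical stand-ins; this is neither the attached patch nor a WordPress API.

```php
<?php
// Hypothetical helper: drop the translator-facing description,
// i.e. everything from the first '|' onward.
function strip_description(string $text): string {
    $bar = strpos($text, '|');
    return $bar === false ? $text : substr($text, 0, $bar);
}

// Hypothetical lookup: translate via a plain array (standing in for the
// real message catalog), then strip the description from the result.
function withdesc_sketch(string $text, array $catalog): string {
    $translated = $catalog[$text] ?? $text; // fall back to the source string
    return strip_description($translated);
}

// Made-up catalog entries: the first translator copied the description
// into the translation, the second one left it out.
$catalog = [
    'Editor|role'               => 'Redakteur|role',
    'Editor|rich-text textarea' => 'Editor',
];
```

Because the stripping is applied to whatever the lookup returns, the description disappears whether or not the translator kept it, and untranslated strings lose it as well.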

Attachments (2)

with-desc.diff (563 bytes) - added by nbachiyski 18 years ago.
l10n.php-misc-revised.diff (1.6 KB) - added by nbachiyski 18 years ago.
_c with description after the string, common translate function, ngettext filter


Change History (23)

#1 @markjaquith
18 years ago

How is this presented to translators in the .pot file?

#2 @nbachiyski
18 years ago

The whole string, including the delimiter, appears in the .pot file. Even if translators put the second part into the translation, the function will strip it.

#3 @ryan
18 years ago

See #3337 and the thread referenced there for more background.

I'm fine with this route since GNOME and other projects went there. We might want to avoid the function prefix though since php reserves it.

#4 @ryan
18 years ago

Oops. wiki formatting strikes. Avoid the double underscore prefix.

#5 @nbachiyski
18 years ago

Yes, I am not a heavy PHP5 user and missed that caution.

Unfortunately it means that we have to rename __(), __e() and __ngettext() also :(

We could leave them with just one underscore in the beginning or prefix them with something like i18n_.

P.S. I have always liked t() as a name for a translating function :)

#6 @ryan
18 years ago

I've been trying out the context support in xgettext. Given this function:

function _c($context, $string);

Called as such:

_c('contextual', 'translate me');

If we add this to the call to xgettext:

--keyword=_c:1c,2

We get this in the pot:

msgctxt "contextual"
msgid "translate me"
msgstr ""

That's well and good, but poedit doesn't know what to do with msgctxt nor does our php gettext implementation. So, I think we should decide on a function name and go with the glib style context that you are proposing. I like _c().

#7 @yskin
18 years ago

-c, --add-comments[=TAG]
place comment block with TAG (or those preceding keyword lines) in output file

I found it in the output of "xgettext --help".

Use this code:

/*auto comment*/
__('test');

You will get this in the .pot file:

#. auto comment
msgid "test"
msgstr ""

A comment beginning with “#.” is an xgettext automatic comment. poEdit shows it in the top right box.

But I don't know how to use the TAG. Maybe we can use a special tag for translation descriptions.

Update:

I have figured out how to use the TAG.

Code:

/*desc: auto comment */
__('test');

xgettext:

--add-comments=desc

.pot file:

#. desc: auto comment
msgid "test"
msgstr ""

“desc: auto comment” will be shown in the top right box in poEdit.

Sorry for my poor English. What about this method?

#8 @ryan
18 years ago

That is good for providing descriptive context, but it does not help with disambiguation since source strings are still identical.

Using this to add descriptions to our strings is a good idea, though, especially those strings with many printf format specifiers. The description could be used to describe the different arguments. This would be a good project for someone to work on under a new ticket.

#9 @nbachiyski
18 years ago

I personally do not care much about the function name.

Previously I chose __withdesc, because this function does not follow any gettext convention (as opposed to ngettext for example) and sounded more suitable, because it was a little bit more descriptive than _c or any other single-letter alternative.

#10 @ryan
18 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [5081]) Add _c() for disambiguatin translateable strings. Props nbachiyski. fixes #3687

#11 @ryan
18 years ago

I went ahead and added _c() to get things rolling. We can change the name if we don't like _c. I didn't introduce an echo variant. I changed the implementation a bit, so the code needs review.

#12 @nbachiyski
18 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Do you really want the description to be the first part of the message?

Also, why copy the implementation of __() here when we could simply call it? Is that function call so expensive? I could say the same for _e(). The pattern
apply_filters('gettext', $l10n[$domain]->translate($text), $text) appears three times in l10n.php.

#13 @ryan
18 years ago

glib's Q_() puts the context up front. Prepending the context allows the text to have any number of | characters without causing problems.

We cannot call __() from _c(). xgettext will extract the text passed to __() and put the literal string '$text' in the pot. That's why we cut-and-paste the code three times.
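
To illustrate why a leading context tolerates extra | characters, here is a small sketch. `strip_context_prefix()` is a hypothetical name and this is not the code committed in [5081]; it only shows the prefix convention described for glib's Q_().

```php
<?php
// Hypothetical helper: remove a leading "context|" prefix.
// Only the first '|' is treated as the delimiter, so the remaining
// text may contain any number of '|' characters.
function strip_context_prefix(string $text): string {
    $bar = strpos($text, '|');
    return $bar === false ? $text : substr($text, $bar + 1);
}
```

With this convention, 'separator|a | b' yields 'a | b', whereas cutting a trailing description at the first | would truncate the same text to 'a '.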

#14 @nbachiyski
18 years ago

You are damn right about the |. I had implemented that feature before and when I was writing that patch I thought it was suspiciously simple :)

Why don't we have, for example, a translate function that contains the common code?

#15 @nbachiyski
18 years ago

Ryan, writing the descriptions at the beginning is awful. I did it the other way round, which I think will be more intuitive for translators.

In the following patch there are: the revised _c function, common translate function and a ngettext filter.
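
A rough sketch of how a shared lookup with a trailing description could fit together. This is an illustration under assumptions, not the contents of the attached diff: `SKETCH_CATALOG`, `sketch_translate()`, `sketch__()` and `sketch_c()` are hypothetical names, and cutting at the last | (via `strrpos()`) is just one way to let the text itself contain | characters.

```php
<?php
// Hypothetical in-memory catalog standing in for the loaded translations.
const SKETCH_CATALOG = [
    'Hello'       => 'Hallo',
    'Editor|role' => 'Redakteur|role', // translator kept the description
];

// Common code shared by every wrapper: a single lookup with fallback.
function sketch_translate(string $text): string {
    return SKETCH_CATALOG[$text] ?? $text;
}

// Plain translation wrapper.
function sketch__(string $text): string {
    return sketch_translate($text);
}

// Translation with a trailing description: cut at the LAST '|',
// so earlier '|' characters remain part of the text.
function sketch_c(string $text): string {
    $whole = sketch_translate($text);
    $bar = strrpos($whole, '|');
    return $bar === false ? $whole : substr($whole, 0, $bar);
}
```

Keeping the lookup in one place means each wrapper only adds its own post-processing, which avoids repeating the lookup in three functions.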

@nbachiyski
18 years ago

_c with description after the string, common translate function, ngettext filter

#16 @abelcheung
18 years ago

nbachiyski, I'd tend to disagree with the assertion that putting the context at the end would be easier to understand. For translators familiar with glib/gtk+ software the previous form is more familiar; for other people I don't think it makes any difference.

Another important point I'd like to note: just extracting the whole translated string would likely be bad, since most translators don't follow WordPress development, so they don't know about this context thingy and just translate the whole thing, INCLUDING CONTEXT. This has been proven true in GNOME, which uses the pipe '|' syntax more extensively. Most translators (especially newcomers) don't recognise that this is a special format, even though it is commented (comments are hidden by poEdit or other software).

#17 @abelcheung
18 years ago

Sorry for not making myself clear in my last comment.

  1. What I mean about comments in poEdit etc. is that translation comments are not very visible in these programs, and people tend to ignore them.
  2. So the most fail-safe way is to strip the context part from the translated string as well, using substr().
  3. Despite point 1, a prominent comment should still be placed just in front of each such string, for the more careful translators who notice this syntax. We can't expect translators to be WordPress developers themselves.

#18 @abelcheung
18 years ago

  • Cc abelcheung@… added

#19 @abelcheung
18 years ago

  • Cc abelcheung added; abelcheung@… removed

#20 @ryan
18 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [5258]) Refactor l10n code to reduce duplication. Change placement of context. Props nbachiyski. fixes #3687

#21 @nbachiyski
18 years ago

abelcheung,

First, sorry for the late reply; your comment somehow slipped through all the trac flow.

I heartily agree that translators tend to ignore comments. So we did two things to mitigate that effect:

  • We already strip the comment part; this was our intention from the beginning (of course, the GNOME guys were doing that too :) )
  • The focus is on the original string, so even if translators don't recognise that the second part is just a description, they will already have given most of their attention to the right part of the string. That is why I prefer the description to go at the end.