WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 5 years ago

#7099 closed enhancement (wontfix)

Comments in POT file should ideally explain meaning of character entities

Reported by: leuce
Owned by: nbachiyski
Milestone: 2.8
Priority: normal
Severity: normal
Version:
Component: I18N
Keywords: i18n comment, translation, l10n, POT, PO, entities, pipe, msgctxt
Focuses:
Cc:

Description

The wordpress.pot file contains HTML character entities where one might have expected Unicode characters. I suppose this is normal, since the input format is PHP, which is, after all, a browser-viewed format. However, it is not always clear to translators from the context what these entity codes mean.

A translator who has previously translated HTML may be aware of &nbsp;, &amp;, &gt; and &lt;, and it is fairly easy to guess what &quot; and &mdash; stand for, but there are also many numbered entities. I suggest that these be explained to translators in a comment or in msgctxt whenever they occur.

Well, in bug 7090 nbachiyski said that msgctxt is for context only, not for comments, so I'm not sure what the ideal solution might be.

Here is a list of the entity codes used in the POT file:

&#039; = single quote
&#8217; = right single quotation mark
&#8220; = left double quotation mark
&#8221; = right double quotation mark

&amp; = ampersand
&copy; = copyright sign
&gt; = greater than, closing angle bracket
&laquo; = left angle quote
&lt; = less than, opening angle bracket
&nbsp; = non-breaking space
&quot; = straight double quotation mark

&raquo; and &#187; = right angle quote
&mdash; and &#8212; = em dash
&hellip; and &#8230; = ellipsis (three dots)
&rsaquo; or &#8250; = single right angle quote
&#8212; (see &mdash; above)
&#187; (see &raquo; above)
&#8230; (see &hellip; above)
&#8250; (see &rsaquo; above)
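
For reference, the character behind each entity code can also be looked up mechanically. The following is a minimal Python sketch, assuming only the standard-library `html` and `unicodedata` modules; the helper name `describe_entity` is hypothetical and is not part of WordPress or gettext.

```python
import html
import unicodedata


def describe_entity(entity: str) -> str:
    """Return 'entity = character (UNICODE NAME)' for one HTML entity.

    Hypothetical helper for illustration only.
    """
    char = html.unescape(entity)
    # html.unescape leaves unknown entities (such as &emdash;) unchanged.
    if char == entity or len(char) != 1:
        raise ValueError(f"not a single known entity: {entity!r}")
    return f"{entity} = {char} ({unicodedata.name(char)})"


if __name__ == "__main__":
    for code in ["&hellip;", "&#8230;", "&laquo;", "&raquo;", "&rsaquo;", "&#8217;"]:
        print(describe_entity(code))
```

The Unicode names are more formal than the descriptions in the list above, but they make it easy to confirm that a named entity and a numbered entity refer to the same character.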

Here's an example of three such cases, and of how they might be made more useful to translators if msgctxt is used (but see bug 7090 also):

#: wp-includes/script-loader.php:98
msgid "Crunching&hellip;"
msgstr ""

#: wp-includes/script-loader.php:164
msgid "&laquo; Back"
msgstr ""

#: wp-includes/script-loader.php:176
msgid "Send to editor &raquo;"
msgstr ""

to:

#: wp-includes/script-loader.php:98
msgctxt "The entity &hellip; is an ellipsis, or three dots"
msgid "Crunching&hellip;"
msgstr ""

#: wp-includes/script-loader.php:164
msgctxt ""
"The entity &laquo; is a left angle quote, similar to "
"&lt;, which is a less-than sign, or a stemless "
"arrow pointing left."
msgid "&laquo; Back"
msgstr ""

#: wp-includes/script-loader.php:176
msgctxt ""
"The entity &raquo; is a right angle quote, similar to "
"&gt;, which is a greater-than sign, or a stemless "
"arrow pointing right."
msgid "Send to editor &raquo;"
msgstr ""
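
An alternative to msgctxt would be an extracted comment (a line starting with `#.`), which PO tools show to translators without becoming part of the message's lookup key. The following is a rough Python sketch of how such comments could be generated from a msgid. The function name `entity_comments` and the comment wording are hypothetical, and the regular expression is a simplification, not a full PO parser.

```python
import html
import re
import unicodedata

# Matches named (&hellip;), decimal (&#8230;) and hexadecimal (&#x2026;) entities.
ENTITY_RE = re.compile(r"&(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def entity_comments(msgid: str) -> list:
    """Build '#.' comment lines describing each distinct entity in msgid.

    Hypothetical helper for illustration only.
    """
    comments = []
    seen = set()
    for entity in ENTITY_RE.findall(msgid):
        char = html.unescape(entity)
        # Skip repeats and anything html.unescape does not recognise.
        if entity in seen or char == entity or len(char) != 1:
            continue
        seen.add(entity)
        name = unicodedata.name(char, "unnamed character").lower()
        comments.append(f"#. {entity} is {char} ({name})")
    return comments


if __name__ == "__main__":
    for msgid in ["Crunching&hellip;", "&laquo; Back", "Send to editor &raquo;"]:
        for line in entity_comments(msgid):
            print(line)
        print(f'msgid "{msgid}"')
        print('msgstr ""')
        print()
```

Because the comments are derived from the msgid itself, a step like this could run on the generated POT file rather than requiring changes to the PHP source.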

Of course, the ideal solution would be to simply display the Unicode character in the comments, but I'm not sure whether that is possible. If it were, why not just use the Unicode characters directly in the PHP files anyway?

Attachments (1)

wordpress_entitiesonly.odt (23.2 KB) - added by leuce 6 years ago.
All the messages in wordpress.pot that contain entities related to this enhancement/bug report


Change History (4)

leuce 6 years ago

All the messages in wordpress.pot that contain entities related to this enhancement/bug report

comment:1 ryan 6 years ago

  • Milestone changed from 2.7 to 2.8
  • Type changed from defect to enhancement

Postponed to 2.8.

comment:2 ryan 5 years ago

  • Component changed from General to i18n
  • Owner changed from anonymous to nbachiyski

comment:3 nbachiyski 5 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

I don't want to pollute the source code with entity descriptions.

If you really think it is a problem, please add this list to the translator documentation in the Codex: http://codex.wordpress.org/Translating_WordPress
