Make WordPress Core

Opened 13 years ago

Closed 12 years ago

#15652 closed enhancement (duplicate)

Tags with Accents are Incorrectly Sorted in Cloud

Reported by: CatherineBr Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.0.2
Component: I18N Keywords:
Focuses: Cc:

Description

By accents, I mean letters like "é", "ü", "ç", "î", etc.

Currently, in a tag cloud, the word "égypte" gets sorted AFTER the word "zimbabwe". This is incorrect alphabetical sorting. A correct sorting would consider "é" as equivalent to "e", and place "égypte" BEFORE the word "zimbabwe".

The problem stems from this line in CATEGORY-TEMPLATE.PHP:

uasort( $tags, create_function('$a, $b', 'return strnatcasecmp($a->name, $b->name);') );

As a temporary solution, I am using this code:

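function replace_accents($str) {
    // Encode accented letters as named entities, e.g. "é" becomes "&eacute;".
    $str = htmlentities($str, ENT_COMPAT, 'UTF-8');
    // Keep only the base letter of each accent entity.
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
    // Decode any remaining entities back to UTF-8.
    return html_entity_decode($str, ENT_COMPAT, 'UTF-8');
}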
uasort( $tags, create_function('$a, $b', 'return strnatcasecmp($a->name, $b->name);') );

Attachments (2)

system.sortable.sql (2.6 KB) - added by Denis-de-Bernardy 13 years ago.
public.string.sql (4.1 KB) - added by Denis-de-Bernardy 13 years ago.


Change History (11)

#1 in reply to: ↑ description @CatherineBr
13 years ago

Oops. In the previous code, the last line should read:

uasort( $tags, create_function('$a, $b', 'return strnatcasecmp(replace_accents($a->name), replace_accents($b->name));') );
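
For readers on newer PHP versions, where create_function() is no longer available, the same workaround can be sketched with an anonymous function. This is only an illustration: the $tags array below is a hypothetical stand-in for the term objects WordPress passes to the tag cloud, and replace_accents() is the helper from the description, with an explicit charset added to html_entity_decode().

```php
<?php
// Helper from the ticket description: strips common accent entities.
function replace_accents( $str ) {
    $str = htmlentities( $str, ENT_COMPAT, 'UTF-8' );
    $str = preg_replace( '/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str );
    return html_entity_decode( $str, ENT_COMPAT, 'UTF-8' );
}

// Hypothetical sample data; real tag objects carry more properties.
$tags = array(
    (object) array( 'name' => 'zimbabwe' ),
    (object) array( 'name' => 'égypte' ),
    (object) array( 'name' => 'canada' ),
);

// Compare the accent-stripped names, but keep the original names for display.
uasort( $tags, function ( $a, $b ) {
    return strnatcasecmp( replace_accents( $a->name ), replace_accents( $b->name ) );
} );

// Collect the names in their sorted order.
$sorted_names = array();
foreach ( $tags as $tag ) {
    $sorted_names[] = $tag->name;
}
```

Note that the regular expression only covers the uml, acute, grave, circ and tilde entities, so letters such as "ç" or "å" are left untouched by this helper.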

#2 @scribu
13 years ago

Maybe we should sort by slug instead?
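
A rough sketch of what sorting by slug could look like, assuming each tag object exposes a slug property alongside its name. The sample objects and their slug values below are hypothetical, chosen to mimic slugs that have had their accents removed.

```php
<?php
// Hypothetical sample data: names keep their accents, slugs do not.
$slug_tags = array(
    (object) array( 'name' => 'zimbabwe', 'slug' => 'zimbabwe' ),
    (object) array( 'name' => 'égypte', 'slug' => 'egypte' ),
    (object) array( 'name' => 'canada', 'slug' => 'canada' ),
);

// Order by slug instead of by name.
uasort( $slug_tags, function ( $a, $b ) {
    return strnatcasecmp( $a->slug, $b->slug );
} );

// Collect the display names in slug order.
$names_by_slug = array();
foreach ( $slug_tags as $tag ) {
    $names_by_slug[] = $tag->name;
}
```

This only gives the expected order as long as the slug still resembles the name, which is not guaranteed if a user has edited it.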

#3 @nacin
13 years ago

  • Milestone changed from Awaiting Review to Future Release

We have remove_accents.

#4 @Denis-de-Bernardy
13 years ago

I suspect that this isn't elegantly fixable unless we store an unaccented version of the names. Or use the slug, as scribu suggests, even though the slug might be changed by users.

As an aside, collation is a colorful topic. From Wikipedia:

http://en.wikipedia.org/wiki/Collation

"Differences between computer numeric sorting and alphabetic sorting occur in Danish and Norwegian (aa is ordered at the end of the alphabet when it is pronounced like å, and at the start of the alphabet when it is pronounced like a), German (ß is ordered as s + s; ä, ö, ü are ordered as a + e, o + e, u + e in phone books, but as o elsewhere, and behind o in Austria), Icelandic (ð follows d), Dutch (ij is sometimes ordered as y; see IJ: Collation), English (æ is ordered as a + e), and many other languages."

Imho, the only reasonable approach is to leave this up to PHP and MySQL respectively, and to file bugs on their end when natsort and ORDER BY clauses don't work as expected on servers with a properly configured locale. There's no way we'll get sorting in WP to behave as end users expect without excessive amounts of number crunching.

#5 @Denis-de-Bernardy
13 years ago

  • Component changed from General to I18N
  • Type changed from defect (bug) to enhancement

#6 follow-up: @CatherineBr
13 years ago

For a beginner like me, the natural place to specify the locale is WP-CONFIG.PHP, and not the server, because I wouldn't know how to do it there. My WP-CONFIG.PHP says:

define ('WPLANG', 'fr_FR');

So, couldn't WordPress read that WPLANG constant and modify the sorting accordingly? This is beyond my competence, but I found some PHP code that seems to do that, using the "Collator" class:

$coll = collator_create( 'en_US' );
$res = collator_compare( $coll, $s1, $s2 );

But this "Collator" class seems to require a PHP extension of some sort, as I couldn't get it to execute.
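
For what it's worth, the Collator class is provided by PHP's intl extension, so it is only available where that extension is installed. A minimal sketch of a locale-aware comparison that falls back to strnatcasecmp() when the class is missing might look like this. The make_name_comparator() helper and the sample data are hypothetical, and passing the WPLANG value as the locale is an assumption, not something WordPress does here.

```php
<?php
// Hypothetical helper: returns a comparison callback for objects with a name property.
function make_name_comparator( $locale ) {
    if ( class_exists( 'Collator' ) ) {
        // Locale-aware comparison; requires the intl extension.
        $collator = new Collator( $locale );
        return function ( $a, $b ) use ( $collator ) {
            return $collator->compare( $a->name, $b->name );
        };
    }
    // Fallback: byte-based comparison, which sorts accented letters after "z".
    return function ( $a, $b ) {
        return strnatcasecmp( $a->name, $b->name );
    };
}

// Hypothetical sample data.
$cloud_tags = array(
    (object) array( 'name' => 'zimbabwe' ),
    (object) array( 'name' => 'égypte' ),
    (object) array( 'name' => 'canada' ),
);

// 'fr_FR' mirrors the WPLANG value quoted above.
uasort( $cloud_tags, make_name_comparator( 'fr_FR' ) );

// Collect the names in their sorted order.
$collated_names = array();
foreach ( $cloud_tags as $tag ) {
    $collated_names[] = $tag->name;
}
```

With intl available, "égypte" should land between "canada" and "zimbabwe". Without it, the fallback reproduces the ordering described in this ticket.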

#7 in reply to: ↑ 6 @Denis-de-Bernardy
13 years ago

Replying to CatherineBr:

So, couldn't WordPress read that WPLANG constant and modify the sorting accordingly? This is beyond my competence, but I found some PHP code that seems to do that, using the "Collator" class:

Sadly not. Setting the locale at the PHP level is not thread-safe at the time of writing, so it's generally not acceptable for hosts to allow end users to tweak the setting. In other words, either we implement UTF-8 collation rules in WP (which is insanely complex, since it depends on heaps of things) or we have to deal with results as they're returned by strnatcasecmp(), which is acceptable for English but, as you point out, has a few issues with other locales.

I'm attaching a few PostgreSQL functions in case there's interest in trying to work around this at the MySQL level.

#8 @risager
12 years ago

  • Keywords changed from sorting, sort, accents, cloud to sorting sort accents, cloud

A similar problem for the Danish letter å:

We have 3 special characters in Danish: æ, ø and å.

When using get_tag, the letters æ and ø sort correctly, but å is placed before a in the alphabet (it should be the last letter).

I am using WordPress 3.3 da_DK

#9 @SergeyBiryukov
12 years ago

  • Keywords sorting sort accents cloud removed
  • Milestone Future Release deleted
  • Resolution set to duplicate
  • Status changed from new to closed