WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 4 years ago

#36843 closed defect (bug) (worksforme)

Wrong Taxonomy Term orderby 'name' in Cyrillic

Reported by: esemlabel Owned by:
Milestone: Priority: normal
Severity: normal Version: 4.5.2
Component: Database Keywords:
Focuses: Cc:

Description

There is bug in taxonomy terms sorting by name in both cases - in wp-admin and in front-end, if using term names in Cyrillic (Ukrainian). Maybe problem not only in taxonomy terms sorting and not only in Ukrainian (uk) language, but may also exist in other Cyrillic languages.

The term names that starts with one of these three letters (Іі), (Її) or (Єє) are always at the begining of the list, instead of its actual alphabetic position. And the names that starts with letter (Ґґ) are always at the end of the list.

Here is correct Ukrainian (uk) alphabetic order.

Аа, Бб, Вв, Гг, Ґґ, Дд, Ее, Єє, Жж, Зз, Ии, Іі, Її, Йй, Кк, Лл, Мм, Нн, Оо, Пп, Рр, Сс, Тт, Уу, Фф, Хх, Цц, Чч, Шш, Щщ, Ьь, Юю, Яя.

Change History (11)

#1 @esemlabel
4 years ago

  • Summary changed from Wrong Taxonomy Term orderby 'name' in Cyrillic (Ukrainian) to Wrong Taxonomy Term orderby 'name' in Cyrillic

#2 follow-up: @boonebgorges
4 years ago

Hi @esemlabel - Thanks for the ticket, and welcome to WordPress Trac!

What is the collation on your wp_terms database table? (SHOW TABLE STATUS LIKE 'wp_terms')

The characters that are out of order appear to be those that (pardon me while I recollect the Russian I took in college) are not part of the Russian alphabet. It's possible that you're using a Cyrillic collation that is designed for Russian, and it doesn't know what to do with these Ukrainian-only characters. Maybe someone like @SergeyBiryukov could chime in to confirm this hypothesis.

#3 in reply to: ↑ 2 @esemlabel
4 years ago

@boonebgorges, thanks for replying!

Of course, it is NOT a part of Russian language. Ukrainian is one different from ru language that also uses Cyrillic alphabet. Many countries uses Cyrillic (https://en.wikipedia.org/wiki/Cyrillic_alphabets), but each Cyrillic-country has its own language-specific symbols in addition to common Cyrillic letters. For example, (Ёё, Ыы, Ъъ) in Russian, (Ґґ, Іі, Її, Єє) in Ukrainian, etc. And it has to be in right alphabetic position.

I understand that someone have to set properly letters order for each Cyrillic language in code. But it has to be done. And not only in Ukrainian, but also in Belarusian, Serbian, Macedonian and many others, which translation UI presents in WordPress Core.

Table Status

array(1) {
  [0]=>
  object(stdClass)#4625 (18) {
    ["Name"]=>
    string(8) "wp_terms"
    ["Engine"]=>
    string(6) "MyISAM"
    ["Version"]=>
    string(2) "10"
    ["Row_format"]=>
    string(7) "Dynamic"
    ["Rows"]=>
    string(2) "25"
    ["Avg_row_length"]=>
    string(2) "49"
    ["Data_length"]=>
    string(4) "1284"
    ["Max_data_length"]=>
    string(15) "281474976710655"
    ["Index_length"]=>
    string(5) "11264"
    ["Data_free"]=>
    string(2) "36"
    ["Auto_increment"]=>
    string(3) "131"
    ["Create_time"]=>
    string(19) "2016-04-24 12:27:37"
    ["Update_time"]=>
    string(19) "2016-05-15 15:03:33"
    ["Check_time"]=>
    string(19) "2016-04-24 12:27:37"
    ["Collation"]=>
    string(15) "utf8_general_ci"
    ["Checksum"]=>
    NULL
    ["Create_options"]=>
    string(0) ""
    ["Comment"]=>
    string(0) ""
  }
}

#4 @knutsp
4 years ago

utf8_general_ci will not work. It's the same problem with Norwegian. æøå will be sorted incorrectly if the column's collation is not set to utf8_danish_ci or utf8mb4_danish_ci.

The two most important columns to fix are posts.post_title and terms.name. Beware that it's not sufficient to set the collation for the table, is this is just a default when new columns are created. It must be set on each individual indexed column contaning characters of the local language.

Should WordPress offer to set the correct collation for these DB columns?

#5 @boonebgorges
4 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

@esemlabel I wasn't trying to say that Ukrainian is the same as Russian, or that it's not important for WordPress to get this right :)

What I'm saying is: WordPress term 'orderby=name' queries tell MySQL to ORDER BY name. It's up to MySQL to provide data in the right order. It makes this decision based on the collation of the name column. See http://dev.mysql.com/doc/refman/5.7/en/charset-general.html. In your case, utf8_general_ci is not sufficient to sort Ukrainian (though, as I hypothesized, it gets Russian correct, which is why your sort order is incorrect in the way that it is). See http://stackoverflow.com/a/766996.

As noted by @knutsp, the fix in your case is to change the collation for the name column in your wp_terms table, along with any other relevant columns in your database. I believe that utf8_unicode_ci is the correct collation for Ukrainian when using a UTF8 charset.

Should WordPress offer to set the correct collation for these DB columns?

WP tries to use sensible defaults (utf8mb4 and utf8mb4_unicode_ci, where supported) for the database tables that it creates at installation. WordPress 4.3 introduced a routine that upgraded existing tables to utf8mb4/utf8mb4_unicode_ci on sites that support that combination. When WP can't do this migration, I don't know that there's enough information available to assume that it knows what the "correct" collation is for a given installation - it depends on a combination of the current charset, what the installation of MySQL supports, and the language(s) that need to be supported by the database. So @knutsp, I think the answer is "no". But if you think there's a feasible way for WP to do this without breaking stuff, let's discuss it in an enhancement ticket.

#7 @esemlabel
4 years ago

Why the status is closed? Does it has a fix?

Collation can be choosen on new db on its creation. But what about existing database?

Is there any user-friendly tool to convert collation of existing db?

#8 @esemlabel
4 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

'utf8_unicode_ci' doesn't resolve the issue

I started new dev server and created new db with 'utf8_unicode_ci' collation in phpmyadmin. I also set collation in wp

define('DB_COLLATE', 'utf8_unicode_ci');

After adding terms the problem still exist.

So 'utf8_unicode_ci' is defently not what will fix terms orderby 'name' in Ukrainian Cyrillic.
Is there any other ideas?

Last edited 4 years ago by esemlabel (previous) (diff)

#9 @boonebgorges
4 years ago

  • Component changed from Taxonomy to Database

#10 @esemlabel
4 years ago

I've found, that querying SQL direct in phpMyadmin console with code, for example:

SELECT * FROM wp_terms ORDER BY name COLLATE utf8_unicode_ci

gives success result! Term names in Ukrainian Cyrillic successfully ordered by name in right alphabetic order.

I am asking wp-gurus to write function that can alter general get terms function to include 'COLLATE utf8_unicode_ci' parameter, until WordPress Core will have such fix.

Maybe there is a hook, action or filter for changing get terms process globally (in back-end on term pages and in front-end) ?

Please, help!

#11 @esemlabel
4 years ago

  • Resolution set to worksforme
  • Status changed from reopened to closed

Resolved

I've finally, got it worked.

  1. In wp-config.php: define('DB_COLLATE', 'utf8_unicode_ci');
  2. Changed Collation of database: in db root in phpMyAdmin > Operation > Collation.
  3. Changed Collation of 'wp_terms' table (could be done to each table): in table root > Operation > Table Options > Collation.
  4. Changed server variables: in phpMyAdmin root > Variables > changed 'collation connection', 'collation database' and 'collation server' parameters to 'utf8_unicode_ci'.

Now all works fine, in back-end and front-end.

Can wordpress do such routine automatically?

Note: See TracTickets for help on using tickets.