Make WordPress Core

Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#32405 closed defect (bug) (fixed)

Database collation upgrade routine to support UTF8MB4 collations

Reported by: netweb
Owned by: pento
Milestone: 4.6
Priority: normal
Severity: normal
Version: 4.2
Component: Database
Keywords: dev-feedback needs-patch
Focuses:
Cc:

Description

At the time of writing, the Finnish team is using the following for the Finnish localised package:

What they wanted to use was:

WordPress currently needs to start with utf8 as the character set because not all sites can support utf8mb4, so utf8 in the config file is automatically upgraded at runtime to utf8mb4 if all the requirements for its use are met.

This upgrade does not currently extend the same logic to collations: if the collation is set to utf8_swedish_ci in wp-config.php, then after utf8 is successfully upgraded to utf8mb4, the collation utf8_swedish_ci is NOT upgraded to utf8mb4_swedish_ci.


The following are extracts from a discussion on Slack in #core-i18n, full discussion here

Netweb: “So after the various chats last night it looks like we have the Finnish locale leaving the charset as UTF8 but defining the collation as utf8_swedish_ci for the Finnish locale, will that explode?”

dd32: “I don’t know the answer here. If anything we probably need some logic to upgrade a utf8_swedish to a utf8mb4_swedish if supported by the server. I think we also need to look into using utf8mb4_unicode_520_ci when supported too.”

dd32: “For Finnish, no, it won’t explode, but they may have alphabeticalism issues if it doesn’t play nice with utf8mb4_unicode_ci.”

If the site uses a utf8mb4 charset, and they have a utf8_* collation set, it’ll be overridden to utf8mb4_unicode_ci.

If the site uses utf8 and they set utf8mb4_swedish_ci, things will break.

If the site uses utf8mb4 and they set utf8mb4_swedish_ci, then it’ll use utf8mb4_swedish_ci.

“In other words, customising those values in the default file is really a bad idea. Site admins can do that sure, but it should default to our defaults.”
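The three cases dd32 lists can be summarised as a small decision function. This is only an illustrative sketch in Python, not WordPress code: the function name `collation_before_fix` is hypothetical, "things will break" is modelled as an exception, and `charset` means the charset actually in use on the connection after any runtime upgrade.

```python
def collation_before_fix(charset: str, collation: str) -> str:
    """Model the behaviour described in the discussion (pre-fix).

    Hypothetical helper for illustration only; it is not part of WordPress.
    """
    if charset == "utf8mb4":
        if collation.startswith("utf8_"):
            # Case 1: a utf8_* collation is overridden with the default.
            return "utf8mb4_unicode_ci"
        # Case 3: a utf8mb4_* collation is used as configured.
        return collation

    if collation.startswith("utf8mb4_"):
        # Case 2: utf8 charset with a utf8mb4_* collation is a mismatch.
        raise ValueError(
            f"collation {collation!r} is not valid for charset {charset!r}"
        )

    # utf8 charset with a utf8_* collation: nothing to change.
    return collation


if __name__ == "__main__":
    print(collation_before_fix("utf8mb4", "utf8_swedish_ci"))
    print(collation_before_fix("utf8mb4", "utf8mb4_swedish_ci"))
```

The first case is the one this ticket is about: the locale-specific utf8_swedish_ci choice is lost as soon as the charset is upgraded.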

Change History (6)

This ticket was mentioned in Slack in #core-i18n by netweb. View the logs.


9 years ago

This ticket was mentioned in Slack in #core-i18n by ocean90. View the logs.


8 years ago

#3 @pento
8 years ago

  • Milestone changed from Awaiting Review to 4.6
  • Owner set to pento
  • Status changed from new to assigned

#4 @pento
8 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 37521:

Database: Obey locale-specific utf8 collation settings.

Some sites prefer to use locale-specific collation settings. For example, the Swedish WordPress package uses utf8_swedish_ci, instead of utf8_unicode_ci. When upgrading the connection to utf8mb4, we were overriding this to be utf8mb4_unicode_ci, instead of maintaining the use of the _swedish_ci variant.

The locale-specific collations do have extra collation rules just for that language, so it's useful to maintain compatibility.
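As a rough illustration of the behaviour this commit message describes, the upgrade can be thought of as swapping the utf8_ prefix for utf8mb4_ while keeping the locale-specific suffix. The Python sketch below is an assumption about the general idea, not the code from [37521]; the function name `upgrade_collation` and the `server_supports_utf8mb4` flag are hypothetical.

```python
def upgrade_collation(collation: str, server_supports_utf8mb4: bool) -> str:
    """Carry a utf8_* collation over to its utf8mb4_* equivalent.

    Hypothetical helper for illustration only; it is not part of WordPress.
    """
    old_prefix = "utf8_"
    new_prefix = "utf8mb4_"

    if not server_supports_utf8mb4:
        # The charset stays utf8, so the collation must stay utf8_* too.
        return collation

    if collation.startswith(old_prefix):
        # Keep the suffix (e.g. "swedish_ci") and swap only the prefix.
        return new_prefix + collation[len(old_prefix):]

    # Already a utf8mb4_* collation (or something else): leave it alone.
    return collation


if __name__ == "__main__":
    print(upgrade_collation("utf8_swedish_ci", True))
    print(upgrade_collation("utf8_swedish_ci", False))
```

Matching on the "utf8_" prefix, rather than replacing "utf8" anywhere in the string, avoids turning an existing utf8mb4_* collation into an invalid name.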

Fixes #32405.

#5 @pento
8 years ago

In 37522:

Tests: Remove a test for a function that can't be tested.

wpdb::init_charset() doesn't lend itself to being tested, so the unit test added in [37521] won't work under most circumstances.

See #32405.

#6 @pento
8 years ago

In 37602:

Tests: Fix an incorrect @ticket header introduced in [37601].

See #32405, #36917.
