#32405 closed defect (bug) (fixed)
Database collation upgrade routine to support UTF8MB4 collations
Reported by: | netweb | Owned by: | pento |
---|---|---|---|
Milestone: | 4.6 | Priority: | normal |
Severity: | normal | Version: | 4.2 |
Component: | Database | Keywords: | dev-feedback needs-patch |
Focuses: | Cc: |
Description
At the time of writing, the Finnish team is using the following in the Finnish localised package:
define('DB_CHARSET', 'utf8');
and
define('DB_COLLATE', 'utf8_swedish_ci');
- http://i18n.trac.wordpress.org/browser/fi/branches/4.2/dist/wp-config-sample.php
What they wanted to use was:
define('DB_CHARSET', 'utf8mb4');
and
define('DB_COLLATE', 'utf8mb4_swedish_ci');
- http://i18n.trac.wordpress.org/changeset/26724
WordPress currently needs to start with utf8 as the character set, because not all sites can support utf8mb4. The utf8 value in the config file is automatically upgraded at runtime to utf8mb4 if all the requirements for its use are met.
This upgrade support does not currently extend the same logic to collations. If the collation is set to utf8_swedish_ci in wp-config.php, then after utf8 has been successfully upgraded to utf8mb4, the collation utf8_swedish_ci is NOT upgraded to utf8mb4_swedish_ci.
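The missing step could look something like the following. This is only a minimal sketch: upgrade_collation_for_charset() is a hypothetical helper, not an existing WordPress function, and it assumes the server actually provides the matching utf8mb4_* collation, which a real routine would have to check first.

```php
<?php
/**
 * Hypothetical helper, for illustration only (not part of WordPress).
 *
 * If the charset in use is utf8mb4 but the configured collation is still a
 * utf8_* one, return the utf8mb4_* collation with the same suffix.
 * Assumption: the server supports the resulting collation.
 */
function upgrade_collation_for_charset( $charset, $collate ) {
	$old_prefix = 'utf8_';

	// Only rewrite when the charset is utf8mb4 and the collation starts with "utf8_".
	if ( 'utf8mb4' === $charset && 0 === strpos( $collate, $old_prefix ) ) {
		return 'utf8mb4_' . substr( $collate, strlen( $old_prefix ) );
	}

	// Anything else (empty collation, already utf8mb4_*, charset still utf8) is left alone.
	return $collate;
}

// The Finnish case from this ticket: charset was upgraded, collation follows.
echo upgrade_collation_for_charset( 'utf8mb4', 'utf8_swedish_ci' ), "\n"; // utf8mb4_swedish_ci
```

If the charset stays at utf8, the helper returns the collation unchanged, so sites that cannot support utf8mb4 would be unaffected.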
The following are extracts from a discussion on Slack in #core-i18n (full discussion here):
Netweb: “So after the various chats last night it looks like we have the Finnish locale leaving the charset as UTF8 but defining the collation as utf8_swedish_ci for the Finnish locale, will that explode?”
dd32: “I don’t know the answer here. If anything we probably need some logic to upgrade a utf8_swedish to a utf8mb4_swedish if supported by the server. I think we also need to look into using utf8mb4_unicode_520_ci when supported too.”
dd32: “For Finnish, no, it won't explode, but they may have alphabeticalism issues if it doesn't play nice with utf8mb4_unicode_ci.”
- If the site uses a utf8mb4 charset, and they have a utf8_* character set set, it’ll be overridden to utf8mb4_unicode_ci.
- If the site uses utf8 and they set utf8mb4_swedish_ci, things will break.
- If the site uses utf8mb4 and they set utf8mb4_swedish_ci, then it’ll use utf8mb4_swedish_ci.
“In other words, customising those values in the default file is really a bad idea. Site admins can do that sure, but it should default to our defaults.”
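The three cases dd32 lists can be modelled as a small function. This is an illustrative sketch of the behaviour as described in the discussion, not WordPress code: describe_effective_collation() is a hypothetical name, and returning null for the breaking combination is an assumption made here to mark the case where "things will break".

```php
<?php
/**
 * Illustrative model of the three cases above (hypothetical, not WordPress code).
 *
 * Returns the collation that ends up in use, or null for the combination
 * that is described as breaking.
 */
function describe_effective_collation( $charset, $collate ) {
	$is_utf8_collation    = 0 === strpos( $collate, 'utf8_' );
	$is_utf8mb4_collation = 0 === strpos( $collate, 'utf8mb4_' );

	if ( 'utf8mb4' === $charset && $is_utf8_collation ) {
		// Case 1: a utf8_* collation is overridden with the utf8mb4 default.
		return 'utf8mb4_unicode_ci';
	}

	if ( 'utf8' === $charset && $is_utf8mb4_collation ) {
		// Case 2: a utf8mb4_* collation cannot be used with the utf8 charset.
		return null;
	}

	// Case 3 (and anything not covered by the discussion): used as configured.
	return $collate;
}

var_dump( describe_effective_collation( 'utf8mb4', 'utf8_swedish_ci' ) );
var_dump( describe_effective_collation( 'utf8', 'utf8mb4_swedish_ci' ) );
var_dump( describe_effective_collation( 'utf8mb4', 'utf8mb4_swedish_ci' ) );
```

The first case is the one this ticket is about: the Swedish ordering the Finnish team asked for is lost when the collation is overridden with utf8mb4_unicode_ci.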
In 37521: