Opened 10 months ago
Last modified 7 months ago
#21212 reopened enhancement
MySQL tables should use utf8mb4 character set
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | Awaiting Review |
| Component: | Database | Version: | 3.4.1 |
| Severity: | minor | Keywords: | |
| Cc: |
Description (last modified by SergeyBiryukov)
Historically, the MySQL utf8 character set has only supported the first character plane of UTF-8. With MySQL 5.5.3, it now supports the entire character plane, using the utf8mb4 character set. This character set is 100% backwards compatible, and does not require more space than utf8 for characters that fall within the utf8 set, only using an extra byte for characters outside of the utf8 set.
References:
http://en.wikipedia.org/wiki/Mapping_of_Unicode_character_planes
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
Attachments (3)
Change History (18)
If it's 100% compatible, it sounds like something that could be implemented directly in $wpdb - but that makes it more complicated for existing installations (As you'd need it prior to install?)
We can't just add it to the config file either, as we support db.php dropins for other database setups (Such as hyperdb) which may or may not, use mysql_*() functions.
$wpdb would be fine for new installs, it may also be an option for existing installs, too.
According to the MySQL upgrade notes, feeding 4 byte characters into a utf8 column is not allowed. My testing showed, however, that it was just replaced with a '?' (the same as it always has) regardless of the character set of the input string.
So, if DB_CHARSET is set to utf8 and the MySQL version >= 5.5.3, wpdb::ini_charset() could just force it to utf8mb4.
For new installs, the same thing in wp-admin/includes/schema.php.
New patch implements this functionality in $wpdb instead. It's in wpdb::set_charset() instead of wpdb::init_charset() because mysql_get_server_info() needs to happen after the connection is opened.
This shouldn't affect existing installs - if they have MySQL < 5.5.3, the code won't be run. If they have MySQL >= 5.5.3, my testing shows that feeding a utf8mb4 string into a utf8 column has the same behaviour as it always did - replacing 4 byte characters with a placeholder.
comment:10
ryan — 9 months ago
Our unique key on slug in the terms table fails miserably with utf8mb4.
WordPress database error Specified key was too long; max key length is 767 bytes for query CREATE TABLE wp_terms ( term_id bigint(20) unsigned NOT NULL auto_increment, name varchar(200) NOT NULL default '', slug varchar(200) NOT NULL default '', term_group bigint(10) NOT NULL default 0, PRIMARY KEY (term_id), UNIQUE KEY slug (slug), KEY name (name) ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci made by wp_install, make_db_current_silent, dbDelta
comment:11
ryan — 9 months ago
comment:12
pento — 9 months ago
The 767 byte limit won't be increased any time soon, due to http://bugs.mysql.com/bug.php?id=32915 .
I'll keep an eye on the MySQL bug, we can come back to this for a future version.
comment:13
SergeyBiryukov — 9 months ago
- Milestone 3.5 deleted
- Resolution set to maybelater
- Status changed from new to closed
comment:14
pento — 7 months ago
- Keywords has-patch commit removed
- Resolution maybelater deleted
- Status changed from closed to reopened
It turns out this is possible, thanks to the innodb_large_prefix option, introduced in 5.5.14:
http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_large_prefix
So, the requirements are:
MyISAM tables:
- MySQL >= 5.5.3 (Assuming we don't add any indexes larger than 250 characters.)
InnoDB tables:
- MySQL >= 5.5.14
- innodb_file_format=barracuda
- innodb_file_per_table=true
- innodb_large_prefix=true
- All tables with ROW_FORMAT=(DYNAMIC|COMPRESSED)
Any other table formats:
- *head explode*
We can certainly detect these settings, it's just a question of whether that kind of complex (and edge-case-y) test should be in core.
comment:15
SergeyBiryukov — 7 months ago
- Milestone set to Awaiting Review

First patch is closer to a proof of concept - adding the switch to wp-config-sample.php is fairly ugly.