Opened 9 years ago

Last modified 6 years ago

#21212 closed task (blessed)

MySQL tables should use utf8mb4 character set — at Version 4

Reported by: pento Owned by:
Milestone: 4.2 Priority: normal
Severity: normal Version: 3.4.1
Component: Database Keywords:
Focuses: Cc:

Description (last modified by SergeyBiryukov)

Historically, the MySQL utf8 character set has only supported the Basic Multilingual Plane (the first Unicode plane), storing at most three bytes per character. As of MySQL 5.5.3, the full Unicode range is supported through the utf8mb4 character set. This character set is 100% backwards compatible and requires no more space than utf8 for characters that fall within the utf8 set; only characters outside the utf8 set use a fourth byte.
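To make the byte counts concrete, here is a small illustrative sketch in Python (not WordPress code). It only measures how many bytes standard UTF-8 needs per character; the helper name `utf8_len` is made up for this example. Characters that need four bytes are the ones a MySQL utf8 column cannot store.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes standard UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))


# Characters in the Basic Multilingual Plane need 1 to 3 bytes.
for ch in ("A", "\u00e9", "\u20ac"):  # A, e-acute, euro sign
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

# Characters outside the BMP (for example most emoji) need 4 bytes,
# which is the case utf8mb4 adds support for.
emoji = "\U0001F600"
print(f"U+{ord(emoji):04X} -> {utf8_len(emoji)} byte(s)")
```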



Change History (6)

#1 @pento
9 years ago

  • Keywords has-patch added

First patch is closer to a proof of concept - adding the switch to wp-config-sample.php is fairly ugly.

#2 @dd32
9 years ago

If it's 100% compatible, it sounds like something that could be implemented directly in $wpdb - but that makes it more complicated for existing installations (as you'd need it prior to install?).

We can't just add it to the config file either, as we support db.php drop-ins for other database setups (such as HyperDB), which may or may not use mysql_*() functions.

#3 @pento
9 years ago

$wpdb would be fine for new installs, and it may be an option for existing installs, too.

According to the MySQL upgrade notes, feeding 4-byte characters into a utf8 column is not allowed. My testing showed, however, that each such character was just replaced with a '?' (the same as it always has been), regardless of the character set of the input string.
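As a rough illustration of the behaviour observed in that test, the Python sketch below mimics the outcome only: every character outside the Basic Multilingual Plane becomes a '?'. The function name `simulate_utf8_column` is hypothetical, and this is not how MySQL implements the conversion; actual results may depend on the server version and SQL mode.

```python
def simulate_utf8_column(text: str) -> str:
    """Mimic the observed result of storing text in a 3-byte utf8 column.

    Assumption for illustration: every character above U+FFFF (which needs
    four bytes in UTF-8) is replaced by a single '?'.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)


# BMP text passes through unchanged; the emoji is replaced.
print(simulate_utf8_column("caf\u00e9 \U0001F600"))
```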

So, if DB_CHARSET is set to utf8 and the MySQL version is >= 5.5.3, wpdb::init_charset() could just force it to utf8mb4.

For new installs, the same thing could be done in wp-admin/includes/schema.php.
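The rule proposed in this comment could be sketched roughly as follows. This is an illustration in Python rather than a patch: the names `parse_version` and `effective_charset` are made up, and the version string format (for example "5.5.3-log") is an assumption about what the server reports.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '5.5.3-log'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def effective_charset(configured: str, server_version: str) -> str:
    """Upgrade utf8 to utf8mb4 when the server is new enough; else keep as is."""
    # Compare as integer tuples so that 5.10.0 sorts after 5.5.3.
    if configured.lower() == "utf8" and parse_version(server_version) >= (5, 5, 3):
        return "utf8mb4"
    return configured


print(effective_charset("utf8", "5.5.3"))       # new enough: upgraded
print(effective_charset("utf8", "5.1.73-log"))  # too old: unchanged
print(effective_charset("latin1", "5.6.10"))    # not utf8: unchanged
```

Comparing versions as integer tuples rather than strings avoids ordering mistakes with multi-digit components.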

#4 @SergeyBiryukov
9 years ago

  • Description modified (diff)

Related: #13590
