Make WordPress Core

Opened 4 years ago

Last modified 18 months ago

#48285 assigned enhancement

wp-config-sample.php should default to `utf8mb4` instead of `utf8` character set

Reported by: bchecketts
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: minor
Version: 5.3
Component: Database
Keywords: has-patch
Focuses:
Cc:

Description

MySQL's utf8 character set is not a complete implementation of the UTF-8 standard: it stores at most 3 bytes per character, so it doesn't work with 4-byte characters, which include many emoji. utf8mb4 is the corrected implementation.
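
To make the 4-byte limitation concrete, here is a minimal Python sketch (not WordPress or MySQL code). The helper names `utf8_byte_length` and `fits_in_utf8mb3` are hypothetical; the sketch only assumes that MySQL's legacy utf8 stores at most 3 bytes per character, which in UTF-8 corresponds to code points up to U+FFFF.

```python
def utf8_byte_length(ch: str) -> int:
    """Return how many bytes a single character needs in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8.

    Assumption: MySQL's legacy utf8 (utf8mb3) stores at most 3 bytes
    per character, so characters above U+FFFF do not fit.
    """
    return all(utf8_byte_length(ch) <= 3 for ch in text)


if __name__ == "__main__":
    # ASCII, Latin-1 accent, CJK, and an emoji outside the 3-byte range.
    for sample in ["hello", "caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001F600"]:
        print(repr(sample), fits_in_utf8mb3(sample))
```

Only the last sample, an emoji that needs 4 bytes, fails the check; that is the class of text a 3-byte character set cannot store.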

See https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434 or just google "mysql utf8 vs utf8mb4"

It would seem wise for wp-config-sample.php to default to utf8mb4 instead of utf8, so that new installations get the improved character set.

Change History (10)

#1 @SergeyBiryukov
4 years ago

  • Component changed from Charset to Database

Previously: #21212, #32105, #32405, #33122.

Thanks for the ticket!

On both new and existing WordPress installs, WordPress will automatically upgrade the tables to utf8mb4 if the server supports that, and when DB_CHARSET is defined as utf8, it will automatically switch to utf8mb4 instead.
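
The switching rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not WordPress's actual PHP implementation; the function name `effective_charset` and its boolean parameter are hypothetical.

```python
def effective_charset(configured: str, server_supports_utf8mb4: bool) -> str:
    """Pick the charset to use from the configured DB_CHARSET value.

    Hypothetical helper that mirrors the rule stated in this comment:
    a configured "utf8" is upgraded to "utf8mb4" when the server
    supports it; any other configured value is left alone.
    """
    if configured.lower() == "utf8" and server_supports_utf8mb4:
        return "utf8mb4"
    return configured


if __name__ == "__main__":
    print(effective_charset("utf8", True))
    print(effective_charset("utf8", False))
    print(effective_charset("latin1", True))
```

Under this rule, leaving utf8 in wp-config-sample.php still yields utf8mb4 on capable servers, while older servers keep working with utf8.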

wp-config-sample.php still needs to default to utf8 though, as not all sites can support utf8mb4.

#2 @SergeyBiryukov
2 years ago

It's also worth noting that wp-admin/setup-config.php does write DB_CHARSET as utf8mb4 instead of utf8 if the server supports that, see [31349] / #21212 and comment:2:ticket:33122.

#3 @JavierCasares
2 years ago

Right now, with the latest MySQL 8.0 and MariaDB 10.6 versions, there is no "utf8" because they changed it to "utf8mb3".

If we want to support all the international language charset, WordPress should support by default "utf8mb4" (supported by all WordPress-SQL supported databases).

#4 This ticket was mentioned in PR #2214 on WordPress/wordpress-develop by bchecketts.
2 years ago

  • Keywords has-patch added

Change DB_CHARSET in wp-config-sample.php from utf8 to utf8mb4

Trac ticket: https://core.trac.wordpress.org/ticket/48285

#5 follow-up: @bchecketts
2 years ago

  • Keywords has-patch removed

WordPress requirements listed at https://wordpress.org/about/requirements/ indicate that MySQL version 5.7 is required.

The utf8mb4 character set was released in MySQL version 5.5.3 in 2010. (See page 159 of https://downloads.mysql.com/docs/mysql-5.5-relnotes-en.pdf. The MySQL Release Notes on mysql.com no longer go back to v5.5.)

Pull Request at https://github.com/WordPress/wordpress-develop/pull/2214

#6 @bchecketts
2 years ago

  • Keywords has-patch added

#7 in reply to: ↑ 5 @SergeyBiryukov
2 years ago

Replying to bchecketts:

WordPress requirements listed at https://wordpress.org/about/requirements/ indicate that MySQL version 5.7 is required.

Please note that MySQL 5.7 or greater is the recommended version, not required. It was updated in [meta11407] after the discussion in comment:11:ticket:41490.

The required versions are mentioned a bit further down the page and have not changed in a while:

Note: If you are in a legacy environment where you only have older PHP or MySQL versions, WordPress also works with PHP 5.6.20+ and MySQL 5.0+, but these versions have reached official End Of Life and as such may expose your site to security vulnerabilities.

#8 follow-up: @JavierCasares
2 years ago

Based on this, yes, new versions of WordPress may have utf8mb4 by default.

#9 This ticket was mentioned in Slack in #hosting-community by javier.
2 years ago

#10 in reply to: ↑ 8 @SergeyBiryukov
18 months ago

Replying to JavierCasares:

Right now, with the latest MySQL 8.0 and MariaDB 10.6 versions, there is no "utf8" because they changed it to "utf8mb3".

Thanks! This should now be addressed in #53623.

If we want to support all the international language charset, WordPress should support by default "utf8mb4" (supported by all WordPress-SQL supported databases).

As noted in comment:1 and comment:2, WordPress does automatically upgrade to utf8mb4 when possible.

Replying to JavierCasares:

Based on this, yes, new versions of WordPress may have utf8mb4 by default.

I might be missing something, but as noted in comment:7, WordPress still has MySQL 5.0 as a minimum requirement at this time, which did not include utf8mb4. So it looks like until the minimum version is bumped to MySQL 5.5, it is neither safe nor required to change the default charset in wp-config-sample.php.
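
As a rough illustration of why the minimum version matters, here is a Python sketch of a version gate. It assumes only the 5.5.3 release version cited in comment:5; the names `parse_version` and `supports_utf8mb4` are hypothetical, and the sketch ignores other factors such as client library support.

```python
import re

# utf8mb4 first shipped in MySQL 5.5.3, per the release notes cited in comment:5.
UTF8MB4_MIN_VERSION = (5, 5, 3)


def parse_version(version: str) -> tuple:
    """Extract the leading (major, minor, patch) numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_utf8mb4(version: str) -> bool:
    """True if the server version is at least 5.5.3 (hypothetical check)."""
    return parse_version(version) >= UTF8MB4_MIN_VERSION


if __name__ == "__main__":
    for v in ["5.0.96", "5.5.2", "5.5.3", "5.7.44-log", "8.0.28"]:
        print(v, supports_utf8mb4(v))
```

A server at the documented MySQL 5.0 minimum fails this check, which is why a hard-coded utf8mb4 default would not be safe for every supported install.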

On a related note, reading the MariaDB ticket MDEV-8334 Rename utf8 to utf8mb3:

In long terms we want the name utf8 mean the full featured UTF-8.
We'll do a few preparatory steps:

  1. Change the main name of the 3-byte character set from utf8 to utf8mb3 and make utf8 alias for utf8mb3. This will change all SHOW and INFORMATION_SCHEMA output to display utf8mb3 instead of utf8, as well as change mysqldump to dump utf8mb3 instead of just utf8.
  2. Add a new server option, say --utf8-is-utf8mb3, which will be true by default, but the DBA will be able to change it to false and thus make utf8 mean utf8mb4.
  3. A few releases later we'll change --utf8-is-utf8mb3 to be false by default.

Or

  1. Do not add any new server options and
  2. Add a new old_mode value for reverting utf8 to utf8mb3 when the default will mean utf8mb4.

The latter appears to be implemented in MariaDB 10.6.1.

Also reading the MySQL note on The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding):

Historically, MySQL has used utf8 as an alias for utf8mb3; beginning with MySQL 8.0.28, utf8mb3 is used exclusively in the output of SHOW statements and in Information Schema tables when this character set is meant.

At some point in the future utf8 is expected to become a reference to utf8mb4. To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.

You should also be aware that the utf8mb3 character set is deprecated and you should expect it to be removed in a future MySQL release. Please use utf8mb4 instead.
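
Since newer servers report utf8mb3 where older ones reported utf8, code that compares charset names may want to treat the two as the same. A minimal Python sketch, assuming utf8 is currently an alias for utf8mb3 as the quoted notes describe; the names `CHARSET_ALIASES`, `canonical_charset`, and `same_charset` are hypothetical, and the mapping would need revisiting if utf8 later becomes an alias for utf8mb4.

```python
# Assumption: "utf8" is currently an alias for "utf8mb3", as the quoted
# MySQL and MariaDB notes describe. This may change in future releases.
CHARSET_ALIASES = {"utf8": "utf8mb3"}


def canonical_charset(name: str) -> str:
    """Normalise a charset name to lowercase and resolve known aliases."""
    key = name.strip().lower()
    return CHARSET_ALIASES.get(key, key)


def same_charset(a: str, b: str) -> bool:
    """Compare two charset names after alias resolution."""
    return canonical_charset(a) == canonical_charset(b)


if __name__ == "__main__":
    print(same_charset("utf8", "utf8mb3"))
    print(same_charset("utf8", "utf8mb4"))
```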

If the long-term goal of both projects is to make utf8 an alias for utf8mb4 as mentioned above, the default charset in wp-config-sample.php may not technically need any changes at all, though it still might be a good idea to explicitly change it to utf8mb4 when the minimum version is bumped to MySQL 5.5.
