Make WordPress Core

Opened 3 years ago

Closed 19 months ago

Last modified 19 months ago

#53623 closed defect (bug) (fixed)

MariaDB 10.6 renamed utf8 to utf8mb3

Reported by: skithund
Owned by: SergeyBiryukov
Milestone: 6.1
Priority: normal
Severity: normal
Version:
Component: Database
Keywords: has-patch
Focuses:
Cc:

Description

See MariaDB ticket MDEV-8334 "Rename utf8 to utf8mb3"

As a result, the charset tests are now failing.

Attachments (1)

53623.diff (9.4 KB) - added by SergeyBiryukov 19 months ago.

Change History (16)

#1 @ayeshrajans
3 years ago

Very nice find :)
I suppose we can conditionally assertSame by checking the db version and the name.

From what I see, self::$server_info contains MariaDB, and self::$_wpdb->db_version() returns the version. I don't have a MariaDB test setup at the moment, and will try to put forth a patch this weekend. Just wanted to share my 2 cents in the meantime.

This ticket was mentioned in Slack in #forums by yui, 3 years ago.

This ticket was mentioned in Slack in #hosting-community by yui, 3 years ago.

#4 @ayeshrajans
3 years ago

MySQL 8.0.26 also has related changes: https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-26.html


These statements now report utf8mb3 rather than utf8 when writing character set names: EXPLAIN, SHOW CREATE PROCEDURE, SHOW CREATE EVENT.

Stored program definitions retrieved from the data dictionary now report utf8mb3 rather than utf8 in character set references. This affects any output produced from those definitions, such as SHOW CREATE statements.

This error message now reports utf8mb3 rather than utf8 when writing character set names: ER_INVALID_CHARACTER_STRING. (Bug #32233614, Bug #32392077, Bug #32392209, Bug #32428538, Bug #32428598)

#5 @desrosj
2 years ago

  • Milestone changed from Awaiting Review to Future Release

#6 follow-up: @JavierCasares
2 years ago

Tested with:

  • PHP 7.1 -> 8.1
  • MariaDB 10.6

This produces the following test failures:

  • Tests_DB_Charset::test_set_charset_changes_the_connection_collation
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'utf8_general_ci'
    +'utf8mb3_general_ci'
    
  • Tests_DB_Charset::test_get_column_charset::test_get_column_charset with data set #5
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'utf8'
    +'utf8mb3'
    
  • Tests_DB_Charset::test_get_column_charset::test_get_column_charset with data set #6
    Failed asserting that two strings are identical.
    --- Expected
    +++ Actual
    @@ @@
    -'utf8'
    +'utf8mb3'
    
  • Tests_DB_Charset::test_table_collation_check::test_table_collation_check with data set #0
    ('CREATE TABLE table_collation_..._bin )', true, 'SELECT * FROM table_collation... a='😈'', 'DROP TABLE IF EXISTS table_co...heck_0', array('SELECT * FROM table_collation...='foo'', 'SHOW FULL TABLES LIKE table_c...heck_0', 'DESCRIBE table_collation_check_0', 'DESC table_collation_check_0', 'EXPLAIN SELECT * FROM table_c...heck_0'))
    Failed asserting that false is identical to true.
    
  • Tests_DB_Charset::test_table_collation_check::test_table_collation_check with data set #1
     ('CREATE TABLE table_collation_...l_ci )', true, 'SELECT * FROM table_collation... a='😈'', 'DROP TABLE IF EXISTS table_co...heck_1', array('SELECT * FROM table_collation...='foo'', 'SHOW FULL TABLES LIKE table_c...heck_1', 'DESCRIBE table_collation_check_1', 'DESC table_collation_check_1', 'EXPLAIN SELECT * FROM table_c...heck_1'))
    Failed asserting that false is identical to true.
    
  • Tests_DB_Charset::test_table_collation_check::test_table_collation_check with data set #4
    ('CREATE TABLE table_collation_... INT )', true, 'SELECT * FROM table_collation... a='😈'', 'DROP TABLE IF EXISTS table_co...heck_4', array('SELECT * FROM table_collation...='foo'', 'SHOW FULL TABLES LIKE table_c...heck_4', 'DESCRIBE table_collation_check_4', 'DESC table_collation_check_4', 'EXPLAIN SELECT * FROM table_c...heck_4'))
    Failed asserting that false is identical to true.
    

Also... should WordPress set the default charset to "utf8mb4"?

#7 in reply to: ↑ 6 @SergeyBiryukov
2 years ago

Replying to JavierCasares:

Also... should WordPress set the default charset to "utf8mb4"?

I believe that would be best discussed in #48285. Related: #45697.

#8 follow-up: @JavierCasares
2 years ago

I checked the MySQL and MariaDB versions: all supported versions support utf8mb3, so we should replace "utf8" with "utf8mb3" as the default and do some testing.

This ticket was mentioned in Slack in #hosting-community by skithund, 2 years ago.

This ticket was mentioned in Slack in #hosting-community by javier, 2 years ago.

#11 in reply to: ↑ 8 @SergeyBiryukov
19 months ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 6.1

Replying to ayeshrajans:

MySQL 8.0.26 also has related changes: https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-26.html


These statements now report utf8mb3 rather than utf8 when writing character set names: EXPLAIN, SHOW CREATE PROCEDURE, SHOW CREATE EVENT.

Stored program definitions retrieved from the data dictionary now report utf8mb3 rather than utf8 in character set references. This affects any output produced from those definitions, such as SHOW CREATE statements.

This error message now reports utf8mb3 rather than utf8 when writing character set names: ER_INVALID_CHARACTER_STRING.

Thanks! These changes are indeed related, but they don't appear to cause the test failures here.

In my testing, the current tests still pass on MySQL up until version 8.0.29, which is no longer available for download, but has some more character set support changes. The tests start failing on MySQL 8.0.30, with the same six failures as listed in comment:6.

From MySQL 8.0.30 release notes:

Important Change: A previous change renamed character sets having deprecated names prefixed with utf8_ to use utf8mb3_ instead. In this release, we rename the utf8_ collations as well, using the utf8mb3_ prefix; this is to make the collation names consistent with those of the character sets, not to rely any longer on the deprecated collation names, and to clarify the distinction between utf8mb3 and utf8mb4. The names using the utf8mb3_ prefix are now used exclusively for these collations in the output of SHOW statements such as SHOW CREATE TABLE, as well as in the values displayed in the columns of Information Schema tables including the COLLATIONS and COLUMNS tables.

Replying to JavierCasares:

I checked the MySQL and MariaDB versions: all supported versions support utf8mb3, so we should replace "utf8" with "utf8mb3" as the default and do some testing.

It is worth noting that WordPress does automatically upgrade to utf8mb4 when possible, see comment:1:ticket:48285.

Reading the MariaDB ticket MDEV-8334 Rename utf8 to utf8mb3:

In long terms we want the name utf8 mean the full featured UTF-8.
We'll do a few preparatory steps:

  1. Change the main name of the 3-byte character set from utf8 to utf8mb3 and make utf8 alias for utf8mb3. This will change all SHOW and INFORMATION_SCHEMA output to display utf8mb3 instead of utf8, as well as change mysqldump to dump utf8mb3 instead of just utf8.
  2. Add a new server option, say --utf8-is-utf8mb3, which will be true by default, but the DBA will be able to change it to false and thus make utf8 mean utf8mb4.
  3. A few releases later we'll change --utf8-is-utf8mb3 to be false by default.

Or

  1. Do not add any new server options and
  2. Add a new old_mode value for reverting utf8 to utf8mb3 when the default will mean utf8mb4.

The latter appears to be implemented in MariaDB 10.6.1.

Also reading the MySQL note on The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding):

Historically, MySQL has used utf8 as an alias for utf8mb3; beginning with MySQL 8.0.28, utf8mb3 is used exclusively in the output of SHOW statements and in Information Schema tables when this character set is meant.

At some point in the future utf8 is expected to become a reference to utf8mb4. To avoid ambiguity about the meaning of utf8, consider specifying utf8mb4 explicitly for character set references instead of utf8.

You should also be aware that the utf8mb3 character set is deprecated and you should expect it to be removed in a future MySQL release. Please use utf8mb4 instead.

If the long-term goal of both projects is to make utf8 an alias for utf8mb4 as mentioned above, it seems like utf8mb3 is an intermediate step, and there is no need for WordPress to use that as the default charset at this time, since it already uses utf8mb4 when possible.

I believe the only changes required here would be:

  • Adding utf8mb3_bin and utf8mb3_general_ci to the list of safe collations recognized by wpdb::check_safe_collation(). This would be the only change for WordPress core.
  • Adding some conditional version checking for the expected test results as suggested in comment:1. This would only affect the unit tests.

See 53623.diff. Tested on:

  • MariaDB 10.6.8
  • MySQL 8.0.25
  • MySQL 8.0.27
  • MySQL 8.0.28
  • MySQL 8.0.29
  • MySQL 8.0.30
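
The conditional version check proposed for the unit tests can be sketched roughly as follows. This is an illustration in Python, not the code from 53623.diff: the function name `expected_utf8_names` and the two threshold constants are hypothetical, and the thresholds are simply the versions at which the failures were observed in this ticket (MariaDB 10.6.1 and MySQL 8.0.30).

```python
from typing import Tuple

# Assumed thresholds, taken from the versions discussed in this ticket.
MARIADB_RENAME = (10, 6, 1)
MYSQL_RENAME = (8, 0, 30)


def expected_utf8_names(is_mariadb: bool, version: Tuple[int, int, int]) -> Tuple[str, str]:
    """Return the (charset, collation) names a charset test should expect.

    Servers at or above the threshold report the 3-byte character set as
    utf8mb3; older servers still report it as utf8.
    """
    threshold = MARIADB_RENAME if is_mariadb else MYSQL_RENAME
    if version >= threshold:
        return ("utf8mb3", "utf8mb3_general_ci")
    return ("utf8", "utf8_general_ci")
```

A test would then compare the value reported by the server against the result of this helper instead of a hard-coded 'utf8' string.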

#12 @SergeyBiryukov
19 months ago

  • Owner set to SergeyBiryukov
  • Resolution set to fixed
  • Status changed from new to closed

In 53918:

Database: Account for utf8 being renamed to utf8mb3 in newer MariaDB and MySQL versions.

From MariaDB 10.6.1 release notes:

The utf8 character set (and related collations) is now by default an alias for utf8mb3 rather than the other way around. It can be set to imply utf8mb4 by changing the value of the old_mode system variable (MDEV-8334).

From MySQL 8.0.30 release notes:

Important Change: A previous change renamed character sets having deprecated names prefixed with utf8_ to use utf8mb3_ instead. In this release, we rename the utf8_ collations as well, using the utf8mb3_ prefix; this is to make the collation names consistent with those of the character sets, not to rely any longer on the deprecated collation names, and to clarify the distinction between utf8mb3 and utf8mb4. The names using the utf8mb3_ prefix are now used exclusively for these collations in the output of SHOW statements such as SHOW CREATE TABLE, as well as in the values displayed in the columns of Information Schema tables including the COLLATIONS and COLUMNS tables.

This commit adds utf8mb3_bin and utf8mb3_general_ci to the list of safe collations recognized by wpdb::check_safe_collation(). The full list is now as follows:

  • utf8_bin
  • utf8_general_ci
  • utf8mb3_bin
  • utf8mb3_general_ci
  • utf8mb4_bin
  • utf8mb4_general_ci
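
As a rough illustration of what an allowlist like this does, here is a minimal Python sketch. It is not the implementation of wpdb::check_safe_collation(), which is PHP and inspects the tables referenced by a query; the names `SAFE_COLLATIONS` and `all_collations_safe` are hypothetical and only show the membership test against the six collations listed above.

```python
from typing import Iterable

# The six collations listed in the commit message.
SAFE_COLLATIONS = frozenset({
    "utf8_bin",
    "utf8_general_ci",
    "utf8mb3_bin",
    "utf8mb3_general_ci",
    "utf8mb4_bin",
    "utf8mb4_general_ci",
})


def all_collations_safe(collations: Iterable[str]) -> bool:
    """Return True only if every given collation is on the allowlist."""
    return all(name in SAFE_COLLATIONS for name in collations)
```

With only the utf8_ names on the list, a column reported as utf8mb3_general_ci by a newer server would be treated as unsafe, which is consistent with the test_table_collation_check failures reported earlier in this ticket.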

The change is covered by existing database charset unit tests: six tests which previously failed on MariaDB 10.6.1+ or MySQL 8.0.30+ now pass.

Includes:

  • Adjusting the expected test results based on MariaDB and MySQL version.
  • Using named data providers for the affected tests to make test output more descriptive.
  • Adding a failure message to each assertion when multiple assertions are used in the test.

References:

Follow-up to [30345], [32162], [37320].

Props skithund, ayeshrajans, JavierCasares, SergeyBiryukov.
Fixes #53623.

#13 @SergeyBiryukov
19 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

The tests now pass on PHP 8.0.x + MariaDB 10.6.1+, but still fail on PHP 7.4.x + MariaDB 10.6.1+.

I forgot that the MariaDB version is reported differently between PHP versions, see comment:33:ticket:49364:

  • PHP 8.0.21: 10.6.8-MariaDB
  • PHP 7.4.30: 5.5.5-10.6.5-MariaDB

Reopening to correct the version check for setting the $utf8_is_utf8mb3 flag.


#14 @SergeyBiryukov
19 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 53919:

Tests: Correct MariaDB version check in database charset tests.

MariaDB version is reported differently between PHP versions:

  • PHP 8.0.16 or later: 10.6.8-MariaDB
  • PHP 8.0.15 or earlier: 5.5.5-10.6.8-MariaDB

The latter includes PHP 7.4.x and PHP 5.6.x as well, where the version is also reported with the 5.5.5- prefix.

This commit makes an adjustment to the Tests_DB_Charset class to check for the correct version.
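
One way to make a version check robust against the 5.5.5- prefix is to strip it before parsing. The following Python sketch is an assumption about how that could look, not the actual change made to Tests_DB_Charset; the function name `parse_server_version` is hypothetical.

```python
import re
from typing import Tuple


def parse_server_version(server_info: str) -> Tuple[bool, Tuple[int, int, int]]:
    """Parse a server info string into (is_mariadb, (major, minor, patch)).

    Older PHP versions report MariaDB as e.g. "5.5.5-10.6.8-MariaDB",
    so the "5.5.5-" prefix is removed before the version is read.
    """
    is_mariadb = "mariadb" in server_info.lower()
    prefix = "5.5.5-"
    if is_mariadb and server_info.startswith(prefix):
        server_info = server_info[len(prefix):]

    match = re.match(r"(\d+)\.(\d+)\.(\d+)", server_info)
    if match is None:
        raise ValueError(f"Unrecognized server version: {server_info!r}")

    major, minor, patch = (int(part) for part in match.groups())
    return is_mariadb, (major, minor, patch)
```

Both "10.6.8-MariaDB" and "5.5.5-10.6.8-MariaDB" then yield the same version tuple, so a comparison against 10.6.1 behaves the same on every PHP version.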

References:

Follow-up to [53918].

Fixes #53623.

This ticket was mentioned in Slack in #core by sergey, 19 months ago.
