Make WordPress Core

Opened 2 years ago

Last modified 2 months ago

#60002 new defect (bug)

maybe_convert_table_to_utf8mb4() not work with new version of MySQL/MariaDB.

Reported by: okvee's profile okvee Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: General Keywords:
Focuses: Cc:

Description

From [this document](https://mariadb.com/kb/en/unicode/). They said:
From MariaDB 10.6, utf8 is by default an alias for utf8mb3

And from [this document](https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html) in MySQL. They said:
Historically, MySQL has used utf8 as an alias for utf8mb3; beginning with MySQL 8.0.28, utf8mb3 is used exclusively in the output of SHOW statements and in Information Schema tables when this character set is meant.

It can be refer to [this issue](https://core.trac.wordpress.org/changeset/53918).

So, from this simple MySQL code:

CREATE TABLE testing (
    id bigint(20) NOT NULL AUTO_INCREMENT,
    name varchar(50) DEFAULT NULL,
    PRIMARY KEY (id)
) DEFAULT CHARSET=utf8 AUTO_INCREMENT=1

If you run on MariaDB 10.6 or newer (for me it is 10.6.7) the utf8 column will be automatically changed to utf8mb3_general_ci.

But, WordPress function maybe_convert_table_to_utf8mb4() contain this condition:

<?php
if ( 'utf8' !== $charset && 'utf8mb4' !== $charset ) {
    // Don't upgrade tables that have non-utf8 columns.
    return false;
}

Which will be result in no change from utf8mb3_general_ci to utf8mb4_unicode_ci.

Change History (4)

#1 @ayeshrajans
2 years ago

Tested with MySQL 8.2, and SHOW FULL COLUMNS FROM $table indeed returns utf8mb3_... as the collation name. So I think you are right, that we need to change that line to something like this:

if ( 'utf8' !== $charset && 'utf8mb3' !== $charset && 'utf8mb4' !== $charset ) {

#2 @okvee
14 months ago

12 months ago..
hello there.

#3 @okvee
3 months ago

This cause problems when plugins are trying to create/alter their tables and cause wrong collation. Please fix.

#4 @okvee
2 months ago

Found another possible bug.

File: wp-includes/class-wpdb.php
Line: 886
Method: determine_charset()

if ( 'utf8' === $charset ) {
    $charset = 'utf8mb4';
}

Should be change to

if ( 'utf8' === $charset || 'utf8mb3' === $charset ) {
    $charset = 'utf8mb4';
}

From same reason.
Historically, MySQL has used utf8 as an alias for utf8mb3; beginning with MySQL 8.0.28, utf8mb3 is used exclusively in the output of SHOW statements and in Information Schema tables when this character set is meant.


My hot fix function that improved to always use better collation (mentioned by WordPress wpdb.php file).

function rundizcomhotfix_maybe_convert_table_to_utf8mb4( $table ) {
    global $wpdb;

    $results = $wpdb->get_results("SHOW FULL COLUMNS FROM `$table`");
    if (!$results) {
        return false;
    }

    foreach ($results as $column) {
        if ($column->Collation) {
            list( $charset ) = explode('_', $column->Collation);
            $charset = strtolower($charset);
            if ('utf8' !== $charset && 'utf8mb3' !== $charset && 'utf8mb4' !== $charset) {
                // Don't upgrade tables that have non-utf8 columns.
                return false;
            }
        }
    }

    $table_details = $wpdb->get_row("SHOW TABLE STATUS LIKE '$table'");
    if (!$table_details) {
        return false;
    }

    list( $table_charset ) = explode('_', $table_details->Collation);
    $table_charset = strtolower($table_charset);
    if ('utf8mb4' === $table_charset) {
        return true;
    }

    // the code above has been copied from original function.

    // custom code that upgrade to best collate. ---------------------------------
    $table_charset = 'utf8mb4';
    $collate = 'utf8mb4_unicode_ci';
    $charset_collate = $wpdb->determine_charset($table_charset, $collate);
    $table_charset = $charset_collate['charset'];
    $collate = $charset_collate['collate'];
    unset($charset_collate);
    // end custom code that upgrade to best collate. ---------------------------------

    // phpcs:ignore WordPress.DB.DirectDatabaseQuery.SchemaChange
    return $wpdb->query("ALTER TABLE $table CONVERT TO CHARACTER SET $table_charset COLLATE $collate");
}
Last edited 2 months ago by okvee (previous) (diff)
Note: See TracTickets for help on using tickets.