Make WordPress Core

Opened 4 weeks ago

#59868 new defect (bug)

Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets

Reported by: ianmjones
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: trunk
Component: Charset
Keywords: needs-patch needs-unit-tests
Focuses:
Cc:

Description

The wpdb::get_table_charset() function currently sets the charset to utf8 when it detects that both utf8 and utf8mb4 charsets are present in the table's column definitions.

That same function also replaces utf8mb3 with utf8, since utf8 is an alias for utf8mb3 in MySQL.
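The selection described above can be modelled roughly as follows. This is a minimal Python sketch, not the WordPress source: `pick_table_charset` is a hypothetical name, and only the cases mentioned in this ticket are covered.

```python
def pick_table_charset(column_charsets):
    """Simplified model of the charset selection described in this ticket.

    Hypothetical helper for illustration only; it is not wpdb code.
    """
    # Treat utf8mb3 as utf8, as the ticket describes.
    charsets = {
        "utf8" if c.lower() == "utf8mb3" else c.lower()
        for c in column_charsets
    }
    if len(charsets) == 1:
        return next(iter(charsets))
    if charsets == {"utf8", "utf8mb4"}:
        # Behaviour reported in the ticket: the narrower charset wins.
        return "utf8"
    # Other combinations are outside the scope of this sketch.
    return None


print(pick_table_charset(["utf8mb3", "utf8mb4"]))  # prints utf8
```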

As a result, wpdb::strip_invalid_text_from_query(), which wpdb::query() calls early to determine whether the query text is safe to insert, ends up stripping characters that are valid in utf8mb4, because it forces the use of utf8 in its call to wpdb::strip_invalid_text().

Insert queries therefore fail when a table has columns that mix utf8mb3/utf8 and utf8mb4 collations and the value for a column with a utf8mb4 charset and collation contains emoji or other 4-byte characters.
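To illustrate why a utf8 table charset rejects emoji, here is a small Python sketch. It assumes that "utf8" means at most 3 bytes per character (code points up to U+FFFF) and that a query counts as unsafe when stripping changes its text. The function names are hypothetical and the real wpdb checks are more involved.

```python
def strip_for_charset(text, charset):
    """Drop characters the given charset cannot store (simplified model)."""
    if charset == "utf8":
        # utf8/utf8mb3 holds at most 3 bytes per character, which
        # corresponds to code points up to U+FFFF.
        return "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    # Simplification: treat utf8mb4 as accepting every character.
    return text


def query_text_is_safe(text, charset):
    """Assumed rule: text is safe only if stripping leaves it unchanged."""
    return strip_for_charset(text, charset) == text


value = "hello \U0001F600"  # the emoji needs 4 bytes in UTF-8
print(query_text_is_safe(value, "utf8"))     # prints False
print(query_text_is_safe(value, "utf8mb4"))  # prints True
```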

I propose that wpdb::get_table_charset() return utf8mb4 when it detects exactly two charsets on the table and they are utf8 and utf8mb4, instead of the current behaviour of returning utf8.
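The proposed change could be sketched like this, again as a hypothetical Python model rather than a patch against wpdb. `pick_table_charset_proposed` is an illustrative name, and combinations not mentioned in this ticket are left out.

```python
def pick_table_charset_proposed(column_charsets):
    """Model of the proposed selection: prefer utf8mb4 over utf8.

    Hypothetical helper for illustration only; it is not wpdb code.
    """
    # Treat utf8mb3 as utf8, as the ticket describes.
    charsets = {
        "utf8" if c.lower() == "utf8mb3" else c.lower()
        for c in column_charsets
    }
    if len(charsets) == 1:
        return next(iter(charsets))
    if charsets == {"utf8", "utf8mb4"}:
        # Proposed behaviour: the wider charset wins, so 4-byte
        # characters survive the table-level check.
        return "utf8mb4"
    # Other combinations are outside the scope of this sketch.
    return None


print(pick_table_charset_proposed(["utf8mb3", "utf8mb4"]))  # prints utf8mb4
```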

Change History (0)
