Make WordPress Core

Opened 8 months ago

Last modified 6 months ago

#59868 new defect (bug)

Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets

Reported by: ianmjones Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 4.2
Component: Charset Keywords: needs-patch needs-unit-tests
Focuses: Cc:

Description

The wpdb::get_table_charset() function currently sets the charset to utf8 when it detects that both utf8 and utf8mb4 charsets are present in the table's column definitions.

That same function also swaps in utf8 for utf8mb3 as they are effectively the same thing.

This means that wpdb::strip_invalid_text_from_query(), which wpdb::query() calls early on to determine whether text is safe to insert, ends up stripping characters that are valid in utf8mb4, because it forces the use of utf8 in the wpdb::strip_invalid_text() call.

This results in insert queries failing when a table has columns with both utf8mb3/utf8 and utf8mb4 charsets, and emoji or other 4-byte characters are written to a column that has a utf8mb4 charset and collation defined.
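
To illustrate why the emoji is lost, here is a minimal sketch in Python rather than WordPress's PHP. It is a model of the behaviour described above, not the actual wpdb code. The helper name strip_to_utf8mb3 is hypothetical, and the final comparison assumes the insert is rejected whenever the stripped text differs from the original.

```python
def strip_to_utf8mb3(text: str) -> str:
    """Drop every code point that needs 4 bytes in UTF-8 (above U+FFFF).

    Hypothetical helper: it models what stripping to a 3-byte charset
    does to the text, it is not a port of wpdb::strip_invalid_text().
    """
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


original = "Hello \U0001F600"  # U+1F600 is an emoji, 4 bytes in UTF-8
stripped = strip_to_utf8mb3(original)

# Assumption: the query is treated as invalid when stripping changed it.
query_is_rejected = stripped != original
```

Text made up only of 1 to 3 byte characters passes through unchanged, so only rows containing emoji or other 4-byte characters are affected.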

I propose that the wpdb::get_table_charset() function should use utf8mb4 as the returned charset when it detects that 2 charsets are defined on the table, and they are utf8 and utf8mb4, instead of the current behaviour of returning utf8.
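
The difference between the current and proposed behaviour can be sketched as follows. This is again Python and only a model of the decision described in this ticket: the function name table_charset and the prefer_utf8mb4 flag are hypothetical, and any charset mix other than utf8 plus utf8mb4 is left out of scope.

```python
from typing import Iterable, Optional


def table_charset(column_charsets: Iterable[str],
                  prefer_utf8mb4: bool = False) -> Optional[str]:
    """Pick one charset for a table from its column charsets.

    Hypothetical model of the logic in this ticket, not the real
    wpdb::get_table_charset().
    """
    # utf8mb3 is treated as utf8, as described above.
    charsets = {"utf8" if c == "utf8mb3" else c for c in column_charsets}

    if len(charsets) == 1:
        return next(iter(charsets))

    if charsets == {"utf8", "utf8mb4"}:
        # Current behaviour returns utf8; the proposal returns utf8mb4.
        return "utf8mb4" if prefer_utf8mb4 else "utf8"

    # Other combinations are out of scope for this sketch.
    return None


current = table_charset(["utf8mb3", "utf8mb4"])
proposed = table_charset(["utf8mb3", "utf8mb4"], prefer_utf8mb4=True)
```

With the current behaviour the mixed table resolves to utf8, which is what leads to 4-byte characters being stripped. With the proposed behaviour it resolves to utf8mb4.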

Change History (2)

This ticket was mentioned in Slack in #core by jorbin.


6 months ago

#2 @jorbin
6 months ago

  • Milestone changed from Awaiting Review to Future Release
  • Version changed from trunk to 4.2

I think the first step for this is an automated test to demonstrate the issue. This might only affect specific versions of MySQL or MariaDB, so it would be good to know whether the bug affects all versions or only specific ones. Once there is a test and the affected versions are known, I think this can be moved into a milestone. Depending on the scope of the fix, it may also need the early tag, but I think it's too early for that decision.

Updating the version to 4.2 since that is when utf8mb4 support was added and, from the description, it sounds like something that has been around since then.
