Opened 10 months ago
Last modified 9 months ago
#59868 new defect (bug)
Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets
Reported by: | ianmjones | Owned by: | |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | 4.2 |
Component: | Charset | Keywords: | needs-patch needs-unit-tests |
Focuses: | Cc: |
Description
The `wpdb::get_table_charset()` function currently sets the charset to `utf8` when it detects that both `utf8` and `utf8mb4` charsets are present in the table's column definitions.
That same function also swaps in `utf8` for `utf8mb3`, as they are effectively the same thing.
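To make the behaviour described above concrete, here is a minimal sketch in Python. It is a toy model built only from this description, not the actual `wpdb` code; the function name `resolve_table_charset` and its handling of other charset combinations are assumptions for illustration.

```python
def resolve_table_charset(column_charsets):
    """Toy model of the charset resolution described in this ticket."""
    # Treat utf8mb3 as an alias of utf8, as the description says.
    normalized = {
        "utf8" if name.lower() == "utf8mb3" else name.lower()
        for name in column_charsets
    }
    if len(normalized) == 1:
        # Only one charset in play: use it.
        return next(iter(normalized))
    if normalized == {"utf8", "utf8mb4"}:
        # Current behaviour: the narrower charset wins.
        return "utf8"
    # Other combinations are outside the scope of this sketch.
    return None


# A table with one utf8mb3 column and one utf8mb4 column.
table_charset = resolve_table_charset(["utf8mb3", "utf8mb4"])
```

With the mixed table above, the model resolves the whole table to `utf8`, even though one column can store 4-byte characters.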
This means that the `wpdb::strip_invalid_text_from_query()` function, used early by the `wpdb::query()` function to determine whether text is safe to be inserted, ends up stripping `utf8mb4`-safe characters because it forces the use of `utf8` in the called `wpdb::strip_invalid_text()` function.
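The effect of checking text against `utf8` instead of `utf8mb4` can be sketched as follows. This is a simplified Python model, not the real `wpdb::strip_invalid_text()` logic: the helper name `strip_four_byte_chars` is hypothetical, and the final comparison assumes the query is rejected whenever stripping changes the text.

```python
def strip_four_byte_chars(text):
    """Drop characters that need 4 bytes in UTF-8 (code points above U+FFFF)."""
    # utf8 (utf8mb3) stores at most 3 bytes per character, so anything
    # beyond the Basic Multilingual Plane does not fit.
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


original = "Hello \U0001F600"  # ends with a grinning face emoji
stripped = strip_four_byte_chars(original)

# Assumption: a query whose text changes after stripping is treated as unsafe.
query_is_rejected = stripped != original
```

Here the emoji is removed even though the target column could have stored it, so the model flags the query as unsafe.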
This results in insert queries failing when a table has columns using both `utf8mb3`/`utf8` and `utf8mb4` collations, and emoji or other 4-byte characters are being written to a column that has a `utf8mb4` charset and collation defined.
I propose that the `wpdb::get_table_charset()` function should use `utf8mb4` as the returned charset when it detects that two charsets are defined on the table, and they are `utf8` and `utf8mb4`, instead of the current behaviour of returning `utf8`.
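As a rough illustration of the proposal, the Python sketch below models only the mixed `utf8`/`utf8mb4` case. The function name `proposed_table_charset` is hypothetical, and this is not a patch against `wpdb`.

```python
def proposed_table_charset(column_charsets):
    """Toy model of the proposed resolution for mixed utf8/utf8mb4 tables."""
    # utf8mb3 is still treated as an alias of utf8.
    normalized = {
        "utf8" if name.lower() == "utf8mb3" else name.lower()
        for name in column_charsets
    }
    if normalized == {"utf8", "utf8mb4"}:
        # Proposed change: prefer the wider charset so 4-byte text is kept.
        return "utf8mb4"
    if len(normalized) == 1:
        return next(iter(normalized))
    # Other combinations are left out of this sketch.
    return None


proposed = proposed_table_charset(["utf8mb3", "utf8mb4"])
```

One trade-off to consider: with the wider charset chosen for the whole table, 4-byte characters headed for a `utf8` column would no longer be caught by a table-level check, so any fix would likely need to account for per-column validation.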
I think the first step for this is an automated test to demonstrate the issue. This might only affect specific versions of MySQL or MariaDB, so it would be good to know whether this bug affects all versions or only specific ones. Once there is a test and that is known, I think this can be moved into a milestone. Depending on the scope of the fix, it may also need the `early` tag, but I think it's too early for that decision.

Updating the version to 4.2 since that is when utf8mb4 support was added and, from the description, it sounds like something that has been around since then.