Opened 4 weeks ago
#59868 new defect (bug)
Database insert with emoji fails when table has columns with both utf8mb3 (utf8) and utf8mb4 charsets
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | trunk |
Component: | Charset | Keywords: | needs-patch needs-unit-tests |
Focuses: | | Cc: | |
Description
The `wpdb::get_table_charset()` function currently returns `utf8` as the table charset when it detects that the table's column definitions use exactly two charsets, `utf8` and `utf8mb4`.
The same function also treats `utf8mb3` as `utf8`, since MySQL's `utf8` is an alias for `utf8mb3`.
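The charset resolution described above can be modelled roughly as follows. This is a hypothetical Python sketch, not WordPress code: the name `resolve_table_charset` and its input (a list of charset names taken from each column's collation prefix) are assumptions, and the real PHP function's handling of binary and `latin1` columns is omitted.

```python
def resolve_table_charset(column_charsets):
    """Model of the current charset choice in wpdb::get_table_charset().

    column_charsets: charset names from each text column's collation prefix,
    e.g. "utf8mb4" from "utf8mb4_unicode_ci" (hypothetical input shape).
    """
    charsets = {c.lower() for c in column_charsets}

    # utf8mb3 is folded into utf8, since MySQL's utf8 is an alias for utf8mb3.
    if "utf8mb3" in charsets:
        charsets.discard("utf8mb3")
        charsets.add("utf8")

    if not charsets:
        return None  # no text columns: nothing to restrict
    if len(charsets) == 1:
        return next(iter(charsets))
    if charsets == {"utf8", "utf8mb4"}:
        return "utf8"  # current behaviour: the narrower charset wins
    return "ascii"  # other mixtures: simplified stand-in for the stricter fallback


# A table with one utf8mb3 column and one utf8mb4 column:
print(resolve_table_charset(["utf8mb3", "utf8mb4"]))  # utf8
```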
This means that `wpdb::strip_invalid_text_from_query()`, which `wpdb::query()` calls early on to determine whether a query's text is safe to write, ends up stripping characters that are valid in `utf8mb4`: it forces the `utf8` charset onto the `wpdb::strip_invalid_text()` function it calls, which removes any 4-byte character.
Because `wpdb::query()` refuses to run a query whose text was changed by this stripping, INSERT queries fail on tables that have both `utf8mb3`/`utf8` and `utf8mb4` columns whenever emoji or other 4-byte characters are written to a column whose charset and collation are `utf8mb4`.
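To see why such a query fails, the hypothetical Python sketch below models the two steps: stripping characters a charset cannot store, then refusing the query if anything was removed. It assumes that `utf8` (utf8mb3) can hold only characters of at most three bytes in UTF-8, i.e. code points up to U+FFFF, and that `utf8mb4` can hold all of them. `strip_for_charset`, `query_allowed` and the table `wp_example` are illustrative names, not part of wpdb.

```python
def strip_for_charset(text, charset):
    """Drop characters the given charset cannot store (simplified model)."""
    if charset == "utf8":
        # utf8mb3 stores at most 3 bytes per character, so anything above
        # U+FFFF (4 bytes in UTF-8, e.g. most emoji) is removed.
        return "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return text  # utf8mb4 (and any other charset, in this model) is left as-is


def query_allowed(query, table_charset):
    """Model of wpdb::query() refusing a query that changes when stripped."""
    return strip_for_charset(query, table_charset) == query


emoji = "\U0001F600"  # GRINNING FACE
sql = f"INSERT INTO wp_example (title) VALUES ('Hello {emoji}')"  # hypothetical table

print(len(emoji.encode("utf-8")))     # 4
print(query_allowed(sql, "utf8"))     # False: the emoji would be stripped
print(query_allowed(sql, "utf8mb4"))  # True
```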
I propose that `wpdb::get_table_charset()` should return `utf8mb4` as the table charset when it detects that exactly two charsets are defined on the table and they are `utf8` and `utf8mb4`, instead of the current behaviour of returning `utf8`.
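A minimal sketch of the proposed rule, again as a hypothetical Python model rather than the eventual PHP patch: the only change from the current behaviour is the branch for a table whose sole charsets are `utf8` and `utf8mb4`, which now resolves to `utf8mb4`. The name `resolve_table_charset_proposed` is illustrative, binary and `latin1` handling is omitted, and the assertions are written in the spirit of the unit tests the ticket's keywords ask for.

```python
def resolve_table_charset_proposed(column_charsets):
    """Model of the proposed wpdb::get_table_charset() behaviour (illustrative)."""
    charsets = {c.lower() for c in column_charsets}

    # Fold utf8mb3 into utf8, as the current function already does.
    if "utf8mb3" in charsets:
        charsets.discard("utf8mb3")
        charsets.add("utf8")

    if not charsets:
        return None
    if len(charsets) == 1:
        return next(iter(charsets))
    if charsets == {"utf8", "utf8mb4"}:
        # Proposed change: pick the wider charset so 4-byte characters
        # destined for the utf8mb4 column are no longer treated as invalid.
        return "utf8mb4"
    return "ascii"  # other mixtures unchanged (simplified stand-in)


# A mixed utf8mb3/utf8mb4 table now resolves to utf8mb4 ...
assert resolve_table_charset_proposed(["utf8mb3", "utf8mb4"]) == "utf8mb4"
assert resolve_table_charset_proposed(["utf8", "utf8mb4"]) == "utf8mb4"
# ... while single-charset tables behave exactly as before.
assert resolve_table_charset_proposed(["utf8mb3"]) == "utf8"
assert resolve_table_charset_proposed(["utf8mb4"]) == "utf8mb4"
```

One consequence worth weighing: with `utf8mb4` as the table-level charset, this query-level check would no longer catch a 4-byte character aimed at a `utf8mb3` column in the same table. Such a value would then reach MySQL, which typically rejects it in strict SQL mode.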