Opened 8 years ago
Closed 8 years ago
#37689 closed defect (bug) (fixed)
Issues with utf8mb4 collation and the 4.6 update
Reported by: | Hristo Sg | Owned by: | pento |
---|---|---|---|
Milestone: | 4.6.1 | Priority: | normal |
Severity: | normal | Version: | 4.6 |
Component: | Database | Keywords: | has-patch fixed-major |
Focuses: | performance | Cc: | |
Description
If you have a pre-4.6 WP install with charset configured in the wp-config.php file and set to utf8mb4:
define('DB_CHARSET', 'utf8mb4');
After the update, all text on the site, including the contents of the options table, turns into incorrect characters.
If you comment out the line:
#define('DB_CHARSET', 'utf8mb4');
The website starts showing characters correctly.
Attachments (1)
Change History (22)
#3
8 years ago
Yes, the nasty part is that I suspect everyone who has defined the charset to be utf8mb4 may see a broken site after the update to 4.6.
If you need more information about the site @Hristo Sg mentioned, I can provide the exact MySQL version, MySQL client version, PHP version, etc.
#4
8 years ago
- Keywords reporter-feedback added
@hristo-sg Can you provide some details about your PHP and MySQL (client) versions? What's the current charset/collation of your tables?
#5
8 years ago
@ocean90 here is the requested information:
MySQL Server version:
Server version: 5.6.28-76.1-log Percona Server (GPL), Release 76.1, Revision 5759e76
MySQL client version:
mysql --version
mysql Ver 14.14 Distrib 5.6.27-75.0, for Linux (x86_64) using 5.1
PHP Details:
http://pandjarov.com/updatetest/info.php
Table Collation before the upgrade:
utf8mb4_unicode_ci
Table Collation after the upgrade:
utf8mb4_unicode_ci
So the issue is that before the upgrade the site worked as expected, and after the upgrade all the text was gibberish.
#7
8 years ago
I reverted [37601], but the issue was not resolved: the site still shows gibberish unless the define('DB_CHARSET', 'utf8mb4') line is commented out. Also, DB_COLLATE is indeed empty in wp-config.php.
#9
8 years ago
I tried reverting those two as well, but the issue remains. @ocean90, if you want, I can give you access to a test site that is experiencing this issue, or I can revert other changes as well.
#10
8 years ago
- Keywords reporter-feedback removed
I'm afraid I'm out of ideas.
@pento any ideas what could cause this?
#11
8 years ago
The cause is strange and exciting interactions between character sets. :-)
@danielkanchev: Could you please DM me on Slack? My username there is "pento". I'd like to have a look at your test site.
#13
8 years ago
Noting that this ticket may affect the approach to #37683, which is marked for 4.6.1. We need to determine whether this ticket should be moved to the 4.6.1 milestone as well.
@pento Do you have any more details?
This ticket was mentioned in Slack in #core by jeremyfelt.
8 years ago
#15
8 years ago
Hi people,
I had this issue on a website, and commenting out DB_CHARSET didn't help, because a plugin stopped working. So I looked at the database and found that the problem was in the stored data itself (so I assume it comes from the utf8mb4 conversion script). Here is how to fix the database, but please test this in a staging environment first; don't do it on a production website.
The table at http://www.i18nqa.com/debug/utf8-debug.html shows that if you see garbled sequences in place of characters like ù or á, the original charset was latin1. So first create a dump of the database: open a shell console on the server and run:
mysqldump -uUSERNAME -p --default-character-set=latin1 DATABASE_NAME > dump-latin1.sql
(enter your password when prompted)
Then edit the file to make one small correction (if the dump is big, editing it will take a lot of RAM or time):
nano dump-latin1.sql
Change
/*!40101 SET NAMES latin1 */;
to
/*!40101 SET NAMES utf8 */;
Exit nano with Ctrl+X, enter Y to save the changes, and press Enter to keep the file name.
Now your dump is fixed and ready. I suggest restoring it into a database with a different name, so that you keep the old one as a backup and can easily switch back to it, or at least adding a prefix to the names of the existing tables before restoring.
Restore it with:
mysql -uUSERNAME -p DATABASE_NAME < dump-latin1.sql
(enter your password when prompted)
Your WordPress should now work as expected.
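If the dump is too large to edit comfortably in nano, the same one-line change can be made by streaming the file. The following Python sketch is an alternative to the nano step, not part of the original instructions: `fix_set_names()` and the output file name are hypothetical, and it assumes the header line is exactly `/*!40101 SET NAMES latin1 */;` as shown above. It reads and writes raw bytes, so the dump's contents are never decoded or re-encoded.

```python
from pathlib import Path

OLD = b"/*!40101 SET NAMES latin1 */;"
NEW = b"/*!40101 SET NAMES utf8 */;"


def fix_set_names(src: Path, dst: Path) -> int:
    """Copy the dump from src to dst, swapping the SET NAMES header line.

    Works line by line on bytes, so memory use stays small even for a
    large dump. Returns how many lines were replaced.
    """
    replaced = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        for line in fin:
            # Compare without the line ending, then keep the original ending.
            if line.rstrip(b"\r\n") == OLD:
                line = NEW + line[len(OLD):]
                replaced += 1
            fout.write(line)
    return replaced
```

For example, `fix_set_names(Path("dump-latin1.sql"), Path("dump-utf8.sql"))` writes a corrected copy that you would then restore instead of dump-latin1.sql. A return value of 0 means the header line was not found and nothing was changed.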
This ticket was mentioned in Slack in #core by jeremyfelt.
8 years ago
#17
8 years ago
- Milestone changed from Awaiting Review to 4.6.1
Thank you @danielkanchev for the use of your server. :-)
The root cause of this problem in @danielkanchev's case was [37320] combined with PHP 5.3. While the site was on WordPress 4.5, it was using PHP 5.3, which doesn't support utf8mb4. Because DB_CHARSET was set to utf8mb4, wpdb::set_charset() was silently failing and falling back to the default server character set, latin1.
The upgrade to WordPress 4.6 included [37320], which sets the server-side character set but assumes that the client-side character set has been set correctly. As a result, MySQL was taking latin1 strings from the database and converting them to utf8 before sending them to PHP. PHP, however, was treating them as latin1, hence the mojibake.
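The byte-level mismatch can be reproduced outside MySQL. The following Python sketch is a simplified model, not WordPress or MySQL code: it assumes the stored bytes are UTF-8 text that MySQL treats as latin1, and it uses Python's "latin-1" codec as a stand-in for MySQL's latin1 (MySQL's latin1 is closer to cp1252, but the two agree for the bytes used here). The function names are hypothetical.

```python
def double_encode(text: str) -> bytes:
    """Model MySQL converting latin1-labelled data to utf8.

    The stored bytes are really UTF-8, but each byte is read as one latin1
    character and then converted to its own UTF-8 sequence.
    """
    stored = text.encode("utf-8")         # bytes originally written by PHP
    as_latin1 = stored.decode("latin-1")  # MySQL's view: one char per byte
    return as_latin1.encode("utf-8")      # conversion to utf8 on the way out


def mojibake(text: str) -> str:
    """What a UTF-8 page shows when it renders the double-encoded bytes."""
    return double_encode(text).decode("utf-8")


def repair(garbled: str) -> str:
    """Undo one round of double encoding (only valid if it really happened)."""
    return garbled.encode("latin-1").decode("utf-8")


print(mojibake("ù á"))  # Ã¹ Ã¡
```

`repair()` follows the same idea as the latin1 dump and utf8 restore in comment #15: converting back to latin1 recovers the original UTF-8 bytes, which are then read as UTF-8. It raises an error on text that was never double-encoded, so it is not a blind fix.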
I think we could reasonably check the result of mysqli_set_charset() before running the SET NAMES query, as it's better to try to use the server's default character sets for everything if part of the process fails.
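The proposed check can be sketched as follows. This is a Python model of the control flow only, not the code in 37689.diff: `FakeConnection` and `set_connection_charset()` are hypothetical names, and `FakeConnection.set_charset()` merely imitates mysqli_set_charset() returning false on failure.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConnection:
    """Hypothetical stand-in for a MySQL client connection."""
    supported: set[str]                   # charsets the client can handle
    client_charset: str = "latin1"        # server default until changed
    queries: list[str] = field(default_factory=list)

    def set_charset(self, charset: str) -> bool:
        # Imitates mysqli_set_charset(): returns False on failure.
        if charset not in self.supported:
            return False
        self.client_charset = charset
        return True

    def query(self, sql: str) -> None:
        self.queries.append(sql)


def set_connection_charset(conn: FakeConnection, charset: str, collate: str = "") -> bool:
    """Send SET NAMES only if the client-side charset change succeeded."""
    if not conn.set_charset(charset):
        # Stay on the server defaults for both sides instead of pairing
        # a utf8mb4 server side with a latin1 client side.
        return False
    sql = f"SET NAMES {charset}"
    if collate:
        sql += f" COLLATE {collate}"
    conn.query(sql)
    return True
```

Returning early leaves the connection in the same consistent (if not ideal) state the site had been running in on WordPress 4.5, rather than the mixed state that produced the mojibake.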
#18
8 years ago
- Keywords has-patch added
@danielkanchev: Could I get you to test 37689.diff with WordPress 4.6 and PHP 5.3, with DB_CHARSET set to utf8mb4?
#19
8 years ago
@pento I tested the provided patch and everything works with WP 4.6 + PHP 5.3 and DB_CHARSET set to utf8mb4 on the test site.
#20
@
8 years ago
- Owner set to pento
- Resolution set to fixed
- Status changed from new to closed
In 38441:
IIRC, leaving the default at
define('DB_CHARSET', 'utf8');
works best, as WordPress will automatically convert to utf8mb4 if possible. But of course that's not "the" solution.
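The "upgrade when possible" behaviour described above can be modelled in a few lines. This Python sketch is a hypothetical simplification, not WordPress's actual code: `effective_charset()` is an illustrative name, and `supports_utf8mb4` is an assumed input standing in for whatever server and client version checks WordPress really performs.

```python
def effective_charset(configured: str, supports_utf8mb4: bool) -> str:
    """Hypothetical model: upgrade a configured utf8 when utf8mb4 is usable.

    `supports_utf8mb4` is an assumed input; it is not derived here from
    MySQL server or client library versions.
    """
    if configured == "utf8" and supports_utf8mb4:
        return "utf8mb4"
    return configured
```

In this model, a configured utf8 only moves to utf8mb4 when support is known, while a configured utf8mb4 passes through unchanged even when the client cannot handle it, which is the situation that led to the failing set_charset() call in this ticket.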