WordPress.org

Make WordPress Core

#37982 closed defect (bug) (fixed)

4.6.1 Breaks apostrophes in titles and utf-8 characters

Reported by: edutz
Owned by: pento
Milestone: 4.7.1
Priority: high
Severity: major
Version: 4.6.1
Component: Charset
Keywords: has-patch fixed-major
Focuses:
Cc:

Description

I've just updated to 4.6.1 and noticed some "unnamed" posts in my article list. I checked them on the front page and they were there (working, with full text), so I opened one of the unnamed posts in the editor: it showed no title and no description. Furthermore, every apostrophe seems to be replaced with this character: �, and the same happens for UTF-8 characters (å â ...). Reverting to 4.6.0 does absolutely nothing; somehow, something is messed up in the database.
PS: I did remove plugins and custom themes just to be sure.
PS2: I do have the utf8mb4 charset in my config, as well as "ini_set( 'default_charset', 'UTF-8' );"

Attachments (4)

21c7009541ea46c990be93bf590a66e5.png (13.5 KB) - added by edutz 10 months ago.
44604998986b4402898774e288c113ce.png (11.0 KB) - added by edutz 10 months ago.
37982.diff (573 bytes) - added by pento 10 months ago.
37982.2.diff (1.3 KB) - added by pento 10 months ago.

Change History (18)

#1 @Clorith
10 months ago

Hi there, and welcome to Trac.

Would you be able to provide some additional information that we will need to look into this matter?

What we are looking for is your PHP and MySQL versions, the values of DB_CHARSET and DB_COLLATE, as well as the output from running the SQL query SHOW CREATE TABLE wp_posts.

#2 @edutz
10 months ago

Sure.
PHP Version 5.3.3
Server version: 5.5.51 - MySQL Community Server
define('DB_CHARSET', 'utf8mb4');
define('DB_COLLATE', '');

SHOW CREATE TABLE se_posts:
CREATE TABLE se_posts (
  ID bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  post_author bigint(20) unsigned NOT NULL DEFAULT '0',
  post_date datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  post_date_gmt datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  post_content longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  post_title text COLLATE utf8mb4_unicode_ci NOT NULL,
  post_excerpt text COLLATE utf8mb4_unicode_ci NOT NULL,
  post_status varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'publish',
  comment_status varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'open',
  ping_status varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'open',
  post_password varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  post_name varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  to_ping text COLLATE utf8mb4_unicode_ci NOT NULL,
  pinged text COLLATE utf8mb4_unicode_ci NOT NULL,
  post_modified datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  post_modified_gmt datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  post_content_filtered longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  post_parent bigint(20) unsigned NOT NULL DEFAULT '0',
  guid varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  menu_order int(11) NOT NULL DEFAULT '0',
  post_type varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'post',
  post_mime_type varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  comment_count bigint(20) NOT NULL DEFAULT '0',
  PRIMARY KEY (ID),
  KEY post_name (post_name(191)),
  KEY type_status_date (post_type,post_status,post_date,ID),
  KEY post_parent (post_parent),
  KEY post_author (post_author)
) ENGINE=MyISAM AUTO_INCREMENT=2794888 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

#3 @glafarge
10 months ago

Same as edutz.
After the automatic update to 4.6.1, posts appear to be damaged.

  • Display issues: the tables are utf8, but accents are displayed incorrectly.
  • Some post titles are missing.

It seems that the problem of missing titles occurs only in the back end.
Titles are displayed correctly in the front end (apart from the character encoding problem).

Furthermore, the ACF plugin's extra fields are not displayed on the post-edit page.

Maybe the diff will help to quickly understand what happened.

#4 @glafarge
10 months ago

Reverting to the previous version of the wp-db.php file temporarily resolves the issue.

The problem comes from inside the set_charset() function.

As of WordPress v4.6.1, the MySQL character set queries depend on mysqli_set_charset() succeeding,
so SET NAMES %s is never executed if it returns false.

In my case mysqli_set_charset($dbh, 'utf8mb4') returns false, I think because this function doesn't accept Unicode subsets (like the utf8mb4 that I use). See MySQL character sets.

The set_charset() function must deal with utf8xxx subsets to work properly.

For example, the following can resolve the issue:

$mysqli_charset = ( strpos( $charset, 'utf8' ) !== false ) ? 'utf8' : $charset;
$set_charset_succeeded = mysqli_set_charset( $dbh, $mysqli_charset );

Maybe we should apply the same logic to the old MySQL extension part of the function.

Tested with MySQL server v5.6.

#5 @pento
10 months ago

  • Keywords has-patch needs-testing needs-unit-tests added
  • Milestone changed from Awaiting Review to 4.6.2
  • Priority changed from normal to high
  • Severity changed from normal to major

Thank you for the feedback, everyone!

As @glafarge noticed, WordPress is trying to connect with the utf8mb4 charset when something, likely the client library, doesn't support utf8mb4 correctly.

Could I get you to test 37982.diff, and see if that fixes the issue?
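
For anyone wanting to check the client-library theory on their own setup, something along these lines may help. This is only a sketch: client_supports_utf8mb4() is a hypothetical helper, not a WordPress function, and the version thresholds (5.0.9 for mysqlnd, 5.5.3 for libmysqlclient) are assumptions about when each library gained utf8mb4 support. The argument is meant to be a string like the one mysqli_get_client_info() returns.

```php
<?php
/**
 * Hypothetical helper: guess whether a MySQL client library can use utf8mb4.
 *
 * @param string $client_info Client version string, e.g. from mysqli_get_client_info().
 * @return bool True if the library version is at or above the assumed threshold.
 */
function client_supports_utf8mb4( $client_info ) {
    if ( false !== strpos( $client_info, 'mysqlnd' ) ) {
        // mysqlnd strings look like "mysqlnd 5.0.7-dev - ..."; keep only the version number.
        $version = preg_replace( '/^\D+([\d.]+).*/', '$1', $client_info );
        return version_compare( $version, '5.0.9', '>=' ); // assumed mysqlnd threshold
    }

    // Otherwise assume libmysqlclient, which reports a plain version such as "5.5.51".
    return version_compare( $client_info, '5.5.3', '>=' ); // assumed libmysqlclient threshold
}
```

On a live site, calling client_supports_utf8mb4( mysqli_get_client_info() ) and getting false would be consistent with the failure described in this ticket.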

#6 @pento
10 months ago

  • Owner set to pento
  • Status changed from new to assigned

#7 @edutz
10 months ago

  • Resolution set to worksforme
  • Status changed from assigned to closed

Can confirm that does the trick, though some of my custom settings were reset for some reason.

Thanks for the quick reply.

#8 @pento
10 months ago

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Thanks for confirming that it works, but there's no need to close the ticket. :-)

#9 @pento
10 months ago

  • Keywords needs-unit-tests removed

37982.2.diff adds unit tests.

For anyone looking at this ticket not comfortable making changes to WordPress Core, a workaround is to set your DB_CHARSET to utf8 instead of utf8mb4, and if your DB_COLLATE starts with utf8mb4_, change that to utf8_.

Please still test that the patch works for you, though. I'd like some more checks before I commit it.
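
In wp-config.php terms, the workaround above would look roughly like the following. The collation value shown is only an example, assuming the original setting was utf8mb4_unicode_ci; if DB_COLLATE is empty, it can stay empty.

```php
<?php
// wp-config.php: workaround sketch for setups that cannot use utf8mb4.
define( 'DB_CHARSET', 'utf8' );            // previously 'utf8mb4'
define( 'DB_COLLATE', 'utf8_unicode_ci' ); // example only: was 'utf8mb4_unicode_ci'; keep '' if it was ''
```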

#10 @glafarge
10 months ago

@pento: in my case your patch is working like a charm. Thank you :)

This ticket was mentioned in Slack in #forums by clorith, 10 months ago.

#12 @pento
10 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 38580:

Database: Fall back to utf8 when utf8mb4 isn't supported.

Sometimes, DB_CHARSET will be set to utf8mb4, even if the current setup doesn't support utf8mb4. After [38442], this can cause significant character set failures, causing the connection to fall back to latin1.

Instead of doing this, we now check that the connection supports utf8mb4 before trying to use it, and fall back to utf8 when we need to.

Fixes #37982 for trunk.
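
Roughly, the fallback described in this commit message can be pictured as below. This is a simplified sketch, not the committed code: choose_connection_charset() and its $supports_utf8mb4 argument are hypothetical names, the latter standing in for whatever capability check the connection really performs.

```php
<?php
/**
 * Hypothetical sketch: downgrade utf8mb4 to utf8 when the connection cannot use it.
 *
 * @param string $charset          Configured charset (DB_CHARSET).
 * @param string $collate          Configured collation (DB_COLLATE), may be ''.
 * @param bool   $supports_utf8mb4 Whether the connection is believed to support utf8mb4.
 * @return array Charset and collation to use for the connection.
 */
function choose_connection_charset( $charset, $collate, $supports_utf8mb4 ) {
    if ( 'utf8mb4' === $charset && ! $supports_utf8mb4 ) {
        $charset = 'utf8';
        // Keep the collation consistent with the downgraded charset.
        $collate = str_replace( 'utf8mb4_', 'utf8_', $collate );
    }

    return array(
        'charset' => $charset,
        'collate' => $collate,
    );
}
```

With utf8mb4 unsupported, a configured pair of utf8mb4 and utf8mb4_unicode_ci becomes utf8 and utf8_unicode_ci; when it is supported, both values pass through unchanged.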

#13 @pento
10 months ago

  • Keywords fixed-major added; needs-testing removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

#14 @pento
10 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 38581:

Database: Fall back to utf8 when utf8mb4 isn't supported.

Sometimes, DB_CHARSET will be set to utf8mb4, even if the current setup doesn't support utf8mb4. After [38442], this can cause significant character set failures, causing the connection to fall back to latin1.

Instead of doing this, we now check that the connection supports utf8mb4 before trying to use it, and fall back to utf8 when we need to.

Merge of [38580] to the 4.6 branch.
Fixes #37982.
