Make WordPress Core

Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#32051 closed defect (bug) (fixed)

Creating new posts/terms with non-English characters is broken for tables with cp1251 collation in 4.1.2

Reported by: vloo's profile vloo Owned by: pento's profile pento
Milestone: 4.1.3 Priority: high
Severity: critical Version: 4.1.2
Component: Database Keywords: has-patch fixed-major
Focuses: Cc:

Description

This is a pretty specific case of an client of mine:

Site is pretty old (got posts since 2005) and it's in Bulgarian. Most tables' collation is cp1251, but there are newer tables (from plugins) that are utf8. DB_CHARSET is utf8 and I guess it was always like that. Since the automatic update of 4.1.2 the users were unable to publish or save as draft posts in Cyrillic, nor create new terms in Cyrillic. Comments couldn't be created either.

I still haven't tested this in a clean install, as I just found out about the issue, but it's not just this site:

https://wordpress.org/support/topic/update-412-problem?replies=9

Replacing wp-includes/wp-db.php with it's previous version seems to be fixing the problem, but I'm totally aware that this is bad hotfix, that shouldn't be a long term solution.

My wild guess is, that something is missing or faulty here:

https://core.trac.wordpress.org/changeset?old_path=%2Fbranches%2F4.1&old=32234&new_path=%2Fbranches%2F4.1&new=32234&sfp_email=&sfph_mail=

Attachments (4)

32051.diff (1.4 KB) - added by pento 10 years ago.
32051.2.diff (2.4 KB) - added by pento 10 years ago.
32051.3.diff (2.4 KB) - added by pento 10 years ago.
32051.4.diff (1.6 KB) - added by pento 10 years ago.

Download all attachments as: .zip

Change History (38)

#1 @LewisCowles
10 years ago

  • Resolution set to invalid
  • Status changed from new to closed

Can you not just change the collation of all fields, tables and the DB to UTF8? This sounds like a non-WP error... Alternatively could you export all posts, terms etc and reload WP with utf-8?

Also please read
https://codex.wordpress.org/Converting_Database_Character_Sets

Last edited 10 years ago by LewisCowles (previous) (diff)

#2 @SergeyBiryukov
10 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Version changed from trunk to 4.1.2

Seeing some similar reports on support forums after the update (posts not saving; wp_insert_post() returns 0; "Submit for Review" button for an admin instead of "Publish", which is usually a symptom of a database crash).

For example: https://wordpress.org/support/topic/help-asap-pls.

This ticket was mentioned in Slack in #forums by netweb. View the logs.


10 years ago

@pento
10 years ago

#4 @pento
10 years ago

  • Keywords has-patch added
  • Milestone changed from Awaiting Review to 4.2
  • Priority changed from normal to high
  • Severity changed from normal to critical

32051.diff should fix this. @vloo, would you mind applying this patch and seeing if it works in your environment?

#5 @vloo
10 years ago

Patch works great on my client's environment. None of the issues I reported can be reproduced after I applied the patch on 4.1.2.

Thanks, @pento!

#6 @LewisCowles
10 years ago

Sorry all, but the patch, working or not is a band-aid for people not using utf8. Why is the solution not to upgrade the database to utf-8 then streamline the core?

#7 @pento
10 years ago

  • Owner set to pento
  • Status changed from reopened to accepted

The patch isn't a bandaid, it's fixing a legitimate bug in the code, which is supposed to be able to deal with this situation.

While I agree that utf8 is the best possible solution (and the vast majority of WordPress sites use utf8), we won't be forcing people to use it by breaking backwards compatibility. There are many sites built years ago, before MySQL's Unicode support was good enough for their use. Forcing them to convert all of their content in order to upgrade isn't an option.

#8 @LewisCowles
10 years ago

I agree the patch is reported as working for some users, but their problem would also be solved by them moving to UTF-8, for which there is guidance in the codex as linked.

"we won't be forcing people to use it by breaking backwards compatibility"

Two things, who represents we, who else did you speak to about this? And is this a "forever" decision? I Hope not because it would represent less time for WP devs to just abandon those and focus on more relevant technologies and issues.

"Forcing them to convert all of their content in order to upgrade isn't an option."

Who is this not an option for? The content does not need to be manually re-typed, a character-code conversion script can be used, such as the ones in the codex I linked.

https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet#Use_UTF-8_unless_necessary

http://php.net/eol.php

5.3 is in EOL from PHP, most OS's with 5.4 and up have a modern mysql of 5.5 and up. In order to upgrade these blogs I don't see how it could be considered particularly mean, or difficult for site-owners to upgrade and migrate their database, particularly if it benefits WordPress, their security, and due to the removal of code, should improve speed...

Last edited 10 years ago by LewisCowles (previous) (diff)

#9 @SergeyBiryukov
10 years ago

Getting a lot of reports about this on ru.forums.wordpress.org, and I have a feeling there'll be a lot more as background updates keep rolling out. Apparently quite a few sites still use cp1251 or other non-utf8 collations.

Any chance to release 4.1.3 (4.0.3, etc.) with this fix sooner rather than later?

#10 @vloo
10 years ago

I guess @LewisCowles is right that we shouldn't keep supporting all kinds of weird legacy collations in the long run. Still this is a mess that WordPress did or let people do to themselves "long time ago, in a galaxy far, far away, where people are not English speaking". Best solution would be to have a "band-aid" until there is a database migration script that converts people's databases to utf8mb4.

Either this, or a screaming warning notification at the upgrade screen, telling the administrator that their site will be in trouble because of their collation, that is being left unsupported in the next major release.

Support over legacy technology should never be dropped silently during a security release, letting people save themselves (if they can) after they find out their sites are actually safe but not functioning properly.

#11 @LewisCowles
10 years ago

Awesome, I think this is very sensible

This ticket was mentioned in Slack in #core by pento. View the logs.


10 years ago

@pento
10 years ago

#13 @pento
10 years ago

32051.2.diff improves performance by setting several extra character sets (including cp1251) to be tested locally against PHP's MB string libraries, instead of being sent to MySQL.

The were compiled by comparing these two lists:

https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html
https://php.net/manual/en/mbstring.supported-encodings.php

#14 @nbachiyski
10 years ago

For me, array_search( 'CP1251', mb_list_encodings() ); returns false (PHP 5.4.39). The canonical mb_ encoding name is Windows-1251. Even though mb_convert_encoding( 'baba', 'CP1251', 'CP1251' ); works properly, our support-encoding checks will be bypassed, so we should use the canonical name.

@pento
10 years ago

#15 @pento
10 years ago

32051.3.diff swaps CP1251 for Windows-1251.

This ticket was mentioned in Slack in #core by pento. View the logs.


10 years ago

@pento
10 years ago

#17 @pento
10 years ago

32051.4.diff is what we'll commit to trunk and backport to older branches. Adding more encodings to $mb_string can be a thing for 4.3.

#18 @pento
10 years ago

In 32261:

WPDB: When sanity checking a string by sending it to MySQL for conversion checks, the incorrect data structure was being returned from wpdb::strip_invalid_text(), causing all write queries to fail for some character sets when the query contained non-ASCII characters.

See #32051.

#19 @pento
10 years ago

In 32262:

WPDB: When sanity checking a string by sending it to MySQL for conversion checks, the incorrect data structure was being returned from wpdb::strip_invalid_text(), causing all write queries to fail for some character sets when the query contained non-ASCII characters.

Merge of [32261] to the 4.1 branch.

See #32051.

#20 @DrewAPicture
10 years ago

  • Keywords fixed-major added
  • Milestone changed from 4.2 to 4.1.3

This ticket was mentioned in Slack in #forums by ipstenu. View the logs.


10 years ago

#22 @pento
10 years ago

In 32272:

WPDB: When sanity checking a string by sending it to MySQL for conversion checks, the incorrect data structure was being returned from wpdb::strip_invalid_text(), causing all write queries to fail for some character sets when the query contained non-ASCII characters.

Merge of [32261] to the 4.0 branch.

See #32051.

#23 @pento
10 years ago

In 32273:

WPDB: When sanity checking a string by sending it to MySQL for conversion checks, the incorrect data structure was being returned from wpdb::strip_invalid_text(), causing all write queries to fail for some character sets when the query contained non-ASCII characters.

Merge of [32261] to the 3.9 branch.

See #32051.

#24 @pento
10 years ago

In 32274:

WPDB: When sanity checking a string by sending it to MySQL for conversion checks, the incorrect data structure was being returned from wpdb::strip_invalid_text(), causing all write queries to fail for some character sets when the query contained non-ASCII characters.

Merge of [32261] to the 3.8 branch.

See #32051.

#25 @pento
10 years ago

In 32275:

WPDB: When sanity checking a string by sending it to MySQL for conversion checks, the incorrect data structure was being returned from wpdb::strip_invalid_text(), causing all write queries to fail for some character sets when the query contained non-ASCII characters.

Merge of [32261] to the 3.7 branch.

See #32051.

#26 @pento
10 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

Well, that was an interesting experience.

This ticket was mentioned in Slack in #core by pento. View the logs.


10 years ago

This ticket was mentioned in Slack in #forums by vloo. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by sam. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by jeffr0. View the logs.


10 years ago

#31 follow-ups: @ewerkstatt
10 years ago

We have the same problem here (import of XML-data into custom table using wp-query), and it still not working with wp-db.php of 4.1.3.

I had to use the file from 4.1.1 to get the import running.

And BTW: the XML-file ist UTF-8, also the database.

#32 in reply to: ↑ 31 @SergeyBiryukov
10 years ago

Replying to ewerkstatt:

We have the same problem here (import of XML-data into custom table using wp-query), and it still not working with wp-db.php of 4.1.3.

Please create a new ticket, this one's closed on a completed milestone.

#33 in reply to: ↑ 31 @SergeyBiryukov
10 years ago

Replying to ewerkstatt:

We have the same problem here (import of XML-data into custom table using wp-query), and it still not working with wp-db.php of 4.1.3.

Actually, see #32090, seems like the same issue.

This ticket was mentioned in Slack in #core by vloo. View the logs.


10 years ago

Note: See TracTickets for help on using tickets.