WordPress.org

Make WordPress Core

Opened 10 months ago

Last modified 10 months ago

#45697 new enhancement

Database: Upgrading field collations

Reported by: pento Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Database Keywords:
Focuses: Cc:
PR Number:

Description

[37523] added support for the utf8mb4_unicode_520_ci collation, and #38036 exists to look at adding support for the utf8mb4_0900_ai_ci collation.

With these variations, it's possible for a WordPress site to be installed, then MySQL is upgraded sometime later. Subsequent table changes (for example, a plugin creating a new table) would use the newer collation, while the original tables would use the older collation. This will cause JOIN queries between fields of different collations to fail.

Change History (3)

#1 @pento
10 months ago

There are a few options that come to mind for this:

  • Check during the upgrade routine, and upgrade the collations then, if needed. While this makes the upgrade predictable, it means there's a period of time where the collations might be out of sync.
  • Schedule background upgrades when the MySQL change is detected. This avoids problems with collations being out of sync, but could generate long running queries that lock the tables. This may be less of an issue with newer versions of MySQL that support online DDL changes.
  • Add this to Health Check, and require it to be triggered manually.

#2 @knutsp
10 months ago

Will this sort Norwegian/Danish 'æ','ø','å' or Swedish 'æ','ö','å', correctly or will we have to wait for language specific collations?

I have been doing manual collation changes for more than ten years, to get terms and titles sorted correctly.

By only changing meta values, term names and post titles, not slugs, collation mismatch for, some short time, is mostly avoided. Just and idea.

#3 @pento
10 months ago

That's a good thing to take into account, thanks @knutsp.

utf8mb4_0900_ai_ci does accent-insensitive ("ai") sorting, as well as case insensitive ("ci") sorting.

For language-specific sorting, there are language-specific collations (eg, utf8mb4_sv_0900_ai_ci), as well as accent and case sensitive collations (eg, utf8mb4_sv_0900_as_cs). Any upgrade process should take these into account.

Ref: https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html

Note: See TracTickets for help on using tickets.