Opened 6 years ago
Last modified 2 years ago
#45697 new enhancement
Database: Upgrading field collations
Reported by: | pento | Owned by: | |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Database | Keywords: | |
Focuses: | Cc: |
Description
[37523] added support for the utf8mb4_unicode_520_ci
collation, and #38036 exists to look at adding support for the utf8mb4_0900_ai_ci
collation.
With these variations, it's possible for a WordPress site to be installed, then MySQL is upgraded sometime later. Subsequent table changes (for example, a plugin creating a new table) would use the newer collation, while the original tables would use the older collation. This will cause JOIN
queries between fields of different collations to fail.
Change History (6)
#2
@
6 years ago
Will this sort Norwegian/Danish 'æ','ø','å' or Swedish 'æ','ö','å', correctly or will we have to wait for language specific collations?
I have been doing manual collation changes for more than ten years, to get terms and titles sorted correctly.
By only changing meta values, term names and post titles, not slugs, collation mismatch for, some short time, is mostly avoided. Just and idea.
#3
@
6 years ago
That's a good thing to take into account, thanks @knutsp.
utf8mb4_0900_ai_ci
does accent-insensitive ("ai") sorting, as well as case insensitive ("ci") sorting.
For language-specific sorting, there are language-specific collations (eg, utf8mb4_sv_0900_ai_ci
), as well as accent and case sensitive collations (eg, utf8mb4_sv_0900_as_cs
). Any upgrade process should take these into account.
Ref: https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html
#4
@
4 years ago
Running into this already, though I'm able to work around it manually for the time being.
Relatedly, it is not uncommon to run into plugins that have not moved to utf8mb4
yet, resulting in collections of tables with varying character sets and collations, but I think it's rare that plugins JOIN
their own tables with core ones.
[rant]
I believe plugins should be responsible for their own database tables, but I wish WordPress came with more robust tools to help plugins take on that responsibility in a more uniform way.
BerlinDB versions individual tables with an option for each, and upgrade routines run automatically when tables are out of date. They run their own character set and collation changes like any other upgrade. It's cheap and effective.
Plugins probably should not be JOINing with WordPress core tables, as there is no guarantee all of the tables will be on the same database server and accessible to each other.
[/rant]
#5
@
3 years ago
Setting a charset is important and will be great or having a universal one (like "utf8mb4") because, right now, "utf8" and "utf8mb3" are confusing between versions.
About the collation, each database have their own. For example, default collations:
- MySQL / UTF8mb3: utf8_general_ci
- MariaDB / UTF8mb3: utf8mb3_general_ci
- MySQL / UTF8mb4: utf8mb4_0900_ai_ci
- MariaDB / UTF8mb4: utf8mb4_general_ci
Also, about the collation... Should we explain (like at Sie Health) that, if you write in Spanish, may be better to have something like utf8mb4_spanish_ci so you have order like "...m, n, ñ, o, p..."?
There are a few options that come to mind for this: