Opened 6 years ago
Last modified 6 years ago
#44386 new enhancement
Problem with utf8mb4_unicode_ci collation for arabic content
Reported by: | array064 | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | major | Version: | 4.9.6 |
Component: | Database | Keywords: | needs-testing |
Focuses: | | Cc: | |
Description
As far as I can tell, WordPress has used utf8mb4_unicode_ci as the default collation since version 4.6. I found this in the determine_charset function in the /wp-includes/wp-db.php file (correct me if I'm wrong).
In my experience, utf8mb4_unicode_ci has problems with content written in Arabic letters.
Example:
I created a tag with the name:
And I created another tag with the name:
Then when I do a tag search (via wp-admin), with keyword:
the search results that appear are:
and
tags, whereas only this tag should appear:
according to the search keyword.
This becomes a problem when a post wants to use the tag
, but cannot because of the existing tag
My guess is that this is not a bug in WordPress, but a bug in MySQL.
For information, perhaps this link is a related issue:
https://bugs.mysql.com/bug.php?id=76218
(CMIIW).
I forgot to write this:
The above problem does not occur when utf8mb4_general_ci (or utf8_general_ci) is used as the collation.
So when installing WordPress on some of my websites that contain Arabic text, I set the above collation in wp-config.php and in MySQL.
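For anyone who wants to try the same workaround, this is roughly what the wp-config.php part looks like. It is only a sketch: it assumes a fresh install and a MySQL server that supports utf8mb4, and the comments reflect my reading of wp-db.php rather than anything confirmed in this ticket.

```php
<?php
// wp-config.php (fragment). Set these before running the WordPress installer.

// Database character set. utf8mb4 is required for the utf8mb4_* collations.
define( 'DB_CHARSET', 'utf8mb4' );

// Explicit collation. Assumption: with a non-empty value here, WordPress
// keeps it instead of defaulting to utf8mb4_unicode_ci.
define( 'DB_COLLATE', 'utf8mb4_general_ci' );
```

As far as I understand, these constants affect the tables WordPress creates and the connection collation, so tables that already exist with utf8mb4_unicode_ci would still need to be converted on the MySQL side.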