Opened 5 weeks ago

Last modified 5 weeks ago

#44386 new enhancement

Problem with utf8mb4_unicode_ci collation for Arabic content

Reported by: array064
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: major
Version: 4.9.6
Component: Database
Keywords: needs-testing
Focuses:
Cc:

Description

As far as I can see, since version 4.6 WordPress uses utf8mb4_unicode_ci as the default collation. This appears to be set in the determine_charset function in /wp-includes/wp-db.php (CMIIW).

In my experience, utf8mb4_unicode_ci has problems with content that uses Arabic letters.

Example:

I created a tag with the name:

ٱللَّهِ

And I created another tag with the name:

ٱللَّهُ

Then, when I search the tags in wp-admin with the keyword:

ٱللَّهُ

the search results that appear are:

ٱللَّهِ

and

ٱللَّهُ

tags, whereas only this tag should appear:

ٱللَّهُ

according to the search keyword.

This becomes a problem when a post wants to use the tag

ٱللَّهُ

but cannot, because it is treated as the same as the existing tag

ٱللَّهِ

My guess is that this is not a bug in WordPress, but a bug in MySQL.

For reference, this MySQL bug report may be related:

https://bugs.mysql.com/bug.php?id=76218

(CMIIW).
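
For anyone trying to reproduce this, here is a minimal Python sketch (not WordPress or MySQL code) that shows where the two tag names differ. It lists the code points of each name and then compares them after dropping combining marks. The assumption, which I have not verified against the MySQL source, is that utf8mb4_unicode_ci ignores combining marks such as the Arabic vowel signs when comparing strings; the strip_marks and describe helpers are hypothetical names used only for this illustration.

```python
import unicodedata


def strip_marks(text):
    """Drop nonspacing combining marks (Unicode category Mn).

    This only approximates a collation that ignores diacritics;
    it is an illustration, not MySQL's actual comparison algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in the string."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch, "?")) for ch in text]


# The two tag names from the description above.
tag_a = "ٱللَّهِ"
tag_b = "ٱللَّهُ"

for tag in (tag_a, tag_b):
    print(describe(tag))

# Exact comparison of the two names, code point by code point.
print("identical as written:", tag_a == tag_b)

# Comparison after removing the combining marks.
print("identical without marks:", strip_marks(tag_a) == strip_marks(tag_b))
```

If the two names compare as different when written out but equal once the marks are removed, that would be consistent with the search and tag behavior described above.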

Change History (1)

#1 @array064
5 weeks ago

I forgot to write this:

The above problem does not occur when using utf8mb4_general_ci (or utf8_general_ci) as the collation.

So, for some of my websites that contain Arabic text, I set one of the above collations in wp-config.php and in MySQL when installing WordPress.
