term_exists() doesn't distinguish between z and ẓ

I spent hours before discovering why wp_insert_term() refuses to create some of the categories in my list:

$alphabet = array(
	array( 's', 'sth' ),
	array( 'ṣ', 'sth' ),
	array( 'h', 'sth' ),
	array( 'ḥ', 'sth' ),
	array( 'd', 'sth' ),
	array( 'ḍ', 'sth' ),
);

// Create a category for each character unless a matching term already exists.
foreach ( $alphabet as $category ) {
	if ( ! term_exists( mb_strtolower( $category[0] ), 'category' ) ) {
		wp_insert_term( $category[0], 'category', array( 'slug' => $category[0] ) );
	}
}

term_exists() doesn't distinguish between "normal" characters and characters with diacritics.

comment:1 xknown, 3 years ago

It depends on the MySQL collation you are using. By default, WordPress uses utf8_general_ci. See http://dev.mysql.com/doc/refman/5.0/en/charset-collation-effect.html
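
To check which collation the term columns actually use, a query along these lines may help. It is only a sketch and assumes the default wp_ table prefix:

```sql
-- Lists every column of the terms table; the Collation column shows
-- the collation MySQL applies when comparing name and slug values.
-- `wp_terms` assumes the default `wp_` prefix (an assumption, adjust as needed).
SHOW FULL COLUMNS FROM wp_terms;
```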

comment:2 xknown, 3 years ago

Related to collation issue #18210.

comment:3 abdessamad idrissi, 3 years ago

I don't think it is related to MySQL collation because all my wp tables are utf8_general_ci

comment:4 SergeyBiryukov, 3 years ago

This does indeed depend on the MySQL collation. I've modified the example from the link given by xknown above to make it clearer.

To reproduce in MySQL:

CREATE TABLE utf8_general_test ( c CHAR(10) ) CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE utf8_bin_test (  c CHAR(10) ) CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO utf8_general_test VALUES ('shd'), ('ṣḥḍ');
INSERT INTO utf8_bin_test VALUES ('shd'), ('ṣḥḍ');


SELECT * FROM utf8_general_test WHERE c = 'ṣḥḍ';

will return both values, whereas

SELECT * FROM utf8_bin_test WHERE c = 'ṣḥḍ';

will return only one.
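
For comparison, the collation can also be overridden for a single query with a COLLATE clause. A minimal sketch, reusing the test table above and assuming the connection character set is utf8 (otherwise MySQL may reject utf8_bin for the string literal):

```sql
-- Force a binary (accent- and case-sensitive) comparison against the
-- utf8_general_ci table for this one query only.
SELECT * FROM utf8_general_test WHERE c = 'ṣḥḍ' COLLATE utf8_bin;
```

Under those assumptions this should behave like the utf8_bin table and match only the value with diacritics.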

I tried adding

define('DB_COLLATE', 'utf8_bin');

to wp-config.php and reinstalling WordPress (so that the new tables are created in utf8_bin). After that (and after changing $alphabet to $characters), your example works fine.

comment:5 ericmann, 9 months ago

Based on the above comments, this appears to be a non-issue.

comment:6 dd32, 8 months ago

comment:7 SergeyBiryukov, 8 months ago

