Make WordPress Core

Opened 18 years ago

Closed 18 years ago

Last modified 7 months ago

#2828 closed enhancement (wontfix)

Ease MySQL4.1 character set configuration

Reported by: tenpura Owned by: tenpura
Milestone: Priority: high
Severity: normal Version: 2.1
Component: General Keywords: utf-8
Focuses: Cc:

Description

In MySQL 4.1 and later, character set configuration became complicated, and this annoys people. The problem is that if a person has no permission to modify my.cnf, the only solution so far is to hardcode a MySQL query into a core file such as wp-db.php. This is not easy for everyone.
http://www.google.com/search?hl=en&q=%22SET+NAMES%22+%22wp-db.php%22

Most people execute a "SET NAMES 'utf8'" query right after the database connection is made. This sets the session values of 4 MySQL system variables (character_set_client, character_set_connection, character_set_results, collation_connection) to utf8, with collation_connection taking the default utf8 collation. In most cases this operation is necessary because these values default to latin1. The only solution I can think of is to control this from wp-config.php. (see attachment)
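To illustrate the idea of driving the query from configuration rather than a hardcoded edit, here is a minimal sketch in Python. The helper name `build_set_names` and its validation rule are hypothetical and are not taken from the attached patch or from WordPress. It only builds the statement text; running it against a connection is left out.

```python
import re

# Hypothetical helper for illustration only; not part of WordPress or the patch.
# Charset and collation names are restricted to letters, digits and underscores
# so that a configuration value cannot inject extra SQL.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def build_set_names(charset, collation=None):
    """Return a SET NAMES statement, or None if no charset is configured."""
    if not charset:
        return None  # nothing configured: leave the server defaults alone
    if not _IDENTIFIER.fullmatch(charset):
        raise ValueError("invalid character set name: %r" % charset)
    sql = "SET NAMES '%s'" % charset
    if collation:
        if not _IDENTIFIER.fullmatch(collation):
            raise ValueError("invalid collation name: %r" % collation)
        sql += " COLLATE '%s'" % collation
    return sql


print(build_set_names("utf8"))
print(build_set_names("utf8", "utf8_unicode_ci"))
```

Returning None when nothing is configured keeps existing installations on whatever connection settings they already use.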

What I want to say is that this is everybody's issue including English users, because if one gets garbled UTF-8 international trackbacks or comments on his UTF-8 WP, that means his WP may be misconfigured.

Attachments (1)

2828.diff (1.1 KB) - added by tenpura 18 years ago.
any better ideas?


Change History (13)

@tenpura
18 years ago

any better ideas?

#1 @tenpura
18 years ago

  • Keywords i18n removed
  • Priority changed from normal to high
  • Summary changed from ease MySQL4.1 character set configuration to Ease MySQL4.1 character set configuration
  • Type changed from defect to enhancement

phpMyAdmin seems to use "SET NAMES XXX" in a more sophisticated way. Please look into their source code if you have time.

I deleted i18n from keywords because this is rather a general installation/configuration issue that relates to all users.

#2 @tenpura
18 years ago

  • Owner changed from anonymous to tenpura
  • Status changed from new to assigned

#3 @janX
18 years ago

Works for me.

This still doesn't fix the issues with precomposed and decomposed Unicode characters I am experiencing. Does anyone know how to get MySQL to match Ü (U+00DC) with Ü (U+0055 U+0308)?
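The two spellings in the question are canonically equivalent in Unicode but differ as code point sequences. A minimal Python sketch of the difference follows. Normalizing to NFC on the application side before storing or comparing is an assumption offered for illustration, not something this ticket proposes, and it says nothing about how a given MySQL collation compares the two.

```python
import unicodedata

precomposed = "\u00dc"   # single code point: LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "U\u0308"   # "U" followed by COMBINING DIAERESIS


def nfc(text):
    """Normalize text to Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# A plain code point comparison treats the two spellings as different strings.
print(precomposed == decomposed)
# After NFC normalization both spellings become the same single code point.
print(nfc(precomposed) == nfc(decomposed))
```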

#4 @jimisola
18 years ago

  • Cc public@… added

#5 @matt
18 years ago

  • Milestone changed from 2.1 to 2.2

#6 @tenpura
18 years ago

Since DB load is now pluggable in 2.1 (#2721), I think the problem itself is mostly fixed.

janX,
Try "SET NAMES utf8 COLLATE utf8_unicode_ci" instead of "SET NAMES utf8". You probably need to change database side collation as well.

#7 @lelion
18 years ago

I used:

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'");

after line 43 in wp-includes/wp-db.php, to resolve the following problem:
1) Create a new empty database and set its collation to utf8_unicode_ci (in the OPERATIONS tab of phpMyAdmin)
2) Create a new WP blog, clean install (2.05)
3) Set UTF-8 as the encoding in the WP admin panel
4) Basically, everything everywhere is UTF-8 (Unicode) and...
5) On the website, posts written in both English and Bulgarian look OK, but...
6) When I log into phpMyAdmin and open the WP table with the posts, I see strange signs ("gibberish" ;-) instead of Cyrillic characters. So the text CANNOT be read normally from the database, only from the website. This could badly impact future upgrades and backup/restore of the MySQL database.
7) To fix this, I tried the following "hack":
Open wp-includes/wp-db.php and add the following code after line 43:

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'");

After that, EVERYTHING I have posted in Bulgarian (Cyrillic characters) appeared correct BOTH in the website and the database!
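One plausible explanation for the symptom above (fine on the website, gibberish in phpMyAdmin) is double encoding: UTF-8 bytes sent over a connection that the server believes is latin1. The Python sketch below simulates that round trip under two assumptions: Python's "latin-1" codec stands in for MySQL's latin1 (which is closer to cp1252), and the second client reads over a utf8 connection. The variable names are illustrative only.

```python
original = "Здравей"  # sample Bulgarian text

# The application sends UTF-8 bytes, but the connection is declared latin1.
sent_bytes = original.encode("utf-8")

# The server reads each byte as one latin1 character and converts those
# characters to utf8 for storage in a utf8 column: the text is now double-encoded.
stored_bytes = sent_bytes.decode("latin-1").encode("utf-8")

# Reading back over the same latin1 connection reverses the conversion,
# so the application gets its original UTF-8 bytes and the site looks fine.
returned_bytes = stored_bytes.decode("utf-8").encode("latin-1")
shown_on_site = returned_bytes.decode("utf-8")

# A client reading over a utf8 connection sees the stored characters as they are.
shown_in_utf8_client = stored_bytes.decode("utf-8")

print(shown_on_site == original)
print(shown_in_utf8_client == original)
```

This also suggests why switching the connection character set on a site that already holds data written this way needs care: the stored rows stay double-encoded until they are converted.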

It's a pity one cannot override the default encoding settings using .my.cnf in a shared hosting environment to fix this globally :(


But I guess the fix proposed above (modifications to wp-config-sample.php and wp-settings.php) is a cleaner way to achieve the same effect, right?

#8 @tenpura
18 years ago

  • Resolution set to wontfix
  • Status changed from assigned to closed

WONTFIX for now. It's pluggable in 2.1.

#9 @jimisola
18 years ago

I spent 3 days trying to resolve the encoding problem, only to realize that the fix is trivial and well-known.

So, how does it help new users that it is pluggable in 2.1?
Is it set to UTF-8 by default or will new users have to find out the hard way??

I strongly believe that the encoding used for the database should be a simple setting in the admin interface and that it should be set to UTF-8 by default.

If new users do in fact have to provide a filter, I think that in order for this ticket to be closed, it should provide (attach) an example of such a filter.

#10 @foolswisdom
18 years ago

  • Keywords utf-8 added; needs-testing removed
  • Milestone 2.2 deleted

jimisola, you are correct that there is an experience problem here. There are other tickets that at least partially describe the problems, see ticket:3517 for example.

#11 @tenpura
18 years ago

jimisola,

I know what you mean, but there are many issues to consider before we go.

"Is it set to UTF-8 by default"

It is good for new installations, but possibly harmful for upgrades.

"I strongly believe that the encoding used for the database should be a simple setting in the admin interface"

This cannot be an option stored in the database. Also, I believe that character set related MySQL system variables should not be easily changeable on a running WP. Without proper backup and preparation, users may lose their data. (It is described in #3517)

There is another issue. If a user uses a localization file (MO), changing character set related MySQL system variables on a running WP might cause login failure. (See #3442)

I have started writing a helper plugin for this and hope to release it to the public before WP 2.1.

#12 @jimisola
17 years ago

@tenpura,

What happened to that helper plugin of yours? Is it available somewhere?
