Make WordPress Core

Opened 20 years ago

Closed 19 years ago

Last modified 13 years ago

#42 closed defect (bug) (wontfix)

Encoding Problem with old entries UTF-8 <-> Latin1

Reported by: Agent Orange
Owned by: matt
Milestone:
Priority: normal
Severity: minor
Version:
Component: General
Keywords:
Focuses:
Cc:


When a new WordPress version that uses UTF-8 is run on a database with Latin-1-encoded posts, those posts are displayed garbled. In my case (German), the umlauts and the ß turned into garbage.
Maybe the upgrade script could convert old entries.
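
To illustrate the symptom described above, here is a minimal Python sketch of what happens when Latin-1 bytes are read as UTF-8. The sample string is an assumption chosen for illustration, not text from the reporter's database.

```python
# Hypothetical sample text; any string with umlauts or ß behaves the same way.
original = "Grüße aus Köln"

# Older installs stored posts as Latin-1: one byte per character.
stored = original.encode("iso-8859-1")

# A strict UTF-8 decoder rejects those lone high bytes ...
try:
    stored.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... and a lenient one substitutes U+FFFD for the bytes it cannot decode.
garbled = stored.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
recovered = stored.decode("iso-8859-1")

print(garbled)
print(recovered)
```

The bytes in the database are intact; only the interpretation is wrong, which is why a one-time conversion of old entries can fix the display.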

Attachments (1)

0000042-latin2utf.php.txt (1.1 KB) - added by Agent Orange 19 years ago.

Change History (12)

#2 @ryan
20 years ago

Another approach is to use iconv, if it is available on your system. Dump your database to dbdump and then:

iconv -f iso-8859-1 -t utf-8 < dbdump > dbdump.utf

Restore dbdump.utf.
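
For systems without the iconv program, the same re-encoding step can be sketched in Python. This is a rough equivalent under assumptions: the function name `convert_dump` is hypothetical, and the sketch presumes the whole dump really is ISO-8859-1. It does not touch any charset declarations inside the dump.

```python
def convert_dump(src: str, dst: str) -> None:
    """Re-encode a Latin-1 text dump as UTF-8, similar to
    `iconv -f iso-8859-1 -t utf-8 < src > dst`."""
    # newline="" keeps line endings exactly as they appear in the dump.
    with open(src, "r", encoding="iso-8859-1", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        # Copy in chunks so a large dump does not have to fit in memory.
        for chunk in iter(lambda: fin.read(1 << 20), ""):
            fout.write(chunk)
```

Calling `convert_dump("dbdump", "dbdump.utf")` would produce the file that the comment above says to restore.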

#3 @matt
20 years ago

That seems to deal with only a few characters. Is that all that's needed for Latin-1?

#4 @Anne
20 years ago

No, there are a lot more characters. The attachment lists only the German special characters, not the ones everyone uses. Dutch, for example, has 'ï' and 'ë' and accents on the e, a, i, o, etc. Other languages have a '/' through the 'o' (I should learn the names of those...). French also has accents and a special 'c' (see Tantek).
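
The gap between a German-only list and full Latin-1 coverage can be sketched as follows. The attachment's actual contents are not reproduced in this ticket, so `GERMAN_ONLY` is a hypothetical stand-in for a hand-written table, and `convert` is an illustrative helper. The full table relies on the fact that ISO-8859-1 byte N corresponds to Unicode code point N.

```python
# Hypothetical stand-in for a hand-written table covering only German.
# In ISO-8859-1 the byte value equals the code point, so ord(ch) is the byte.
GERMAN_ONLY = {ord(ch): ch.encode("utf-8") for ch in "äöüÄÖÜß"}

# Full table for the upper half of ISO-8859-1, generated instead of typed.
FULL_LATIN1 = {b: chr(b).encode("utf-8") for b in range(0x80, 0x100)}

def convert(data: bytes, table: dict) -> bytes:
    """Replace each byte found in `table` with its UTF-8 form;
    pass every other byte through unchanged."""
    return b"".join(table.get(b, bytes([b])) for b in data)

dutch = "België".encode("iso-8859-1")
print(convert(dutch, GERMAN_ONLY))  # the ë byte is left unconverted
print(convert(dutch, FULL_LATIN1).decode("utf-8"))
```

With the German-only table, the Dutch 'ë' passes through as a raw Latin-1 byte and stays broken, while the generated table handles it without anyone having to list it by hand.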

#5 @Agent Orange
20 years ago

Yep, I made the character-array manually and only put the chars I need in there (selfish, yes :). But the array can be easily extended.
If you want me to integrate any changes you may have made to the list into the "official" version on my homepage, send them to janvarwig [at]

#6 @Anne
20 years ago

This should not be integrated in "your official version"; I think Matt wants this in the default build.

#7 @Agent Orange
20 years ago

That would be even better of course, although the on-the-fly conversion is hardly an ideal solution.
Not many people have the iconv program, and I also doubt that everybody running WordPress is familiar with such tools or even able to dump their DB.

#8 @Anne
20 years ago

Yeah, encoding sucks. WP should have an unchangeable default of UTF-8.

#9 @matt
19 years ago

  • Status changed from new to closed

#10 @matt
19 years ago

  • Owner changed from anonymous to matt
  • Resolution changed from 10 to 50

There's not a whole lot we can do here; the charset is much bigger than what this covers.

#11 @downloadbook
13 years ago

  • Cc Agent Orange added; Agent Orange removed

