WordPress.org

Make WordPress Core

Opened 12 years ago

Closed 12 years ago

Last modified 5 years ago

#42 closed defect (bug) (wontfix)

Encoding Problem with old entries UTF-8 <-> Latin1

Reported by: Agent Orange
Owned by: matt
Milestone:
Priority: normal
Severity: minor
Version:
Component: General
Keywords:
Focuses:
Cc:

Description

When a newer WordPress version that uses UTF-8 runs against a database containing Latin-1-encoded posts, those posts display incorrectly. In my case (German), the umlauts and the ß were garbage.
Maybe the upgrade script could convert old entries.
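
To illustrate the symptom described above, here is a minimal Python sketch (not taken from the ticket or its attachment). It assumes the old posts were stored as raw ISO-8859-1 bytes and are now being read as UTF-8; the sample word and the helper name `view_as_utf8` are made up for illustration.

```python
def view_as_utf8(stored_bytes):
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return stored_bytes.decode("utf-8", errors="replace")


original = "Grüße"                      # hypothetical sample text
stored = original.encode("latin-1")     # bytes as an old Latin-1 install would have written them
garbled = view_as_utf8(stored)          # what a UTF-8 reader makes of those bytes
repaired = stored.decode("latin-1")     # the same bytes, interpreted with the right charset

print(repr(stored))
print(garbled)
print(repaired)
```

The bytes themselves are intact; only the interpretation is wrong, which is why a one-time conversion of old entries could fix the display.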

Attachments (1)

0000042-latin2utf.php.txt (1.1 KB) - added by Agent Orange 12 years ago.

Change History (12)

#2 @ryan
12 years ago

Another approach is to use iconv, if it is available on your system. Dump your database to dbdump and then:

iconv -f iso-8859-1 -t utf-8 < dbdump > dbdump.utf

Restore dbdump.utf.
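
For systems without the iconv program, the same transcoding step could be sketched in Python. This is only an illustration under assumptions: the function name `transcode_dump` is hypothetical, the dump is assumed to be entirely ISO-8859-1, and any charset declarations inside the dump itself are not touched.

```python
def transcode_dump(src_path, dst_path,
                   src_encoding="iso-8859-1", dst_encoding="utf-8",
                   chunk_size=64 * 1024):
    """Re-encode a text file from src_encoding to dst_encoding.

    newline="" disables newline translation so line endings in the dump
    are written back unchanged.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)


# Usage, with the file names from the comment above:
# transcode_dump("dbdump", "dbdump.utf")
```

As with iconv, this should be run only once on a dump that really is Latin-1; running it on data that is already UTF-8 would double-encode the non-ASCII characters.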

#3 @matt
12 years ago

That seems to deal with only a few characters. Is that all that's needed for Latin-1?

#4 @Anne
12 years ago

No, there are a lot more characters. The attachment lists only the German special characters, not all the characters people use. Dutch, for example, has 'ï' and 'ë' and accents on the e, a, i, o, etc. Other languages have a '/' through the 'o' (I should learn the names of those...). French also has accents and a special 'c' (see Tantek).

#5 @Agent Orange
12 years ago

Yep, I made the character array manually and only put the chars I need in there (selfish, yes :). But the array can easily be extended.
If you want me to integrate any changes you may have made to the list into the "official" version on my homepage, send them to janvarwig [at] gmx.net.
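
As a sketch of how such an array could cover all of Latin-1 without hand-written entries: every ISO-8859-1 byte value equals the Unicode code point of its character, so the full table can be generated. This is an illustration in Python, not the attached PHP script; the names `build_latin1_table` and `latin1_to_utf8` are hypothetical.

```python
def build_latin1_table():
    """Map every non-ASCII ISO-8859-1 byte value to its UTF-8 byte sequence.

    In ISO-8859-1 the byte value is the Unicode code point, so chr(b)
    yields the right character for each b in 0x80..0xFF.
    """
    return {b: chr(b).encode("utf-8") for b in range(0x80, 0x100)}


TABLE = build_latin1_table()


def latin1_to_utf8(data):
    """Convert Latin-1 bytes to UTF-8 bytes via the generated table.

    ASCII bytes (below 0x80) are identical in both encodings and pass through.
    """
    return b"".join(TABLE.get(b, bytes([b])) for b in data)


sample = "Grüße, ï, ë, ø, ç".encode("latin-1")
print(len(TABLE))
print(latin1_to_utf8(sample).decode("utf-8"))
```

One caveat: this covers strict ISO-8859-1 only. Data that was actually written as Windows-1252 uses the 0x80 to 0x9F range for printable characters and would need a different table.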

#6 @Anne
12 years ago

This should not be integrated in "your official version"; I think Matt wants this in the default build.

#7 @Agent Orange
12 years ago

That would be even better, of course, although the on-the-fly conversion is hardly an ideal solution.
Not many people have the iconv program, and I also doubt that everybody running WordPress is familiar with such tools or even able to dump their DB.

#8 @Anne
12 years ago

Yeah, encoding sucks. WP should have an unchangeable default of UTF-8.

#9 @matt
12 years ago

  • Status changed from new to closed

#10 @matt
12 years ago

  • Owner changed from anonymous to matt
  • Resolution changed from 10 to 50

There's not a whole lot we can do here; the character set is much bigger than what this covers.

#11 @downloadbook
5 years ago

  • Cc Agent Orange added; Agent Orange removed