Opened 9 years ago

Closed 8 years ago

Last modified 19 months ago

#42 closed defect (bug) (wontfix)

Encoding Problem with old entries UTF-8 <-> Latin1

Reported by: Agent Orange
Owned by: matt
Priority: normal
Milestone:
Component: General
Version:
Severity: minor
Keywords:
Cc: Agent, Orange

Description

When a newer, UTF-8-based WordPress version is run against a database containing Latin-1-encoded posts, those posts display incorrectly. In my case (German), the umlauts and the ß were garbled.
Perhaps the upgrade script could convert the old entries.
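A minimal Python sketch of the mismatch, assuming the old posts are stored as raw ISO-8859-1 bytes and the new version reads everything as UTF-8. The function names and the sample post are hypothetical. Bytes such as 0xFC (ü) or 0xDF (ß) are not valid UTF-8 on their own, so they show up as replacement characters; decoding as Latin-1 and re-encoding as UTF-8 fixes them.

```python
def read_as_utf8(raw: bytes) -> str:
    """Interpret stored bytes as UTF-8, the way a UTF-8 blog would.

    Invalid sequences become U+FFFD, the Unicode replacement character.
    """
    return raw.decode("utf-8", errors="replace")


def latin1_to_utf8(raw: bytes) -> bytes:
    """Convert ISO-8859-1 bytes to the equivalent UTF-8 bytes."""
    return raw.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    # Hypothetical legacy row, stored in Latin-1.
    old_post = "Grüße aus Köln".encode("latin-1")
    print(read_as_utf8(old_post))                   # umlauts and ß are garbled
    print(read_as_utf8(latin1_to_utf8(old_post)))   # readable after conversion
```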

Attachments (1)

0000042-latin2utf.php.txt (1.1 KB) - added by Agent Orange 8 years ago.


Change History (12)

comment:2 ryan 9 years ago

Another approach is to use iconv, if it is available on your system. Dump your database to dbdump and then run:

iconv -f iso-8859-1 -t utf-8 < dbdump > dbdump.utf

Restore dbdump.utf.
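For systems without the iconv program, the same byte-level conversion can be sketched in Python, assuming the dump is a plain text file written entirely in ISO-8859-1 (the file names dbdump and dbdump.utf are taken from the comment above; the script and function names are hypothetical). Like iconv, it only re-encodes bytes; it does not rewrite any charset declarations the dump itself may contain.

```python
import sys


def convert_dump(src: str, dst: str, chunk_size: int = 64 * 1024) -> None:
    """Re-encode an ISO-8859-1 text file as UTF-8.

    Roughly what `iconv -f iso-8859-1 -t utf-8 < src > dst` does. Decoding
    as Latin-1 cannot fail, because every byte value maps to a character.
    newline="" leaves the original line endings untouched.
    """
    with open(src, "r", encoding="latin-1", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        while True:
            # Copy in pieces so a large dump is never held in memory at once.
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)


if __name__ == "__main__":
    # Usage: python convert_dump.py dbdump dbdump.utf
    if len(sys.argv) == 3:
        convert_dump(sys.argv[1], sys.argv[2])
    else:
        print("usage: convert_dump.py SRC DST", file=sys.stderr)
```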

comment:3 matt 9 years ago

That seems to deal with only a few characters. Is that all that's needed for Latin-1?

comment:4 Anne 9 years ago

No, there are a lot more characters. The attachment lists only the German special characters, and not everyone uses those. Dutch, for example, has 'ï' and 'ë' and accents on the e, a, i, o, etc. Other languages have a '/' through the 'o' (ø; I should learn the names of those...). French also has accents and a special 'c' (ç; see Tantek).

Yep, I built the character array by hand and only put in the characters I need (selfish, yes :). But the array can easily be extended.
If you want me to integrate any changes you may have made to the list into the "official" version on my homepage, send them to janvarwig [at] gmx.net.
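To illustrate why a hand-built array only goes so far, here is a hypothetical sketch (not the attachment's actual code) of a replacement table holding only the German characters, compared with a full ISO-8859-1 decode. Python's built-in latin-1 codec maps all 256 byte values, so Dutch, French, and Scandinavian letters come through as well, while the partial table has to give up on them.

```python
# Hypothetical partial table: only German umlauts and ß, keyed by Latin-1 byte value.
GERMAN_ONLY = {
    0xE4: "ä", 0xF6: "ö", 0xFC: "ü",
    0xC4: "Ä", 0xD6: "Ö", 0xDC: "Ü",
    0xDF: "ß",
}


def convert_with_table(raw: bytes) -> str:
    """Pass ASCII through, map bytes found in the table, and mark the rest with '?'."""
    out = []
    for b in raw:
        if b < 0x80:
            out.append(chr(b))
        else:
            out.append(GERMAN_ONLY.get(b, "?"))
    return "".join(out)


def convert_full(raw: bytes) -> str:
    """Decode with the complete ISO-8859-1 codec (every byte maps to U+0000..U+00FF)."""
    return raw.decode("latin-1")


if __name__ == "__main__":
    sample = "Grüße, naïef, français, Øre".encode("latin-1")
    print(convert_with_table(sample))  # ï, ç and Ø are lost
    print(convert_full(sample))        # everything survives
```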

comment:6 Anne 9 years ago

This shouldn't just go into "your official version"; I think Matt wants this in the default build.

That would be even better, of course, although on-the-fly conversion is hardly an ideal solution.
Not many people have the iconv program, and I doubt that everyone running WordPress is familiar with such tools or is even able to dump their database.
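One way on-the-fly conversion could work is sketched below in Python, assuming every stored post is either valid UTF-8 or plain ISO-8859-1: try a strict UTF-8 decode first and fall back to Latin-1. The function name is hypothetical. This is only a heuristic: some Latin-1 text is also valid UTF-8 (the two Latin-1 characters "Ã¤" are the bytes C3 A4, which UTF-8 reads as "ä"), so such a post would be misread.

```python
def decode_post(raw: bytes) -> str:
    """Return the post text, guessing UTF-8 first and ISO-8859-1 second."""
    try:
        # Strict decode: raises UnicodeDecodeError on any invalid sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Never fails: every byte is a valid Latin-1 character.
        return raw.decode("latin-1")


if __name__ == "__main__":
    new_post = "Grüße".encode("utf-8")
    old_post = "Grüße".encode("latin-1")
    print(decode_post(new_post), decode_post(old_post))
```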

comment:8 Anne 9 years ago

Yeah, encoding sucks. WP should have an unchangeable default of UTF-8.

comment:9 matt 8 years ago

  • Status changed from new to closed
  • Owner changed from anonymous to matt
  • Resolution changed from 10 to 50

There's not a whole lot we can do here; the character set is much bigger than what this covers.

  • Cc Agent Orange added; Agent Orange removed