WordPress.org

Make WordPress Core

Opened 10 years ago

Closed 9 years ago

Last modified 3 years ago

#42 closed defect (bug) (wontfix)

Encoding Problem with old entries UTF-8 <-> Latin1

Reported by: Agent Orange
Owned by: matt
Milestone:
Priority: normal
Severity: minor
Version:
Component: General
Keywords:
Focuses:
Cc:

Description

When a new, UTF-8-based WordPress version is used on a database with Latin1-encoded posts, these posts look awful. In my case (German), the umlauts and the ß were garbage.
Maybe the upgrade script could convert old entries.
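
To illustrate the symptom described above, here is a minimal sketch (not taken from the ticket or its attachment) of what happens when Latin-1 bytes are read as UTF-8. The sample word and all variable names are assumptions chosen for the illustration.

```python
# Illustration only: German text stored as Latin-1 bytes, then read back
# by software that assumes UTF-8. The sample word is an assumption.
text = "Grüße"
latin1_bytes = text.encode("iso-8859-1")

# Strict UTF-8 decoding rejects these bytes outright.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient reader substitutes U+FFFD for the bytes it cannot decode,
# which is the "garbage" a visitor would see.
garbled = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the bytes were written in recovers the text.
recovered = latin1_bytes.decode("iso-8859-1")

print(strict_ok, garbled, recovered)
```

The stored bytes are not damaged; they are only being interpreted with the wrong encoding, which is why a one-time conversion of old entries can repair them.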

Attachments (1)

0000042-latin2utf.php.txt (1.1 KB) - added by Agent Orange 9 years ago.

Change History (12)

comment:2 ryan 10 years ago

Another approach is to use iconv, if it is available on your system. Dump your database to dbdump and then:

iconv -f iso-8859-1 -t utf-8 < dbdump > dbdump.utf

Restore dbdump.utf.
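
For systems without iconv, the same transcoding step could be sketched in Python. This is only a rough equivalent of the command above, assuming the whole dump really is ISO-8859-1; the function name `transcode_dump` and its parameters are hypothetical and not part of WordPress or the ticket.

```python
def transcode_dump(src_path, dst_path,
                   src_encoding="iso-8859-1", dst_encoding="utf-8"):
    """Rewrite a text dump from src_encoding to dst_encoding, line by line.

    Hypothetical helper: mirrors `iconv -f iso-8859-1 -t utf-8` under the
    assumption that every byte in the dump is in src_encoding.
    """
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)
```

Calling `transcode_dump("dbdump", "dbdump.utf")` would produce the file that the restore step expects. Reading line by line keeps memory use low for large dumps.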

comment:3 matt 10 years ago

That seems to deal with only a few characters. Is that all that's needed for Latin-1?

comment:4 Anne 10 years ago

No, there are a lot more characters. The attachment lists only the German special characters, which not everyone uses. Dutch, for example, has 'ï' and 'ë' and accents on the e, a, i, o, etc. Other languages have a '/' through the 'o' (I should learn the names of those...). French also has accents and a special 'c' (see Tantek).

comment:5 Agent Orange 10 years ago

Yep, I made the character array manually and only put the chars I need in there (selfish, yes :). But the array can easily be extended.
If you want me to integrate any changes you may have made to the list into the "official" version on my homepage, send them to janvarwig [at] gmx.net.
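
The limitation discussed here can be sketched as follows. The attachment's actual contents are not shown in this ticket, so the table below is a hypothetical stand-in for a hand-written character array, compared against decoding with the full ISO-8859-1 codec. All names are assumptions.

```python
# Hypothetical hand-written table covering only the German characters.
GERMAN_ONLY = {
    b"\xe4": "ä", b"\xf6": "ö", b"\xfc": "ü",
    b"\xc4": "Ä", b"\xd6": "Ö", b"\xdc": "Ü",
    b"\xdf": "ß",
}

def convert_with_table(data: bytes, table=GERMAN_ONLY) -> str:
    """Map bytes via the table; ASCII passes through, anything else becomes '?'."""
    out = []
    for i in range(len(data)):
        byte = data[i:i + 1]
        if byte in table:
            out.append(table[byte])
        elif byte[0] < 0x80:
            out.append(byte.decode("ascii"))
        else:
            out.append("?")  # character missing from the manual table
    return "".join(out)

def convert_with_codec(data: bytes) -> str:
    """Decode using the complete ISO-8859-1 mapping (all 256 byte values)."""
    return data.decode("iso-8859-1")

german = "Grüße".encode("iso-8859-1")
dutch = "coëfficiënt".encode("iso-8859-1")
print(convert_with_table(german), convert_with_table(dutch))
print(convert_with_codec(german), convert_with_codec(dutch))
```

The table handles the German sample but loses the Dutch 'ë', while the codec-based version covers every Latin-1 character without the list needing to be extended by hand.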

comment:6 Anne 10 years ago

This should not be integrated in "your official version"; I think Matt wants this in the default build.

comment:7 Agent Orange 10 years ago

That would be even better of course, although the on-the-fly conversion is hardly an ideal solution.
Not many people have the iconv program, and I also doubt that everybody running WordPress is familiar with such tools or even able to dump their DB.

comment:8 Anne 10 years ago

Yeah, encoding sucks. WP should have an unchangeable default of UTF-8.

comment:9 matt 9 years ago

  • Status changed from new to closed

comment:10 matt 9 years ago

  • Owner changed from anonymous to matt
  • Resolution changed from 10 to 50

There's not a whole lot we can do here; the charset is much bigger than what this covers.

comment:11 downloadbook 3 years ago

  • Cc Agent Orange added; Agent Orange removed