Make WordPress Core

Opened 18 years ago

Last modified 3 months ago

#7813 new defect (bug)

export function does not preserve encoding

Reported by: tott Owned by: tott
Milestone: Future Release Priority: normal
Severity: minor Version: 2.7
Component: Export Keywords: 2nd-opinion close
Focuses: Cc:

Description (last modified by dmsnell)

When exporting, strings are always converted to UTF-8 while header and encoding is set to encoding used in blog.

This causes trouble when importing later.

WordPress should always convert to UTF-8 and indicate this in the XML declaration and metadata.

Attachments (1)

export.diff (782 bytes) - added by tott 18 years ago.
possible patch for encoding problem. needs testing

Change History (13)

@tott
18 years ago

possible patch for encoding problem. needs testing

#1 @lloydbudd
18 years ago

  • Cc lloydbudd added
  • Milestone set to 2.7
  • Version set to 2.7

#2 @lloydbudd
18 years ago

  • Keywords has-patch added

#3 follow-up: @westi
17 years ago

  • Keywords needs-patch needs-testing added; has-patch removed
  • Milestone changed from 2.7 to 2.8

Moving to 2.8 for now.

Needs a new patch and some testing before commit:

We can't rely on mb_ functions existing so we need a fallback for that.

#4 @Denis-de-Bernardy
17 years ago

  • Milestone changed from 2.8 to Future Release

#5 @Denis-de-Bernardy
17 years ago

  • Component changed from i18n to Export

#6 @Denis-de-Bernardy
17 years ago

  • Milestone changed from Future Release to 2.9
  • Severity changed from normal to minor

#7 @ryan
16 years ago

  • Milestone changed from 2.9 to Future Release

#8 @iseulde
13 years ago

  • Keywords export encoding i18n removed

#9 in reply to: ↑ 3 @GaryJ
12 years ago

Replying to westi:

We can't rely on mb_ functions existing so we need a fallback for that.

Several years on, the only real fallback is to check function_exists() before calling an mb_ function. Still, that would hopefully catch a fair proportion of affected cases, wouldn't it?

#10 @chriscct7
10 years ago

  • Keywords has-patch added; needs-patch removed

#11 @SirLouen
11 months ago

  • Keywords 2nd-opinion close added; needs-testing has-patch removed

This report needs further discussion. I know it has been stalled for more than a decade, and it looks more like "close me and don't open the grave, please".

The patch obviously no longer applies, although that is pretty easy to fix. The big question here is: should we fix it?

Recently, work on supporting encodings other than UTF-8 was argued against in #62172, and I agree, especially because we are now in 2025.

For me, this is a clear candidate for closing as wontfix.

#12 @dmsnell
3 months ago

  • Description modified (diff)

The WXR export should be UTF-8 because it’s an XML document.

There are still things to improve here, but none of them should use utf8_encode().

It would be nice to see improvements in the export flow to convert into UTF-8, but that is definitely a complicated matter.

For now it would be great if someone could clarify the behavior and ensure that WordPress does not indicate that the WXR is in the blog_charset or any other value than utf-8. @tott can you confirm if this is still a problem and explain more on the “header and encoding is set to encoding used in blog” part?

The fix here is not to preserve the charset, because doing so will cause all sorts of trouble when attempting to import files. Note that this does not in any way depend on support for UTF-8 code or conversion functionality. If WordPress is unable to convert to UTF-8 from a known charset then it should produce an error of some kind.
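
As a rough illustration of the behavior described above, here is a minimal sketch in Python rather than WordPress code. The helper name `export_as_utf8_xml` and its parameters are hypothetical. It assumes the export body is available as raw bytes in a known source charset, always declares UTF-8 in the XML declaration, and raises an error instead of emitting a mislabeled file when conversion fails.

```python
import codecs


def export_as_utf8_xml(raw_body: bytes, source_charset: str) -> bytes:
    """Hypothetical helper: re-encode raw_body to UTF-8 and declare UTF-8.

    raw_body is assumed to hold the XML body (without a declaration)
    encoded in source_charset.
    """
    # Reject charsets the runtime does not know instead of guessing.
    try:
        codecs.lookup(source_charset)
    except LookupError as exc:
        raise ValueError(f"unknown source charset: {source_charset!r}") from exc

    # Strict decoding: invalid byte sequences raise instead of being
    # silently replaced or passed through.
    try:
        text = raw_body.decode(source_charset)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"cannot convert export from {source_charset!r} to UTF-8"
        ) from exc

    # The declaration always says UTF-8, never the source charset.
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return (declaration + text).encode("utf-8")
```

For example, calling `export_as_utf8_xml("<title>Café</title>".encode("iso-8859-1"), "ISO-8859-1")` yields UTF-8 bytes with a UTF-8 declaration, while passing bytes that are invalid in the stated charset raises `ValueError`.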

Eventually I hope to add full fallback support in #64473 which would give us the tools to do so reliably and safely regardless of which extensions are installed.
