Opened 18 years ago
Last modified 3 months ago
#7813 new defect (bug)
export function does not preserve encoding
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Future Release | Priority: | normal |
| Severity: | minor | Version: | 2.7 |
| Component: | Export | Keywords: | 2nd-opinion close |
| Focuses: | | Cc: | |
Description (last modified by )
When exporting, strings are always converted to UTF-8, while the header and encoding are set to the encoding used in the blog.
This causes trouble when importing later.
WordPress should always convert to UTF-8 and indicate this in the XML declaration and metadata.
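To make the mismatch concrete, here is a minimal sketch of a check for the consistency asked for above: the XML declaration names UTF-8, and the bytes really are valid UTF-8. `example_is_consistent_utf8_xml()` is a hypothetical helper written for illustration; it is not part of WordPress or its exporter.

```php
<?php
/**
 * Hypothetical helper (not WordPress code): returns true only when an XML
 * document declares UTF-8 and its bytes are valid UTF-8.
 */
function example_is_consistent_utf8_xml( $xml ) {
    // The XML declaration must name UTF-8 as the encoding.
    if ( 1 !== preg_match( '/^<\?xml[^>]*encoding=["\']utf-8["\']/i', $xml ) ) {
        return false;
    }
    // With the "u" modifier, preg_match() fails on invalid UTF-8 subjects,
    // so an empty pattern doubles as a validity check without mbstring.
    return 1 === preg_match( '//u', $xml );
}

$declaration_utf8   = '<?xml version="1.0" encoding="UTF-8" ?>' . "\n";
$declaration_latin1 = '<?xml version="1.0" encoding="ISO-8859-1" ?>' . "\n";
$title_utf8         = "<title>caf\xC3\xA9</title>"; // "cafe" with e-acute, as UTF-8 bytes.

var_dump( example_is_consistent_utf8_xml( $declaration_utf8 . $title_utf8 ) );   // bool(true)
var_dump( example_is_consistent_utf8_xml( $declaration_latin1 . $title_utf8 ) ); // bool(false)
```

The second call mirrors the situation reported here: the content is UTF-8, but the declaration claims the blog's charset.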
Attachments (1)
Change History (13)
#3
follow-up:
↓ 9
@
17 years ago
- Keywords needs-patch needs-testing added; has-patch removed
- Milestone changed from 2.7 to 2.8
Moving to 2.8 for now.
Needs a new patch and some testing before commit:
We can't rely on mb_ functions existing so we need a fallback for that.
#6
@
17 years ago
- Milestone changed from Future Release to 2.9
- Severity changed from normal to minor
#9
in reply to:
↑ 3
@
12 years ago
Replying to westi:
We can't rely on mb_ functions existing so we need a fallback for that.
Several years on, and the only real fallback is to check function_exists() before calling the function. Still, that would hopefully catch a fair proportion of affected cases, wouldn't it?
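A minimal sketch of that check, assuming the conversion is wrapped in one helper. `example_to_utf8()` is a hypothetical name, not something from the attached patch or from WordPress core. It tries mbstring first, then iconv, and reports back when neither is installed.

```php
<?php
/**
 * Hypothetical helper (not WordPress code): convert $text to UTF-8 using
 * whichever extension is installed. Returns null when no converter exists.
 */
function example_to_utf8( $text, $from_charset ) {
    // Already UTF-8: nothing to convert.
    if ( in_array( strtolower( $from_charset ), array( 'utf-8', 'utf8' ), true ) ) {
        return $text;
    }
    // Prefer mbstring when the extension is loaded.
    if ( function_exists( 'mb_convert_encoding' ) ) {
        return mb_convert_encoding( $text, 'UTF-8', $from_charset );
    }
    // Fall back to iconv, which takes the source charset first, then the target.
    if ( function_exists( 'iconv' ) ) {
        return iconv( $from_charset, 'UTF-8', $text );
    }
    // Neither extension is available; let the caller decide what to do.
    return null;
}
```

The charset names accepted by mbstring and iconv are not identical, so a real implementation would also need to handle names that one of them rejects.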
#11
@
11 months ago
- Keywords 2nd-opinion close added; needs-testing has-patch removed
This report needs further discussion. I know it has been stalled for more than a decade, and it looks more like "close me and don't open the grave, please".
The patch obviously no longer applies, although that is pretty easy to fix. The big question here is: *should* we fix it?
Recently, working on support for encodings other than UTF-8 was argued against in #62172, and I agree, especially since we are now in 2025.
For me, this is a clear candidate for closing as wontfix.
#12
@
3 months ago
- Description modified (diff)
The WXR export should be UTF-8 because it’s an XML document.
There are still things to improve here, but none of them should use utf8_encode().
It would be nice to see improvements in the export flow to convert into UTF-8, but that is definitely a complicated matter.
For now it would be great if someone could clarify the behavior and ensure that WordPress does not indicate that the WXR is in the blog_charset or any value other than utf-8. @tott, can you confirm whether this is still a problem and explain more about the “header and encoding is set to encoding used in blog” part?
The fix here is not to preserve the charset, because that would cause all sorts of trouble when attempting to import files. Note that this does not in any way depend on support for UTF-8 code or conversion functionality. If WordPress is unable to convert to UTF-8 from a known charset, then it should produce an error of some kind.
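As a rough sketch of the "produce an error" behavior, assuming the conversion step is passed in as a callable: `example_export_text()` and the stand-in converter below are hypothetical and only illustrate refusing to emit anything that is not valid UTF-8.

```php
<?php
/**
 * Hypothetical helper (not WordPress code): run a conversion and refuse to
 * continue unless the result is valid UTF-8.
 *
 * $convert is any callable taking ( $text, $from_charset ).
 */
function example_export_text( $text, $from_charset, callable $convert ) {
    $converted = $convert( $text, $from_charset );

    // preg_match() with the "u" modifier fails on invalid UTF-8 input.
    if ( ! is_string( $converted ) || 1 !== preg_match( '//u', $converted ) ) {
        throw new RuntimeException(
            sprintf( 'Export aborted: could not convert text from %s to UTF-8.', $from_charset )
        );
    }

    return $converted;
}

// A stand-in converter that always fails, as if no extension were installed.
$failing_converter = function ( $text, $from_charset ) {
    return false;
};

try {
    example_export_text( "caf\xE9", 'ISO-8859-1', $failing_converter );
} catch ( RuntimeException $e ) {
    echo $e->getMessage(), "\n";
}
```

WordPress code would more likely return a WP_Error than throw; the exception is used here only to keep the sketch self-contained.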
Eventually I hope to add full fallback support in #64473, which would give us the tools to do so reliably and safely regardless of which extensions are installed.
possible patch for encoding problem. needs testing