Make WordPress Core

Opened 11 years ago

Last modified 5 years ago

#25872 new defect (bug)

WXR export tool generates XML which is not well-formed

Reported by: tomdxw Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.7.1
Component: Export Keywords:
Focuses: Cc:


  1. Paste a form feed character (aka \f or U+000C) into a post
  2. Tools > Export > Download Export File
  3. Validate the exported file (e.g. xmlstarlet validate --well-formed ~/Downloads/test.wordpress.2013-11-07.xml)

The resulting file is not well-formed XML because WordPress does not strip characters that the XML specification disallows.
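
The failure can also be reproduced without a WordPress install. The following is a minimal sketch in Python, not WordPress code: it assumes Python's bundled expat parser as a stand-in for xmlstarlet, and `is_well_formed` is a hypothetical helper named here for illustration only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML 1.0 (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A form feed (U+000C) is outside the XML 1.0 Char production,
# and wrapping it in a CDATA section does not make it legal.
clean = "<content><![CDATA[Hello world]]></content>"
broken = "<content><![CDATA[Hello\x0cworld]]></content>"

print(is_well_formed(clean))
print(is_well_formed(broken))
```

The second document differs from the first only by the form feed, which is enough for the parser to reject it.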

Change History (4)

#1 @tomdxw
11 years ago

  • Cc tom@… added

#2 @GaryJ
10 years ago

How would you propose that invalid characters are stripped / converted?

#3 @tomdxw
10 years ago

I'd just iterate through the codepoints in wxr_cdata() and replace disallowed codepoints with U+FFFD (the replacement character). I'm not sure of the best way to iterate through codepoints in PHP - but UTF-8 parsers aren't hard to write if there isn't already a function that does it.
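
As a rough illustration of that approach, here is a sketch in Python rather than PHP. It assumes the allowed set is the XML 1.0 Char production (U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). The name `sanitize_for_xml` is hypothetical, and this is not the body of wxr_cdata().

```python
import re

# Matches any single code point outside the XML 1.0 "Char" production.
# Assumption: XML 1.0 rules apply; XML 1.1 allows a different set.
_DISALLOWED = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

REPLACEMENT_CHARACTER = "\uFFFD"


def sanitize_for_xml(text: str) -> str:
    """Replace each disallowed code point with U+FFFD (hypothetical helper)."""
    return _DISALLOWED.sub(REPLACEMENT_CHARACTER, text)


if __name__ == "__main__":
    # The form feed between the two words is replaced; the rest is untouched.
    print(repr(sanitize_for_xml("before\x0cafter")))
```

A regular expression over code points avoids writing a UTF-8 parser by hand. A PHP port would need an equivalent Unicode-aware match, and which function to use for that is left open here.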

#4 @mdgl
9 years ago

Related #19998.
