Make WordPress Core

Opened 5 years ago

Last modified 11 months ago

#22279 new defect (bug)

WordPress Export/Import deletes carriage returns

Reported by: mykle Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 3.4.2
Component: Export Keywords: needs-patch dev-feedback
Focuses: Cc:

Description (last modified by ocean90)

WordPress export does not translate or escape the bare CR character in a CR/LF pair. Such CR characters show up unfiltered in the WXR export file, both in post_content and in strings that were serialized into a post_meta field.

Then, WordPress import loses these CR characters; they are simply erased. This may be because the SimpleXML parser can't or won't open the XML file in binary mode, so line-ending translation can and does happen. That's just a theory, but if it's true, this behavior might *not* happen on all platforms or with all PHP versions. (I'm seeing this on OS X 10.6.8 with PHP 5.4.4.)
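Binary mode is likely not the culprit: XML 1.0 (section 2.11, "End-of-Line Handling") requires every conforming parser to translate a CR/LF pair, or a lone CR, into a single LF before parsing, and that includes text inside CDATA sections. The sketch below shows this with Python's standard-library parser (expat), used here only as a stand-in for the PHP SimpleXML/DOMDocument stack; the `<content>` element is a made-up minimal document, not real WXR.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml: str) -> str:
    """Parse a single-element XML document and return its text content."""
    return ET.fromstring(xml).text


# A CR/LF pair inside CDATA, as a WXR export would write it.
crlf_in_cdata = "<content><![CDATA[line one\r\nline two]]></content>"
# A lone CR in ordinary character data.
lone_cr = "<content>line one\rline two</content>"

# Both come back with a plain LF: the parser itself drops the CR.
print(repr(parsed_text(crlf_in_cdata)))  # 'line one\nline two'
print(repr(parsed_text(lone_cr)))        # 'line one\nline two'
```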

In the worst case (mine), the munged string is a small component of a complex data structure that is serialized into a postmeta record. In this case, the entire meta_value field is deleted on import: the string's length has changed, so the data no longer unserializes.
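Why a single lost CR breaks the whole meta value: PHP's serialize() records each string's byte length (`s:<length>:"<value>";`), and unserialize() fails when the stored bytes no longer match it. The sketch below is a simplified Python model of that string format only (the helper names are hypothetical, and real PHP also handles arrays, objects, and more). It shows that dropping the CR from a CR/LF pair leaves the declared length one byte too long.

```python
def php_serialize_str(value: str) -> str:
    """Model of PHP's serialize() for a single string: s:<byte length>:"<value>";"""
    return f's:{len(value.encode("utf-8"))}:"{value}";'


def php_unserialize_str(data: str):
    """Strict model of PHP's unserialize() for a single string.

    Returns the decoded string, or None when the declared byte length does not
    match the actual payload (PHP's unserialize() returns false in that case).
    """
    if not (data.startswith("s:") and data.endswith('";')):
        return None
    declared, sep, rest = data[2:].partition(':"')
    if not sep or not declared.isdigit():
        return None
    payload = rest[:-2]  # strip the closing '";'
    if len(payload.encode("utf-8")) != int(declared):
        return None
    return payload


original = "first line\r\nsecond line"
stored = php_serialize_str(original)         # 's:23:"first line\r\nsecond line";'
after_import = stored.replace("\r\n", "\n")  # what the export/import round trip does

print(php_unserialize_str(stored) == original)  # True
print(php_unserialize_str(after_import))        # None: 22 bytes, but 23 declared
```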

It seems to me that WP Export should escape any character that might be threatened in transit. I'm no XML lawyer, but some sources claim that unescaped CR characters are invalid XML.
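Strictly speaking, XML 1.0 does allow a raw CR (#xD) character, but conforming parsers normalize it to LF on input, so escaping is indeed how it would have to be carried through. A character reference such as `&#13;` is exempt from that normalization. CDATA sections, which WXR uses for post content, do not recognize character references, so the sketch below (a hypothetical helper, not WordPress's own export code) writes the text as ordinary escaped character data instead and checks that the CR survives a parse.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_preserving_cr(text: str) -> str:
    """Escape &, < and >, then encode every CR as the character reference &#13;."""
    return escape(text, {"\r": "&#13;"})


original = "a < b\r\nc & d"
document = f"<content>{escape_preserving_cr(original)}</content>"

print(repr(document))                            # '<content>a &lt; b&#13;\nc &amp; d</content>'
print(ET.fromstring(document).text == original)  # True
```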

To reproduce:

  • store a carriage return in a post.
  • export it to a WXR file.
  • examine the WXR file for the raw carriage return (^M).
  • import that file.
  • search for the carriage return.
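The third step can be scripted. This minimal sketch (the function name is illustrative) reports the 1-based line numbers of a file's bytes that contain a raw CR (0x0D):

```python
def lines_with_raw_cr(data: bytes) -> list[int]:
    """Return the 1-based numbers of LF-delimited lines that contain a CR byte."""
    return [n for n, line in enumerate(data.split(b"\n"), start=1) if b"\r" in line]


# In-memory stand-in for a WXR fragment with a CR/LF pair inside CDATA.
sample = b"<item>\n<content><![CDATA[one\r\ntwo]]></content>\n</item>\n"
print(lines_with_raw_cr(sample))  # [2]
```

For a real export, pass the bytes from `open(path, "rb").read()`; reading in binary mode keeps Python's universal-newline translation from hiding the CRs before they are counted.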

Change History (6)

#1 @ocean90
3 years ago

  • Description modified
  • Summary changed from Wordpress Export/Import deletes carriage returns to WordPress Export/Import deletes carriage returns

#2 @WraithKenny
3 years ago

I've had issues with this, and couldn't figure out the proper solution.

Support forum issues are usually resolved (or abandoned) with vague explanations that plugins are doing it wrong. As a plugin developer who IS doing this wrong, I don't find that helpful: it's extremely hard to figure out what I'm actually doing wrong, since there is no good explanation of the problem and no best-practice tutorial for doing it right (whatever that may be). The only thing I do know about this issue is that the length of serialized meta is sometimes wrong due to line endings. (I haven't been able to trace why the line endings are sometimes CR/LF, but I think it's due to AJAX saves, in my plugin at least.)

My feeling is that if you are using the correct APIs and sanitization practices, something like this shouldn't happen: there should be no extra, unknown step for plugin developers to take (such as normalizing all line endings before calling update_option()/update_post_meta()? I have no idea if that would work, and that's the point: there's no community education around this issue). I'm sure most authors have no idea whether the exporter/importer works with their meta.

Anyway, if this is a plugin developer problem, some guidance would be appreciated; if not, a bug fix would be appreciated.
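The workaround asked about above (normalizing line endings before storing the value) might look like the following, sketched in Python rather than PHP. The helper is hypothetical, and whether it is the right fix is exactly the open question in this comment. It converts CR/LF and lone CR to LF in every string of a nested structure, so nothing in the stored value depends on a CR that the XML round trip would drop; the trade-off is that CRs are discarded on purpose rather than by accident.

```python
def normalize_line_endings(value):
    """Recursively replace CR/LF and lone CR with LF in all strings of a nested value."""
    if isinstance(value, str):
        # Handle CR/LF first so it becomes one LF, not two.
        return value.replace("\r\n", "\n").replace("\r", "\n")
    if isinstance(value, list):
        return [normalize_line_endings(item) for item in value]
    if isinstance(value, dict):
        return {key: normalize_line_endings(item) for key, item in value.items()}
    return value  # numbers, booleans, None, etc. pass through unchanged


meta = {"title": "Hello", "body": ["para one\r\npara two", "line a\rline b"], "count": 3}
print(normalize_line_endings(meta))
```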

#3 @GaryJ
3 years ago

The WPTest.io export naturally suffers from the same problem. One can't simply edit the .xml file either, because saving it forces the mixed line endings (those of the XML markup and those inside the content) into a single style, which may then cause problems on import.

#4 @chriscct7
17 months ago

  • Keywords needs-patch added
  • Severity changed from major to normal

#5 @screamingdev
11 months ago

  • Keywords dev-feedback added

This seems to be a problem with DOMDocument. It parses the XML and seems to add a "\n" to every "\r", while simplexml_import_dom() eats all the "\r" characters, so only the "\n" remains.

Scenarios to solve/evaluate:

1) Find out why/if SimpleXML eats carriage returns.
2) Stop using PHP internals/modules such as DOMDocument and SimpleXML for importing.
3) This is not a bug: it might be that the XML spec does not allow carriage returns.

I'll leave it like that for now and move on to another ticket. This one is going too deep and costs too much time for me.

Last edited 11 months ago by screamingdev

This ticket was mentioned in Slack in #core by screamingdev.

11 months ago
