Opened 7 years ago

Closed 6 years ago

#15252 closed defect (bug) (fixed)

Import WXR breaks serialized postmeta that contains a space before the end of a line.

Reported by: shawnparker
Owned by:
Milestone: WordPress.org
Priority: normal
Severity: normal
Version:
Component: Import
Keywords: reporter-feedback
Focuses:
Cc:

Description

On the whole, serialized data comes in just fine. But we ran into an issue the other day where serialized data is damaged on import.

The case is where serialized data contains a space before a carriage return. During import the serialized data is truncated. I'm not sure exactly where the problem lies yet, as it seems it might actually be a problem inside fgets(). The obvious place to look is the rtrim() (line 121 of wordpress-importer.php) that is applied to each line, but our test case failed even after removing it from the importer.

We haven't been able to investigate much further but I wanted to get the conversation started here.
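
For readers unfamiliar with why a stripped space matters: PHP's serialized strings carry a byte-length prefix, so any change to the payload length makes the value unreadable. The following is a minimal Python sketch of that effect, not the importer's code. The helper names are hypothetical, the parser handles only single strings and is stricter than PHP's unserialize(), and it assumes the trimmed lines are rejoined with "\n". It shows why rtrim() was the obvious suspect; it does not establish that rtrim() was the actual cause.

```python
def php_serialize_str(value: str) -> str:
    """Model PHP serialize() for one string: s:<byte length>:"<value>";"""
    return 's:%d:"%s";' % (len(value.encode("utf-8")), value)


def php_unserialize_str(data: str):
    """Return the string, or None when the length prefix and payload disagree."""
    if not data.startswith("s:"):
        return None
    length_text, sep, rest = data[2:].partition(":")
    if not sep or not length_text.isdigit():
        return None
    length = int(length_text)
    raw = rest.encode("utf-8")
    # Expect: opening quote, exactly `length` bytes, closing quote, semicolon.
    if len(raw) != length + 3 or raw[:1] != b'"' or raw[length + 1:] != b'";':
        return None
    return raw[1:length + 1].decode("utf-8")


def rtrim_each_line(text: str) -> str:
    """Assumed model of a line-by-line read with rtrim() on every line."""
    php_whitespace = " \t\n\r\0\x0b"  # characters PHP's rtrim() strips by default
    return "\n".join(line.rstrip(php_whitespace) for line in text.split("\n"))


# A meta value with a space directly before a carriage return.
meta_value = "first line \r\nsecond line"
serialized = php_serialize_str(meta_value)
damaged = rtrim_each_line(serialized)

intact_result = php_unserialize_str(serialized)
damaged_result = php_unserialize_str(damaged)
```

Here `intact_result` equals the original value, while `damaged_result` is `None`: the prefix still claims 24 bytes but only 22 remain after trimming.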

Change History (6)

#1 in reply to: ↑ description @duck_
7 years ago

  • Keywords reporter-feedback added; importer serialize removed
  • Milestone changed from Awaiting Review to WordPress.org site

Replying to shawnparker:

On the whole, serialized data comes in just fine.

Have you rolled your own fix to #14509 then? Because I'm still seeing double serialization in postmeta (e.g. menu item classes).

Is it possible to have an example WXR file? Also, have you tried the development version of the importer (it has just seen an upgrade, #15197)? I imagine the problem is only apparent for the regex parser, probably in get_tag, which would need to see a fix.

#2 @shawnparker
7 years ago

Our solution for #14509 was to maybe_unserialize our post meta after pulling it (so we double-maybe_unserialize the double-serialized data). Extraneous? Yes. Functional? Yes.

I'll give the development version of the importer a try. Through further testing I've found that the rtrim() isn't the whole issue but that we're also seeing a carriage return character being doubled.

I'll see if I can generate a dummy WXR - I don't want to post any client-specific info here and I haven't gotten too deep into reproducibility tests yet.
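
The double-unserialize workaround mentioned above can be sketched roughly as follows. This is a Python model, not WordPress code: the `maybe_unserialize` here is a hypothetical stand-in for the WordPress function of the same name, it understands only single serialized strings, and the sample value is made up for illustration.

```python
def serialize_str(value: str) -> str:
    """Model PHP serialize() for one string: s:<byte length>:"<value>";"""
    return 's:%d:"%s";' % (len(value.encode("utf-8")), value)


def maybe_unserialize(data: str) -> str:
    """Strip one level of serialization if `data` looks serialized, else return it unchanged."""
    if data.startswith("s:") and data.endswith('";'):
        length_text, sep, rest = data[2:].partition(":")
        raw = rest.encode("utf-8")
        if (sep and length_text.isdigit()
                and len(raw) == int(length_text) + 3
                and raw[:1] == b'"'):
            return raw[1:-2].decode("utf-8")
    return data


original = "menu-item current"                     # hypothetical meta value
double = serialize_str(serialize_str(original))    # what a double-serializing import stores

once = maybe_unserialize(double)    # still serialized one level deep
twice = maybe_unserialize(once)     # back to the original value
thrice = maybe_unserialize(twice)   # a further call is a no-op on plain data
```

Because the helper returns non-serialized input unchanged, calling it one time too many is harmless, which is what makes the workaround safe on data that was imported correctly.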

#3 follow-up: @shawnparker
7 years ago

The development version of the importer fixes the issue. Yay!

#4 in reply to: ↑ 3 @duck_
7 years ago

Replying to shawnparker:

The development version of the importer fixes the issue. Yay!

Good, that's what I thought should be the case. A dummy WXR would still be useful so this issue can be fixed in the PHP 4 backwards-compatible regular expression parser. Thanks.

#5 @duck_
6 years ago

It's likely that this was actually being caused by slashing the contents of CDATA tags with wpdb::escape() and not caused by trailing spaces.

http://plugins.trac.wordpress.org/changeset/524298
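
To illustrate why slashing would damage serialized data, here is a small Python sketch. It assumes wpdb::escape() behaves like PHP's addslashes(), and it uses a simplified, strings-only model of the serialized format; the function names and sample value are hypothetical. Every added backslash lengthens the payload, so the stored length prefix no longer matches unless the slashes are removed again before unserializing.

```python
def serialize_str(value: str) -> str:
    """Model PHP serialize() for one string: s:<byte length>:"<value>";"""
    return 's:%d:"%s";' % (len(value.encode("utf-8")), value)


def addslashes(text: str) -> str:
    """Model PHP addslashes(): backslash-escape single quote, double quote, backslash and NUL."""
    out = []
    for ch in text:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\0")
        else:
            out.append(ch)
    return "".join(out)


def length_prefix_matches(data: str) -> bool:
    """True when data is exactly s:<n>:"<n bytes>"; in this simplified model."""
    if not data.startswith("s:"):
        return False
    length_text, sep, rest = data[2:].partition(":")
    if not sep or not length_text.isdigit():
        return False
    n = int(length_text)
    raw = rest.encode("utf-8")
    return len(raw) == n + 3 and raw[:1] == b'"' and raw[n + 1:] == b'";'


serialized = serialize_str('say "hi"')   # hypothetical CDATA content
slashed = addslashes(serialized)         # assumed effect of the extra escaping

ok_before = length_prefix_matches(serialized)
ok_after = length_prefix_matches(slashed)
```

In this model `ok_before` is `True` and `ok_after` is `False`, independent of any trailing whitespace in the value.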

#6 @duck_
6 years ago

  • Resolution set to fixed
  • Status changed from new to closed