Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#11574 closed defect (bug) (fixed)

WordPress importer fails to import content which contains importer tags within it

Reported by: westi Owned by: westi
Milestone: 3.1 Priority: normal
Severity: normal Version: 2.7
Component: Import Keywords: has-patch 2nd-opinion
Focuses: Cc:

Description

For example:

If you have the following post_meta value:

<wp:postmeta>
<wp:meta_key>evil</wp:meta_key>
<wp:meta_value><![CDATA[<wp:meta_value>evil</wp:meta_value>]]></wp:meta_value>
</wp:postmeta>

Currently we would import: <![CDATA[<wp:meta_value>evil

This is because the regex in WP_Import::get_tag() is non-greedy, so the match ends at the first </wp:meta_value>, which is the one inside the CDATA section.

Tested back as far as 2.7.0 and this exists there too.
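
The truncation can be reproduced outside WordPress. The following is a minimal Python sketch, not the importer's PHP code: the function name get_tag_non_greedy and the exact pattern are assumptions that only approximate a non-greedy tag match followed by a CDATA-stripping step.

```python
import re

def get_tag_non_greedy(string, tag):
    """Hypothetical stand-in for a non-greedy tag extractor."""
    t = re.escape(tag)
    # Non-greedy body: stops at the FIRST closing tag it encounters.
    m = re.search(r"<%s.*?>(.*?)</%s>" % (t, t), string, re.S | re.I)
    if not m:
        return ""
    # Strip a CDATA wrapper only if the captured body is fully wrapped.
    return re.sub(r"^<!\[CDATA\[(.*)\]\]>$", r"\1", m.group(1), flags=re.S)

sample = (
    "<wp:meta_value><![CDATA[<wp:meta_value>evil</wp:meta_value>]]>"
    "</wp:meta_value>"
)

result = get_tag_non_greedy(sample, "wp:meta_value")
print(result)
```

Because the capture ends at the inner closing tag, the trailing `]]>` is never seen and the CDATA wrapper is left half-open in the result.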

Attachments (2)

11574.diff (637 bytes) - added by westi 8 years ago.
This fixes it for me.
11574-improved.diff (886 bytes) - added by westi 8 years ago.
Better patch which merges the two regexes and preserves the old non-greedy behaviour


Change History (8)

@westi
8 years ago

This fixes it for me.

#1 @westi
8 years ago

  • Keywords has-patch 2nd-opinion added
  • Milestone changed from 3.0 to Future Release

That RegEx has always been that way - http://core.trac.wordpress.org/browser/trunk/wp-admin/import/wordpress.php?rev=3769

Reviewing the code, the danger with this change is that we don't always call get_tag with a string as targeted as the one we pass for post meta.

It may be best to keep the current non-greedy regex for now and fix this issue with an XML-parser-based importer.
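
To illustrate the risk of simply making the match greedy, here is a small Python sketch. The get_tag helper, its greedy flag, and the two-category input are all hypothetical; they only show what happens when the searched string contains the same tag more than once.

```python
import re

def get_tag(string, tag, greedy=False):
    """Hypothetical extractor; 'greedy' switches the body quantifier."""
    t = re.escape(tag)
    body = "(.*)" if greedy else "(.*?)"
    m = re.search(r"<%s.*?>%s</%s>" % (t, body, t), string, re.S | re.I)
    return m.group(1) if m else ""

# A less targeted string: the same tag appears twice.
post = "<category>News</category>\n<category>Releases</category>"

non_greedy = get_tag(post, "category")
greedy = get_tag(post, "category", greedy=True)

print(repr(non_greedy))
print(repr(greedy))
```

The non-greedy call returns only the first category, while the greedy call runs through to the last closing tag and swallows the markup in between.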

@westi
8 years ago

Better patch which merges the two regexes and preserves the old non-greedy behaviour

#2 @westi
8 years ago

  • Milestone changed from Future Release to 3.0

After a bit of a fight with the RegEx I think I have a solution to this which we can use and still preserve the non-greedy behaviour of the old code.

Moving back as a potential 3.0 candidate.

I would love some feedback on this - especially any extra test cases people can suggest.
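
The attached patch is not reproduced on this page. As a starting point for test cases, the following Python sketch shows one hypothetical way to merge the tag match and the CDATA handling into a single pattern while keeping the match non-greedy. The name get_tag_merged and the pattern are assumptions, not the contents of 11574-improved.diff.

```python
import re

def get_tag_merged(string, tag):
    """Hypothetical single-pattern extractor (not the attached patch)."""
    t = re.escape(tag)
    # Try a CDATA-wrapped body first; it must end in "]]>" directly
    # before the closing tag. Otherwise fall back to a plain
    # non-greedy body.
    pattern = r"<%s.*?>(?:<!\[CDATA\[(.*?)\]\]>|(.*?))</%s>" % (t, t)
    m = re.search(pattern, string, re.S | re.I)
    if not m:
        return ""
    return m.group(1) if m.group(1) is not None else m.group(2)

evil = (
    "<wp:meta_value><![CDATA[<wp:meta_value>evil</wp:meta_value>]]>"
    "</wp:meta_value>"
)
repeated = "<category>News</category>\n<category>Releases</category>"

print(get_tag_merged(evil, "wp:meta_value"))
print(get_tag_merged(repeated, "category"))
```

In this sketch the CDATA branch recovers the full meta value, and the repeated-tag input still yields only the first element, so the old non-greedy behaviour is kept for non-CDATA content.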

#3 @hakre
8 years ago

I think it's advisable to drop the use of regex for parsing (defective by design) and switch over to an XML parser. There should be no need to re-invent the wheel.

#4 @dd32
8 years ago

  • Milestone changed from 3.0 to 3.1

hakre: we're limited in the sense that we can never be sure there is an XML parser available, not with the current PHP requirements. A regex-based option is going to be required as a fallback, at least for the immediate future.

Moving to 3.1 due to lack of testing feedback.
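
The parser-first, regex-fallback approach discussed above can be sketched as follows. This is a Python analogy rather than importer code: the function names are hypothetical, the try/except import stands in for a host without an XML extension, and the namespace URI is assumed to be the WXR 1.0 one.

```python
import re

try:
    import xml.etree.ElementTree as ET
except ImportError:  # stand-in for "no XML parser available"
    ET = None

WP_NS = "http://wordpress.org/export/1.0/"  # assumed namespace URI

def meta_value_via_parser(fragment):
    # Wrap the fragment so the wp: prefix is declared.
    doc = '<root xmlns:wp="%s">%s</root>' % (WP_NS, fragment)
    node = ET.fromstring(doc).find(".//{%s}meta_value" % WP_NS)
    return node.text if node is not None else None

def meta_value_via_regex(fragment):
    # Fallback with the known non-greedy limitation.
    m = re.search(r"<wp:meta_value.*?>(.*?)</wp:meta_value>", fragment, re.S)
    return m.group(1) if m else None

def read_meta_value(fragment):
    if ET is not None:
        return meta_value_via_parser(fragment)
    return meta_value_via_regex(fragment)

fragment = (
    "<wp:postmeta>\n"
    "<wp:meta_key>evil</wp:meta_key>\n"
    "<wp:meta_value><![CDATA[<wp:meta_value>evil</wp:meta_value>]]>"
    "</wp:meta_value>\n"
    "</wp:postmeta>"
)

print(read_meta_value(fragment))
```

A real XML parser treats the CDATA section as opaque character data, so the nested tag text comes back intact, while the regex path would still return the truncated value.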

#5 @nacin
7 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [15961]) Importer and exporter overhaul, mega props duck.

Exporter overhaul:

  • Add author information to export
  • Greater usage of slug identifiers
  • Don't export auto-drafts, spam comments, or edit lock/last meta keys
  • Inline documentation improvements
  • Remove filtering for now (@todo)
  • Bump WXR version to 1.1, but remain back compat in the importer

Importer overhaul (http://plugins.trac.wordpress.org/changeset/304249):

  • Use an XML parser where available (SimpleXML, XML Parser)
  • Proper import support for navigation menus
  • Many bug fixes, specifically improvements to category and custom taxonomy handling
  • Better author/user mapping

Fixes #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750 #15055 #15091 #15108.

See #15197.

#6 @nacin
7 years ago

  • Milestone changed from Awaiting Triage to 3.1