Opened 7 years ago

Closed 5 years ago

Last modified 5 years ago

#7400 closed enhancement (fixed)

WXR doesn't use an XML parser

Reported by: link92
Owned by:
Milestone: 3.1
Priority: normal
Severity: normal
Version:
Component: Import
Keywords:
Focuses:
Cc:

Description

'<title foo:bar="heh>" xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My amazing title</title>' should give 'My amazing title' as the content when parsed with an XML parser, yet your non-XML parser gives '" xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My amazing title'. You also just strip <![CDATA[ and ]]> out of the data, which gives them absolutely no meaning. <category><![CDATA[&gt;foo>]]></category> and <category>&gt;foo></category> are two totally different categories (the former is '&gt;foo>', the latter is '>foo>').
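For illustration, here is a minimal Python sketch (not the importer's PHP code) that feeds the three snippets from the description to a real XML parser from the standard library. The helper name `element_text` is hypothetical. Since `&gt;` decodes to `>`, the parser returns `>foo>` for the category without CDATA.

```python
import xml.etree.ElementTree as ET


def element_text(xml_fragment):
    """Parse a single-element XML fragment and return its text content."""
    return ET.fromstring(xml_fragment).text


# The attribute value contains '>', which is legal inside a quoted attribute.
title = ('<title foo:bar="heh>" '
         'xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">'
         'My amazing title</title>')

# Inside CDATA, '&gt;' is literal text; outside CDATA it is an entity for '>'.
cdata_category = '<category><![CDATA[&gt;foo>]]></category>'
plain_category = '<category>&gt;foo></category>'

print(element_text(title))           # My amazing title
print(element_text(cdata_category))  # &gt;foo>
print(element_text(plain_category))  # >foo>
```

The two categories come out as different strings, which is the distinction that is lost when the CDATA markers are simply stripped.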

Change History (12)

comment:1 @hailin 7 years ago

  • Milestone changed from 2.7 to 2.9

As the title correctly suggests, this issue is caused by the lack of an XML parser.
We parse XML input files line by line, which inevitably causes problems such as:
http://trac.wordpress.org/ticket/5460

I suggest rewriting the import code to use an XML parser once we switch to PHP 5.x. Until then, it is not practical to tweak the current code further, as doing so may cause other side effects.
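As a rough illustration of why line-by-line parsing is fragile, the Python sketch below compares a naive per-line regex with a real parser on an element that spans two lines. The sample fragment, the regex, and the function names are all hypothetical; they are not the importer's actual code, only an example of the general failure mode.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical WXR-like fragment: the <title> element spans two lines.
DOC = """<item>
<title>My amazing
title</title>
</item>"""


def titles_line_by_line(doc):
    """Naive approach: expects each element to open and close on one line."""
    pattern = re.compile(r"<title>(.*?)</title>")
    found = []
    for line in doc.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.group(1))
    return found


def titles_with_parser(doc):
    """Parser-based approach: line breaks inside an element do not matter."""
    return [el.text for el in ET.fromstring(doc).iter("title")]


print(titles_line_by_line(DOC))  # []
print(titles_with_parser(DOC))   # ['My amazing\ntitle']
```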

comment:2 @link92 7 years ago

Why can't we just use our existing RSS parser? Does MagpieRSS not provide access to namespaced elements, such as WXR's?

comment:3 @link92 6 years ago

Looking more closely, MagpieRSS doesn't. Now that SimplePie is in core it should be quite easy, though having looked at the importer, I don't understand what it is doing at all.
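To show what access to namespaced elements looks like with a namespace-aware parser, here is a small Python sketch using the standard library rather than SimplePie or MagpieRSS. The sample document, the `read_items` helper, and the namespace URI bound to the `wp` prefix are assumptions for illustration; a real export declares its own URI in the file header.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the `wp` prefix; check the header of a real export.
NS = {"wp": "http://wordpress.org/export/1.1/"}

SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.1/">
  <channel>
    <item>
      <title>Hello world</title>
      <wp:post_id>42</wp:post_id>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>"""


def read_items(wxr):
    """Return one dict per <item>, mixing plain RSS and namespaced fields."""
    root = ET.fromstring(wxr)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            # The prefix is resolved through NS, not through the raw tag name.
            "post_id": int(item.findtext("wp:post_id", namespaces=NS)),
            "post_type": item.findtext("wp:post_type", namespaces=NS),
        })
    return items


print(read_items(SAMPLE))
```

Because the lookup goes through the namespace URI, the file could use any prefix in place of `wp` and the code would still find the elements.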

comment:4 @Denis-de-Bernardy 6 years ago

  • Component changed from General to Import
  • Owner anonymous deleted

comment:5 @hakre 6 years ago

SimplePie uses XML Parser Functions: http://us2.php.net/manual/en/ref.xml.php

comment:6 @fastpipe 6 years ago

This has become a critical issue for me. Nine months ago I wrote an XSLT script to convert one type of XML file into a WXR file so that we could import records from a different database system as WordPress posts. It worked great on my Mac development server, but failed on two different Linux servers which had different PCRE libraries.

I worked around this for a while by importing the files I generated into WordPress on the Mac and then exporting a new WXR file from that install, which I could then upload to the install running on the Linux server. That worked up until WordPress 2.8. Now the import on the Linux server reports that it is all done, but no records are imported. Everything still works as before on the Mac, with the imported records coming in fine.

I'm about to rewrite the whole system to convert the original XML to SQL and use a temporary table to do the imports. I believe this would be unnecessary with a real XML parser and I can't be the only one that's run into this problem.

comment:7 @nacin 5 years ago

  • Milestone changed from 2.9 to Future Release
  • Type changed from defect (bug) to task (blessed)

comment:9 @fastpipe 5 years ago

I did in fact rewrite my Bookcollector XML Importer as a custom importer plugin and learned a few things in the process. I ended up using an XML parser, creating a temporary table and doing most of the task via SQL, but perhaps that's the way to go. I've been doing a lot of conversions lately, some from Movable Type but mostly from custom solutions. Even with the Movable Type conversion, I ended up bringing over trackbacks and all the post_content myself because the importer failed to handle those properly. If there were a common, well-documented WXR format that had a very high chance of success, it would be much simpler to move sites to WordPress from a variety of other platforms, which would be good for the community. I'd be willing to help as much as I can and offer insights into what works and what doesn't.

comment:10 @nacin 5 years ago

  • Type changed from task (blessed) to enhancement

comment:11 @nacin 5 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [15961]) Importer and exporter overhaul, mega props duck.

Exporter overhaul:

  • Add author information to export
  • Greater usage of slug identifiers
  • Don't export auto-drafts, spam comments, or edit lock/last meta keys
  • Inline documentation improvements
  • Remove filtering for now (@todo)
  • Bump WXR version to 1.1, but remain back compat in the importer
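As a sketch of what staying backward compatible across WXR versions could look like on the reading side, the Python example below accepts both 1.0 and 1.1 files. It assumes the file declares its version in a `wxr_version` element and that the namespace URI differs between versions; the sample documents, the `SUPPORTED_VERSIONS` set, and the `wxr_version` helper are hypothetical and do not reflect the importer's actual PHP code.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: the versions a reader is willing to accept.
SUPPORTED_VERSIONS = {"1.0", "1.1"}


def wxr_version(wxr):
    """Return the declared WXR version, or raise ValueError if unusable."""
    root = ET.fromstring(wxr)
    # "{*}" matches the element in any namespace (Python 3.8+), so the lookup
    # does not depend on which versioned namespace URI the file declares.
    version = root.findtext(".//{*}wxr_version")
    if version is None:
        raise ValueError("no wxr_version element found")
    version = version.strip()
    if version not in SUPPORTED_VERSIONS:
        raise ValueError("unsupported WXR version: " + version)
    return version


# Hypothetical minimal documents, one per version.
OLD = ('<rss xmlns:wp="http://wordpress.org/export/1.0/"><channel>'
       '<wp:wxr_version>1.0</wp:wxr_version></channel></rss>')
NEW = ('<rss xmlns:wp="http://wordpress.org/export/1.1/"><channel>'
       '<wp:wxr_version>1.1</wp:wxr_version></channel></rss>')

print(wxr_version(OLD))  # 1.0
print(wxr_version(NEW))  # 1.1
```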

Importer overhaul (http://plugins.trac.wordpress.org/changeset/304249):

  • Use an XML parser where available (SimpleXML, XML Parser)
  • Proper import support for navigation menus
  • Many bug fixes, specifically improvements to category and custom taxonomy handling
  • Better author/user mapping
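The "use an XML parser where available" idea can be sketched as a simple fallback chain. The Python example below is only an analogy for choosing between SimpleXML and XML Parser in PHP: it prefers the optional third-party `lxml` package and falls back to the standard library when `lxml` is not installed. The `PARSER_NAME` variable and `first_title` helper are hypothetical.

```python
try:
    # Preferred parser, used only if the optional package is installed.
    from lxml import etree as ET
    PARSER_NAME = "lxml"
except ImportError:
    # Standard-library fallback that is always available.
    import xml.etree.ElementTree as ET
    PARSER_NAME = "xml.etree.ElementTree"


def first_title(xml_text):
    """Return the text of the first <title> element, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.findtext(".//title")


print(PARSER_NAME)
print(first_title("<rss><channel><title>Blog</title></channel></rss>"))
```

The calling code stays the same whichever parser was selected, which keeps the choice of backend in one place.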

Fixes #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750 #15055 #15091 #15108.

See #15197.

comment:12 @nacin 5 years ago

  • Milestone changed from Future Release to 3.1