#7400 closed enhancement (fixed)
WXR doesn't use an XML parser
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | 3.1 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Import | Keywords: | |
Focuses: | | Cc: | |
Description
'<title foo:bar="heh>" xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My amazing title</title>' should give 'My amazing title' as the content if parsed with an XML parser, yet your non-XML parser gives '" xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">My amazing title'. You just strip out <![CDATA[ and ]]> from the data, giving them absolutely no meaning. <category><![CDATA[&gt;foo&gt;]]></category> and <category>&gt;foo&gt;</category> are two totally different categories (the former is '&gt;foo&gt;', the latter is '>foo>').
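The difference described above can be reproduced with a minimal sketch. It is written in Python for illustration and is not the importer's PHP code. `naive_text` is a hypothetical regex-based extractor that, like the behaviour reported here, treats the first '>' as the end of the start tag, and `parsed_text` uses a real XML parser from the standard library.

```python
import re
import xml.etree.ElementTree as ET

TITLE = ('<title foo:bar="heh>" '
         'xmlns:foo="tag:gsnedders.com,2008-07-21:WXR_b0rked">'
         'My amazing title</title>')

# Same characters, once inside CDATA and once as character references.
CDATA_CATEGORY = '<category><![CDATA[&gt;foo&gt;]]></category>'
ENTITY_CATEGORY = '<category>&gt;foo&gt;</category>'


def naive_text(xml_text, tag):
    """Hypothetical regex extraction: the first '>' ends the start tag."""
    match = re.search(r'<%s[^>]*>(.*?)</%s>' % (tag, tag), xml_text, re.S)
    return match.group(1) if match else None


def parsed_text(xml_text):
    """Text content of the root element as seen by an XML parser."""
    return ET.fromstring(xml_text).text


naive_title = naive_text(TITLE, 'title')
real_title = parsed_text(TITLE)

# CDATA content is literal; outside CDATA, &gt; is decoded to '>'.
cdata_value = parsed_text(CDATA_CATEGORY)
entity_value = parsed_text(ENTITY_CATEGORY)

print(repr(naive_title))
print(repr(real_title))
print(repr(cdata_value), repr(entity_value))
```

The regex version is cut short by the '>' inside the attribute value, while the parser returns only the element's text. The two category values also come out different, which is what is lost when the CDATA markers are simply stripped.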
Change History (12)
#2
@
16 years ago
Why can't we just use our existing RSS parser? Does MagpieRSS not provide access to namespaced elements, such as WXR's?
#3
@
16 years ago
Looking more closely, MagpieRSS doesn't, but now that SimplePie is in core it should be quite easy. That said, having looked at the importer, I don't understand what it is doing at all.
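For reference, this is roughly what access to namespaced elements looks like with a namespace-aware parser. It is a sketch in Python rather than SimplePie or PHP. The sample document is made up for illustration, and the `wp` namespace URI assumes the WXR 1.1 format; older exports use a different version in the URI.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups. The wp URI is an assumption (WXR 1.1).
NS = {
    "wp": "http://wordpress.org/export/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}

# Made-up sample, not taken from a real export.
SAMPLE = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wp="http://wordpress.org/export/1.1/">
  <channel>
    <item>
      <title>Hello</title>
      <content:encoded><![CDATA[<p>Body with <b>markup</b></p>]]></content:encoded>
      <wp:post_id>42</wp:post_id>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return one dict per <item>, reading plain and namespaced children."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "post_id": int(item.findtext("wp:post_id", namespaces=NS)),
            "post_type": item.findtext("wp:post_type", namespaces=NS),
            "content": item.findtext("content:encoded", namespaces=NS),
        })
    return items


items = read_items(SAMPLE)
print(items)
```

The lookup is keyed on the namespace URI, not on the prefix text, so a file that binds the same URI to a different prefix would still be read correctly.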
#5
@
16 years ago
SimplePie uses XML Parser Functions: http://us2.php.net/manual/en/ref.xml.php
#6
@
15 years ago
This has become a critical issue for me. I wrote an XSLT script nine months ago to convert one type of XML file into a WXR file so that we could import records from a different database system as WordPress posts. It worked great on my Mac development server, but failed on two different Linux servers which had different PCRE libraries.
I worked around this for a while by importing the files I generated into WordPress on the Mac and then exporting a new WXR file from that install, which I could then upload to the install running on the Linux server. That worked up until WordPress 2.8. Now it fails on the Linux server: it says it is all done, but no records are imported. Everything still works as before on the Mac, with the imported records coming in fine.
I'm about to rewrite the whole system to convert the original XML to SQL and use a temporary table to do the imports. I believe this would be unnecessary with a real XML parser, and I can't be the only one who has run into this problem.
#7
@
15 years ago
- Milestone changed from 2.9 to Future Release
- Type changed from defect (bug) to task (blessed)
#9
@
15 years ago
I did in fact rewrite my Bookcollector XML Importer as a custom importer plugin and learned a few things in the process. I ended up using an XML parser, creating a temporary table and doing most of the task via SQL, but perhaps that's the way to go. I've been doing a lot of conversions lately, some from Movable Type but mostly from custom solutions. Even with the Movable Type conversion, I ended up bringing over trackbacks and all the post_content because the importer failed to handle those properly. If there were a common, well-documented WXR format that had a very high chance of success, it would be much simpler to move sites to WordPress from a variety of other platforms, which would be good for the community. I'd be willing to help as much as I can and offer insights into what works and what doesn't.
#11
@
14 years ago
- Resolution set to fixed
- Status changed from new to closed
(In [15961]) Importer and exporter overhaul, mega props duck.
Exporter overhaul:
- Add author information to export
- Greater usage of slug identifiers
- Don't export auto-drafts, spam comments, or edit lock/last meta keys
- Inline documentation improvements
- Remove filtering for now (@todo)
- Bump WXR version to 1.1, but remain back compat in the importer
Importer overhaul (http://plugins.trac.wordpress.org/changeset/304249):
- Use an XML parser where available (SimpleXML, XML Parser)
- Proper import support for navigation menus
- Many bug fixes, specifically improvements to category and custom taxonomy handling
- Better author/user mapping
Fixes #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750 #15055 #15091 #15108.
See #15197.
As the title correctly suggests, this issue is caused by the lack of an XML parser.
We parse XML input files line by line, which inevitably causes issues such as:
http://trac.wordpress.org/ticket/5460
I suggest rewriting/improving the import code to use an XML parser when we switch to PHP 5.x. Before that, it's not practical to tweak the current code further, as it may cause other side effects.
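To illustrate why line-by-line parsing is fragile, here is a small sketch in Python. `line_based` is a hypothetical stand-in for a line-oriented extractor, not the importer's actual code. It is compared with a real XML parser on a made-up document in which one element spans several lines and two elements share one line.

```python
import re
import xml.etree.ElementTree as ET

# Made-up input: <title> spans three lines, two <category> share one line.
DOC = """<item>
<title>
  Wrapped title
</title>
<category>a</category><category>b</category>
</item>"""


def line_based(xml_text, tag):
    """Hypothetical extractor: at most one <tag>...</tag> match per line."""
    pattern = re.compile(r'<%s>(.*?)</%s>' % (tag, tag))
    found = []
    for line in xml_text.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.group(1))
    return found


def parser_based(xml_text, tag):
    """Collect the text of every matching element using an XML parser."""
    root = ET.fromstring(xml_text)
    return [(el.text or "").strip() for el in root.iter(tag)]


line_titles = line_based(DOC, "title")
line_categories = line_based(DOC, "category")
xml_titles = parser_based(DOC, "title")
xml_categories = parser_based(DOC, "category")

print(line_titles, line_categories)
print(xml_titles, xml_categories)
```

Both inputs are well-formed XML, yet the line-based version misses the wrapped title and the second category, because line breaks carry no structural meaning in XML.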