Opened 15 years ago
Closed 15 years ago
#12137 closed defect (bug) (duplicate)
WordPress import module does not correctly parse XML
Reported by: | greggman | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | 2.9.1 |
Component: | Import | Keywords: | |
Focuses: | | Cc: | |
Description
I'm not sure if I can say this well. Basically, the WordPress import module claims to read a modified form of RSS, which is based on XML. But the import module is not actually reading XML; it's just parsing text with hardcoded rules. This means you can give it perfectly valid XML files and it will fail.
For example, in XML the following two lines represent exactly the same data:
<content:encoded>hello world</content:encoded>
<content:encoded><![CDATA[hello world]]></content:encoded>
Yet WordPress's importer is hardcoded to require the second form.
Another example: these two snippets represent exactly the same data in XML:
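To illustrate the point about CDATA, here is a minimal sketch using Python's standard `xml.etree.ElementTree` as a stand-in for any conforming XML parser (this is not the importer's code). The wrapping `<item>` element and the helper name `encoded_text` are hypothetical; the `content:` prefix is bound to the RSS content-module namespace so the snippet is well-formed.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the RSS content module; the "content:" prefix maps to it.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# The two forms from the ticket, wrapped in a root element that declares the prefix.
PLAIN = f'<item xmlns:content="{CONTENT_NS}"><content:encoded>hello world</content:encoded></item>'
CDATA = f'<item xmlns:content="{CONTENT_NS}"><content:encoded><![CDATA[hello world]]></content:encoded></item>'


def encoded_text(doc: str) -> str:
    """Return the text of <content:encoded> as a real XML parser reports it."""
    root = ET.fromstring(doc)
    # ElementTree expands prefixes to "{namespace-uri}localname".
    return root.find(f"{{{CONTENT_NS}}}encoded").text


print(encoded_text(PLAIN))                         # hello world
print(encoded_text(CDATA))                         # hello world
print(encoded_text(PLAIN) == encoded_text(CDATA))  # True
```

The parser unwraps the CDATA section itself, so code that consumes the parsed tree never needs to know which form the file used.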
--example 1--
<wp:category><wp:cat_name>news</wp:cat_name></wp:category>
--example 2--
<wp:category>
<wp:cat_name>news</wp:cat_name>
</wp:category>
Yet the WordPress importer is hardcoded to accept only the first form.
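The whitespace case can be checked the same way. This sketch again uses Python's `xml.etree.ElementTree` purely as an illustration; the `<channel>` wrapper, the helper `category_names`, and the `wp:` namespace URI below are assumptions for the example (a real export file declares its own `xmlns:wp`).

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "wp:" prefix; illustrative only.
WP_NS = "http://wordpress.org/export/1.0/"

COMPACT = (
    f'<channel xmlns:wp="{WP_NS}">'
    "<wp:category><wp:cat_name>news</wp:cat_name></wp:category>"
    "</channel>"
)
INDENTED = f"""<channel xmlns:wp="{WP_NS}">
<wp:category>
<wp:cat_name>news</wp:cat_name>
</wp:category>
</channel>"""


def category_names(doc: str) -> list[str]:
    """Collect every wp:cat_name inside a wp:category, independent of line layout."""
    root = ET.fromstring(doc)
    ns = {"wp": WP_NS}
    # Elements are looked up by name, so newlines between tags have no effect.
    return [c.findtext("wp:cat_name", namespaces=ns) for c in root.findall("wp:category", ns)]


print(category_names(COMPACT))   # ['news']
print(category_names(INDENTED))  # ['news']
```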
There are many other examples.
My suggestion is to use PHP's built-in XML libraries to read the files and then extract the data from the parsed result. They will correctly parse XML data regardless of whitespace, entity, or CDATA differences.
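A rough sketch of that approach follows. It is written in Python with `xml.etree.ElementTree` only to show the idea; in PHP 5 the analogous tools would be extensions such as SimpleXML or DOM. The feed contents and the `read_items` helper are hypothetical. The first item escapes special characters as entities and the second uses CDATA, yet both produce the same values.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical feed: item 1 uses entities, item 2 uses CDATA for the same data.
FEED = f"""<rss xmlns:content="{CONTENT_NS}"><channel>
  <item>
    <title>Tom &amp; Jerry</title>
    <content:encoded>&lt;p&gt;hi&lt;/p&gt;</content:encoded>
  </item>
  <item>
    <title><![CDATA[Tom & Jerry]]></title>
    <content:encoded><![CDATA[<p>hi</p>]]></content:encoded>
  </item>
</channel></rss>"""


def read_items(doc: str) -> list[dict]:
    """Extract title and post body from each <item> via a real XML parser.

    The parser resolves entities and unwraps CDATA, and fields are found by
    element name, so inter-element whitespace does not matter either.
    """
    root = ET.fromstring(doc)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "content": item.findtext(f"{{{CONTENT_NS}}}encoded", default=""),
        })
    return items


items = read_items(FEED)
print(items[0])              # {'title': 'Tom & Jerry', 'content': '<p>hi</p>'}
print(items[0] == items[1])  # True
```

Because the importer would work on parsed values rather than raw text, none of the hardcoded formatting rules are needed.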
Closing as a duplicate of #7400 and others.
We could possibly create a PHP5 importer, then fall back to the current implementation when running on PHP4. Someone just needs to take it by the horns, I think, and push it through.