WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 4 years ago

#12137 closed defect (bug) (duplicate)

Wordpress import module does not correctly parse XML

Reported by: greggman
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 2.9.1
Component: Import
Keywords:
Focuses:
Cc:

Description

I'm not sure if I can say this well. Basically, the WordPress import module claims to read a modified form of RSS, which is based on XML. But the import module is not actually parsing XML; it is just matching text against hardcoded rules. This means you can give it a perfectly valid XML file and the import will fail.

For example, in XML the following two lines represent exactly the same data:

<content:encoded>hello world</content:encoded>
<content:encoded><![CDATA[hello world]]></content:encoded>

Yet the WordPress importer is hardcoded to require the second form.

As another example, the following two snippets represent exactly the same data in XML:

--example 1--
<wp:category><wp:cat_name>news</wp:cat_name></wp:category>
--example 2--
<wp:category>
<wp:cat_name>news</wp:cat_name>
</wp:category>

Yet the WordPress importer is hardcoded to accept only the first form.
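
To make the failure mode concrete, here is a minimal sketch in Python (chosen only for illustration; the importer itself is PHP). The regular expression is a hypothetical stand-in for "hardcoded rules" and is not the importer's actual code, and the namespace URI is a placeholder assumed for this example. It shows a line-based pattern missing the multi-line layout while a real XML parser reads both layouts identically.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed for this sketch only.
NS = "http://example.com/wp"

ONE_LINE = (
    f'<wp:category xmlns:wp="{NS}">'
    "<wp:cat_name>news</wp:cat_name></wp:category>"
)
MULTI_LINE = f'''<wp:category xmlns:wp="{NS}">
<wp:cat_name>news</wp:cat_name>
</wp:category>'''

# Hypothetical single-line pattern standing in for "hardcoded rules".
# This is NOT the importer's real code.
LINE_PATTERN = re.compile(
    r"<wp:category[^>]*><wp:cat_name>(.*?)</wp:cat_name></wp:category>"
)


def cat_name_by_regex(text):
    """Scan line by line; return the first category name matched, or None."""
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


def cat_name_by_parser(text):
    """Parse the text as XML and read the cat_name child element."""
    root = ET.fromstring(text)
    return root.findtext(f"{{{NS}}}cat_name")


if __name__ == "__main__":
    for label, doc in (("one line", ONE_LINE), ("multi line", MULTI_LINE)):
        print(label, cat_name_by_regex(doc), cat_name_by_parser(doc))
```

The line-based function only succeeds when the whole element sits on one line; the parser-based function does not care how the element is laid out.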

There are many other examples.

The suggestion is to use the built-in PHP XML libraries to read the files and then get the data from the parsed result. They will correctly parse XML data regardless of whitespace, entity, or CDATA differences.
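
As a rough illustration of that claim, the sketch below uses Python's standard `xml.etree.ElementTree` rather than PHP, purely to show the behaviour of a conforming XML parser. The helper name `encoded_text` and the sample documents are made up for this example, and the `content` namespace URI is assumed to be the usual RSS content module URI. Plain text with an entity, a CDATA section, and a numeric character reference with extra whitespace around the element all yield the same string.

```python
import xml.etree.ElementTree as ET

# Assumed to be the RSS "content" module namespace; adjust if the export differs.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def encoded_text(item_xml):
    """Return the text of <content:encoded> inside a single <item> document."""
    item = ET.fromstring(item_xml)
    return item.findtext(f"{{{CONTENT_NS}}}encoded")


# Same data written three ways: entity, CDATA section, character reference.
PLAIN = (
    f'<item xmlns:content="{CONTENT_NS}">'
    "<content:encoded>fish &amp; chips</content:encoded></item>"
)
CDATA = (
    f'<item xmlns:content="{CONTENT_NS}">'
    "<content:encoded><![CDATA[fish & chips]]></content:encoded></item>"
)
SPACED = f'''<item xmlns:content="{CONTENT_NS}">
  <content:encoded>fish &#38; chips</content:encoded>
</item>'''

if __name__ == "__main__":
    for doc in (PLAIN, CDATA, SPACED):
        print(repr(encoded_text(doc)))
```

Because the parser resolves entities and CDATA before handing back text, the calling code never needs to know which form the exporting tool chose.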

Change History (1)

comment:1 nacin 4 years ago

  • Milestone Unassigned deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Closing as a duplicate of #7400 and others.

We could possibly create a PHP 5 importer, then fall back to the current implementation when running on PHP 4. Someone just needs to grab it by the horns, I think, and push it through.
