WordPress import module does not correctly parse XML
|Reported by:||greggman||Owned by:|
I'm not sure I can explain this well. Basically, the WordPress import module claims to read a modified form of RSS, which is based on XML. But the import module is not actually parsing XML; it's just matching text against hardcoded rules. This means you can give it a perfectly valid XML file and the import will fail.
Examples: in XML, the following 2 lines represent exactly the same data
Yet WordPress's import is hardcoded to require the second form.
Another example: these 2 forms represent exactly the same data in XML
Yet the WordPress importer is hardcoded to accept only the first form.
There are many other examples.
The suggestion is to use the built-in PHP XML libraries to read the files and then get the data from the parsed result. They will correctly parse XML data regardless of whitespace, entity, or CDATA differences.
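
To illustrate the difference, here is a minimal sketch in Python rather than PHP. The sample strings and the two helper functions are hypothetical and are not taken from the importer's code; they only show how hardcoded text matching breaks on equivalent XML while a real parser returns the same data each time.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical samples (not the ticket's original examples): the same item
# written three ways that are equivalent as XML.
COMPACT = "<item><title>Fish &amp; Chips</title></item>"
SPACED = "<item>\n  <title>Fish &amp; Chips</title>\n</item>"
CDATA = "<item><title><![CDATA[Fish & Chips]]></title></item>"


def title_by_regex(text):
    """Hardcoded text matching: expects <title> to directly follow <item>."""
    match = re.search(r"<item><title>(.*?)</title>", text)
    return match.group(1) if match else None


def title_by_parser(text):
    """Real XML parsing: whitespace, entities and CDATA are all resolved."""
    return ET.fromstring(text).findtext("title")


for sample in (COMPACT, SPACED, CDATA):
    print(repr(title_by_regex(sample)), "|", repr(title_by_parser(sample)))
```

The text-matching helper returns a different result for each sample (an undecoded entity, no match at all, and a raw CDATA wrapper), while the parser returns the same title for all three. PHP's bundled XML extensions should give the importer the same kind of behavior.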