Opened 5 years ago

Closed 5 years ago

#9212 closed defect (bug) (duplicate)

Livejournal importer dropping '<' on all HTML tags

Reported by: amaiman
Owned by: beaulebens
Milestone: 2.8
Priority: normal
Severity: major
Version: 2.8
Component: Import
Keywords: LiveJournal, import, HTML, tags

Description

Whenever I use the LiveJournal API importer in trunk to import posts from LiveJournal, the importer strips the "<" from every HTML tag, which makes the results unusable. (Without the "<", the browser does not process the tags, so they show up as literal text in the post and all formatting, images, etc. are lost.)

I have tried this with both my own LiveJournal (over 100 posts) and a new journal containing only a test post with various HTML tags. I got the same results with both.

This problem may not be in the livejournal.php file itself; it may have been caused by a change to some other component in WordPress trunk. I haven't had time to track it down, and I'm not particularly familiar with the WordPress source yet. The behavior occurs with trunk as of the time this ticket was entered.

An example:

Source post in LiveJournal:

Testing this post.

<b>Bold</b>
<i>Italic</i>
<u>Underline</u>
<p><p>
<img src="http://p-stat.livejournal.com/horizon/logo.gif">
<br><br>

More tests:

<a href="http://www.livejournal.com">Link</a>

<a href="http://www.livejournal.com"><img src="http://p-stat.livejournal.com/horizon/logo.gif"></a>

End.

Resulting contents in WordPress after running the importer:

Testing this post.

b>Bold/b>
i>Italic/i>
u>Underline/u>
p>p>
img src="http://p-stat.livejournal.com/horizon/logo.gif">
br>br>

More tests:

a href="http://www.livejournal.com">Link/a>

a href="http://www.livejournal.com">img src="http://p-stat.livejournal.com/horizon/logo.gif">/a>

End.
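
The output above is consistent with every "<" character simply being removed from the post body. As a rough illustration only (this is not the importer's code), the following Python sketch mimics that symptom and flags content that looks affected. The helper names `simulate_dropped_lt` and `looks_stripped` are hypothetical, and the bracket-count check is a heuristic, not a reliable detector.

```python
def simulate_dropped_lt(html: str) -> str:
    """Mimic the symptom in this ticket: remove every '<' character."""
    return html.replace("<", "")


def looks_stripped(content: str) -> bool:
    """Heuristic (assumption): more '>' than '<' suggests lost opening brackets."""
    return content.count(">") > content.count("<")


# Two lines taken from the source post in the ticket.
source = '<b>Bold</b>\n<a href="http://www.livejournal.com">Link</a>'
imported = simulate_dropped_lt(source)

print(imported)
print(looks_stripped(source), looks_stripped(imported))
```

Running this against an imported post body could give a quick indication of whether an install is affected, although plain text that legitimately contains ">" would trigger a false positive.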

Change History (6)

comment:1 amaiman 5 years ago

I verified that this problem is not specific to my web server. I set up a fresh Linux/Apache/PHP/MySQL install on a clean Amazon EC2 image and installed trunk; importing from LiveJournal produced the same results.

comment:2 DD32 5 years ago

  • Version set to 2.8

That doesn't mean that it's not a well-known bug in certain versions of the XML parser library.

Is there a LiveJournal test import available somewhere so I can play with it? (I can't remember the location of the test data, or whether there even is any.)

comment:3 amaiman 5 years ago

The new version of the importer imports directly from a LiveJournal account. I created a new account for my tests (testwordpress.livejournal.com) and made a post containing the HTML shown in the ticket description above.

comment:4 amaiman 5 years ago

It does appear that it has something to do with the version of the XML parser library. I fired up another EC2 image (an older one this time), and the import was successful. We'll need to figure out which specific versions cause the problem and either work around it or have the importer show some kind of warning.

comment:5 josephscott 5 years ago

Or use non-broken version combinations of PHP & libxml2. This means PHP 5.2.9 and libxml2 2.7.3. I've written about this issue before:

http://josephscott.org/archives/2008/12/problems-with-libxml2-for-wordpress-xml-rpc-users/

http://josephscott.org/archives/2009/02/update-on-libxml2-issues/

With the release of PHP 5.2.9 there are now officially released versions of both products that work again.

comment:6 amaiman 5 years ago

  • Resolution set to duplicate
  • Status changed from new to closed

Thanks for the info; I hadn't found it before I opened this ticket.

Upgrading my server to PHP 5.2.9 resolved the problem for me.

I'll close this ticket; see ticket #7771 for further discussion. Perhaps a check or warning message should be added to WordPress when a broken version combination is detected.
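
The logic of such a check could be sketched roughly as follows. This is an illustration in Python, not WordPress code, and the names `parse_version` and `needs_warning` are hypothetical. The thresholds come from comment:5 (PHP 5.2.9 and libxml2 2.7.3), and the check is deliberately conservative: it flags anything below those versions as unverified, even though comment:4 reports that at least one older setup imported correctly.

```python
import re

# Known-good versions as stated in comment:5 of this ticket.
MIN_PHP = (5, 2, 9)
MIN_LIBXML2 = (2, 7, 3)


def parse_version(text: str) -> tuple:
    """Extract the leading 'major.minor.patch' numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def needs_warning(php_version: str, libxml2_version: str) -> bool:
    """Return True unless both versions meet the known-good thresholds.

    Assumption: anything below the thresholds is treated as unverified,
    not as definitely broken.
    """
    php_ok = parse_version(php_version) >= MIN_PHP
    libxml2_ok = parse_version(libxml2_version) >= MIN_LIBXML2
    return not (php_ok and libxml2_ok)


print(needs_warning("5.2.9", "2.7.3"))
print(needs_warning("5.2.6", "2.7.2"))
```

In a real implementation, the version strings would have to come from the running PHP environment, and the exact set of broken combinations would need to be confirmed against the discussion in ticket #7771.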
