Make WordPress Core

Opened 11 years ago

Closed 11 years ago

#24221 closed defect (bug) (invalid)

Importer doesn't import properly

Reported by: Looimaster
Owned by:
Milestone:
Priority: normal
Severity: normal
Version:
Component: Import
Keywords:
Focuses:
Cc:

Description

This is what [Tools > Export] generated:

	<item>
		<title>Example</title>
		<link>http://example.com/?page_id=4477</link>
		<pubDate>Sun, 03 Feb 2013 12:10:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		<guid isPermaLink="false">http://example.com/?page_id=4477</guid>
		<description></description>
		<content:encoded><![CDATA[something at the beginning

<div class="container" style="padding: 3em 0 0 0; margin: 0 0 3em 0; background-color: rgba(195, 195, 195, 0.15);">
	<h2>Heading</h2>
	<p>Paragraph</p>
</div>

something in the end]]></content:encoded>
		<excerpt:encoded><![CDATA[]]></excerpt:encoded>
		<wp:post_id>4477</wp:post_id>
		<wp:post_date>2013-02-03 12:10:10</wp:post_date>
		<wp:post_date_gmt>2013-02-03 12:10:10</wp:post_date_gmt>
		<wp:comment_status>closed</wp:comment_status>
		<wp:ping_status>closed</wp:ping_status>
		[...]
	</item>

When I import this file, completely unmodified, the importer strips the

style="padding: 3em 0 0 0; margin: 0 0 3em 0; background-color: rgba(195, 195, 195, 0.15);"

attribute in some places but not all (I suspect this detail is important)!

The attribute isn't being stripped later, when I go to [Pages > Example] and switch to the "Visual" editor or update the page. The content is already wrong immediately after the import.

I'm absolutely sure this happens: I've tested it several times and I can't find an error or an explanation.

Export file: UNIX, UTF-8 w/o BOM.

Importer Version: 0.6
Plugin URL: http://wordpress.org/extend/plugins/wordpress-importer/

I've seen a couple of issues where "update_post_meta" strips quotation marks and similar characters. Maybe this is related.

Installation: WPMU (unfiltered_html is probably disabled because it's a non-super-admin blog).

Change History (6)

#1 @Looimaster
11 years ago

One thing to emphasize: there are more style attributes in the exported content, and 4 out of ~30 were removed. Specifically, the first and last ones in each new section (each new div row) were removed, while all the style attributes in the middle were kept.

#2 @SergeyBiryukov
11 years ago

  • Component changed from General to Import
  • Milestone changed from Awaiting Review to WordPress.org
  • Version changed from trunk to 3.5

#3 @Looimaster
11 years ago

To show this better, the expected result is this:

http://gyazo.com/3e7b0ff18706bb6579786e67ba16687b.png

and what I actually got is this:
http://gyazo.com/5142b3d6ecbab5661586a6b9648e685f.png

This is not from Notepad. It's a screenshot from a WordPress blog, and the code is plain text wrapped in span elements that have style attributes.

As you can see, the green color at the very top and very bottom was removed. I don't have an explanation for the word "last", though: it isn't at the top or bottom, but it is nested inside an orange span (maybe that's the reason).

#4 @Looimaster
11 years ago

Actually, here's the exact code that I exported and imported (taken from the exported XML):

<div class="container">
	<div class="row">
		<div class="grid_4">
<pre><span style="color: rgb(138, 204, 3);">&lt;div class="container"&gt;</span>
  <span style="color: #e00000;">&lt;div class="row"&gt;</span>
    <span style="color: orange;">&lt;div class="grid_3"&gt;</span>
      <span style="color: rgb(172, 172, 172);">&lt;p&gt;Column 1&lt;/p&gt;</span>
    <span style="color: orange;">&lt;/div&gt;</span>
    <span style="color: orange;">&lt;div class="grid_3"&gt;</span>
      <span style="color: rgb(172, 172, 172);">&lt;p&gt;Column 2&lt;/p&gt;</span>
    <span style="color: orange;">&lt;/div&gt;</span>
    <span style="color: orange;">&lt;div class="grid_3"&gt;</span>
      <span style="color: rgb(172, 172, 172);">&lt;p&gt;Column 3&lt;/p&gt;</span>
    <span style="color: orange;">&lt;/div&gt;</span>
    <span style="color: orange;">&lt;div class="grid_3 <span style="color: rgb(138, 43, 226);">last</span>"&gt;</span>
      <span style="color: rgb(172, 172, 172);">&lt;p&gt;Column 4&lt;/p&gt;</span>
    <span style="color: orange;">&lt;/div&gt;</span>
  <span style="color: #e00000;">&lt;/div&gt;</span>
<span style="color: rgb(138, 204, 3);">&lt;/div&gt;</span>
</pre>
		</div>
	</div>
</div>

11 years ago

This ticket was mentioned in IRC in #wordpress-dev by danielbachhuber.

#6 @danielbachhuber
11 years ago

  • Milestone WordPress.org deleted
  • Resolution set to invalid
  • Status changed from new to closed
  • Version 3.5 deleted

Hi Looimaster,

I did some testing on this today using this data:

<content:encoded><![CDATA[<div class="container" style="padding: 3em 0 0 0; margin: 0 0 3em 0; background-color: rgba(195, 195, 195, 0.15);">
<h2>Heading</h2>
<p>Paragraph</p>
</div>]]></content:encoded>

Here are my results:

  • Doesn't reproduce on WordPress trunk for a single site.
  • On WordPress trunk for multisite, this gets stripped: style="padding: 3em 0 0 0; margin: 0 0 3em 0; background-color: rgba(195, 195, 195, 0.15);"
  • On import, the WordPress importer doesn't actually do any sanitization. It passes my expected data to wp_insert_post(), which the normal kses rules apply to.
  • Your inline CSS is sanitized by safecss_filter_attr(). It's stripped out entirely because of #10336.
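
The results above can be illustrated with a rough model of the filter. This is a sketch in Python rather than the actual PHP, and it rests on assumptions: it assumes the filter drops the entire style value as soon as it sees one of a few characters, with "(" being the one that matters for rgb() and rgba(). The rest of the character set, and the name model_style_filter, are hypothetical and should be checked against the safecss_filter_attr() source. The model also leaves out the allowed-property list entirely. The single-site versus multisite difference likely comes from kses only running for users without unfiltered_html, which on multisite means everyone except super admins.

```python
import re

# Assumption: characters that make the filter reject the whole style value.
# "(" is what rgb()/rgba() trip over; the other entries are a guess and
# should be verified against the WordPress source.
REJECT = re.compile(r"[(&=}]|/\*")


def model_style_filter(css: str) -> str:
    """Return the style value unchanged, or '' if it would be dropped whole."""
    # Whitespace control characters are removed before the check.
    css = css.replace("\n", "").replace("\r", "").replace("\t", "")
    if REJECT.search(css):
        # One offending character discards every declaration, not just one.
        return ""
    return css.strip()


ticket_style = (
    "padding: 3em 0 0 0; margin: 0 0 3em 0; "
    "background-color: rgba(195, 195, 195, 0.15);"
)

for value in (ticket_style, "color: rgb(138, 204, 3);",
              "color: #e00000;", "color: orange;"):
    kept = model_style_filter(value)
    print(f"{value!r:95} -> {'kept' if kept else 'stripped'}")
```

Under this model the padding and margin declarations are lost along with the rgba() one, and the spans using rgb() lose their color while "orange" and "#e00000" survive, which is consistent with the green spans and the word "last" losing their color in the screenshots.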

I think your best bet in this use case is to remove kses filters when it's a "trusted" import. You might also want to come up with an alternative styling mechanism such that you don't need to let your users use potentially sketchy inline CSS.
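
Before choosing between those two options, it may help to know how much content is affected. The following is a minimal sketch, not part of the importer, that scans an export file's content:encoded blocks for style attributes containing "(". It assumes, as with the rgba() value above, that those are the ones at risk. The function name risky_styles and the SAMPLE data are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by <content:encoded> in WordPress export (WXR) files.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
STYLE_ATTR = re.compile(r'style="([^"]*)"')


def risky_styles(wxr_xml: str) -> List[Tuple[str, str]]:
    """List (item title, style value) pairs whose style contains '('."""
    root = ET.fromstring(wxr_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext(f"{{{CONTENT_NS}}}encoded", default="")
        for style in STYLE_ATTR.findall(body):
            # Assumption: a parenthesis is what gets the attribute dropped.
            if "(" in style:
                hits.append((title, style))
    return hits


# Hypothetical, trimmed-down export used only to exercise the function.
SAMPLE = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel>
<item><title>Example</title>
<content:encoded><![CDATA[<div class="container" style="background-color: rgba(195, 195, 195, 0.15);"><p style="color: orange;">x</p></div>]]></content:encoded>
</item></channel></rss>"""

for title, style in risky_styles(SAMPLE):
    print(f"{title}: {style}")
```

For a real export, read the file's contents into a string and pass it to risky_styles(). Each reported style is a candidate for being moved into a class in a stylesheet.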
