#11144 closed defect (bug) (fixed)
WP importer strips whitespace at EOL, problem with Markdown
Reported by: | demetris | Owned by: | |
---|---|---|---|
Milestone: | 3.1 | Priority: | normal |
Severity: | normal | Version: | 2.9 |
Component: | Import | Keywords: | has-patch |
Focuses: | Cc: |
Description
Markdown does not convert line breaks to (X)HTML break tags. Instead, when you want a line break, you end a line with two or more spaces.
The WP WXR exporter retains those spaces at ends of lines. However, when you import the file to WP, the spaces are removed.
Can this be fixed, or is there some reason it should stay as it is now?
Thanks for reading!
Attachments (7)
Change History (30)
#2
@
15 years ago
No. Spaces. Like this:
I want an HTML line-break at the end of this line[SPACE][SPACE][LF]
This is standard in Markdown, and the only way to insert an HTML line break. The only implementation that changes it is the Github flavour. (I suppose they changed it because it did not suit well their particular usage, but, in general, the default Markdown thing, treating line breaks as just spaces, is more useful.)
#4
@
15 years ago
The issue seems to be related to an rtrim() (unsurprisingly) — in particular, the first of the two instances of rtrim() in the importer:
http://core.trac.wordpress.org/browser/trunk/wp-admin/import/wordpress.php
I have no idea what that line does and why it uses rtrim() (this stuff is too complex for me). Removing the rtrim() from there makes the importer preserve the whitespace at EOLs, but it has a side-effect too: all LFs are duplicated. (So, too much white space, and no line breaks AT ALL any more, either with Markdown or with wpautop().)
That’s as far as my investigation can go. :-)
#5
@
15 years ago
demetris, just for clarification: can you confirm the problem with markdown / whitespace at the end of lines for the current trunk?
#7
@
15 years ago
- Keywords reporter-feedback removed
@hakre:
Yep. Just checked with the current trunk and the problem still exists.
BTW, one does not need Markdown to verify the issue. Just export a post that has some lines with whitespace at the end, and then import it: The whitespace is preserved in the XML file, but is stripped during the import.
#11
@
15 years ago
- Keywords needs-patch added; has-patch removed
Thanks for the patch, dwright.
I tested it with current trunk (r13304) but the behaviour remains the same: the spaces at EOLs are there in the XML file but are missing after importing.
#12
@
15 years ago
I tested it with current trunk (r13304) but the behaviour remains the same: the spaces at EOLs are there in the XML file but are missing after importing.
I guess I don't understand what the issue is then.
see attached screenshots showing the imported newlines/whitespace
(the whitespace is killed in visual mode but preserved in html with included patch)
can you maybe send an image of what your after?
#13
follow-up:
↓ 14
@
15 years ago
Thanks for the reply, dwright.
So, in your tests, importing does not affect strings of spaces before EOLs.
While in my tests those strings of spaces are completely removed. (I also verified the issue by looking at the post_content in the DB entry.)
The only difference of environments I can think of is the Visual Editor, which I always disable in my user settings. But should this have any effect on the process of importing?
I attach a visual explanation of what happens whenever I import, and why this is a problem when you Markdown is used.
#14
in reply to:
↑ 13
@
15 years ago
Replying to demetris:
Thanks for the reply, dwright.
So, in your tests, importing does not affect strings of spaces before EOLs.
While in my tests those strings of spaces are completely removed. (I also verified the issue by looking at the post_content in the DB entry.)
So you are saying it looks like it's working as desired in my environment and not in yours?
If so, are you sure the patch applied?
Take this session:
post as originally written:
mysql> select post_content from wp_posts where ID=6; +------------------------+ | post_content | +------------------------+ | test | +------------------------+
export xml file.
post deleted. (and trash emptied)
xml file imported (without patch)
mysql> select post_content from wp_posts where ID=6; +--------------+ | post_content | +--------------+ | test | +--------------+ 1 row in set (0.00 sec)
This is the behaviour I believe you are seeing, correct?
Now, apply the patch:
[dwright:wordpress]$ patch wp-admin/import/wordpress.php < wp-admin/import/wordpress.php.patch patching file wp-admin/import/wordpress.php [dwright:wordpress]$ svn diff Index: wp-admin/import/wordpress.php =================================================================== --- wp-admin/import/wordpress.php (revision 13450) +++ wp-admin/import/wordpress.php (working copy) @@ -56,7 +56,8 @@ preg_match("|<$tag.*?>(.*?)</$tag>|is", $string, $return); if ( isset($return[1]) ) { $return = preg_replace('|^<!\[CDATA\[(.*)\]\]>$|s', '$1', $return[1]); - $return = $wpdb->escape( trim( $return ) ); + if ($tag == 'content:encoded') $return = $wpdb->escape( trim( $return, "\t\0\x0B" ) ); + else $return = $wpdb->escape( trim( $return ) ); } else { $return = ''; }
delete post ID=6; empty trash; restart webserver and import the same xml file.
mysql> select post_content from wp_posts where ID=6; +-------------------+ | post_content | +-------------------+ | test | +-------------------+ 1 row in set (0.00 sec)
This is the result I believe you want correct?
#15
@
15 years ago
why not simply change:
$return = $wpdb->escape( trim( $return ) );
into:
$return = $wpdb->escape( $return );
there isn't any reason to trim anything here.
#16
follow-up:
↓ 17
@
15 years ago
dwright, I tried again but the problem remains.
I imported my sample post/page into two different WP installations, after making sure that the patch had been applied correctly. I’m attaching the WRX file, if anyone else wants to give it a quick try, to make sure I’m not doing something wrong.
#17
in reply to:
↑ 16
;
follow-up:
↓ 19
@
15 years ago
I see, thanks for the attachment demetris.
My tests were importing 'posts', not 'pages'. (it's a different process code-wise)
Here is a patch which will work for either. It's not the greatest patch but is very conservative, only effecting '<content:encoded>'.
Denis, possibly that is true, I wanted to keep the patch as conservative as possible, so I left in trim (removing spaces and newline's for content only).
In this new patch (for page imports, the 'rtrim($importline, "\n\r\0\x0B");' is crucial. RE: '$this->post .= $importline . "\n";' affix's a newline to content. (so if rtrim is removed, the content would have double line breaks, as demetris originally found out.)
Note, I didn't run the WP test suite.
@
15 years ago
maintain linefeeds and spaces, only for content:encoded tags, pages or posts WP import
@
15 years ago
maintain linefeeds and spaces, only for content:encoded tags, pages or posts WP import
#19
in reply to:
↑ 17
@
15 years ago
Replying to dwright:
I see, thanks for the attachment demetris.
My tests were importing 'posts', not 'pages'. (it's a different process code-wise)
Here is a patch which will work for either. It's not the greatest patch but is very conservative, only effecting '<content:encoded>'.
For some reason I was only testing with Pages and it had not occured to me that the process might be different. :-|
The new patch fixes the issue for both Posts and Pages.
Thank you for your time and I apologize for the confusion.
#21
@
14 years ago
- Resolution set to fixed
- Status changed from new to closed
(In [15961]) Importer and exporter overhaul, mega props duck.
Exporter overhaul:
- Add author information to export
- Greater usage of slug identifiers
- Don't export auto-drafts, spam comments, or edit lock/last meta keys
- Inline documentation improvements
- Remove filtering for now (@todo)
- Bump WXR version to 1.1, but remain back compat in the importer
Importer overhaul (http://plugins.trac.wordpress.org/changeset/304249):
- Use an XML parser where available (SimpleXML, XML Parser)
- Proper import support for navigation menus
- Many bug fixes, specifically improvements to category and custom taxonomy handling
- Better author/user mapping
Fixes #5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750 #15055 #15091 #15108.
See #15197.
Don't you mean two or more line breaks?