Make WordPress Core

Opened 14 years ago

Closed 11 years ago

Last modified 11 years ago

#15657 closed defect (bug) (invalid)

wp_strip_all_tags causes paragraphs to run together

Reported by: jwz's profile jwz Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.9
Component: Formatting Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

If a post contains HTML like "foo<p>bar", the RSS feed ends up with the text "foobar" instead of the more appropriate "foo bar" or even "foo\n\nbar".

Here is a simple patch to wp-includes/formatting.php that fixes this: basically, convert <p>, <br> and certain similar tags to newlines before calling the PHP-builtin strip_tags() function.

Attachments (2)

patch.txt (1.2 KB) - added by jwz 14 years ago.
formatting-wp_strip_all_tags.diff (824 bytes) - added by trepmal 13 years ago.
just wp_strip_all_tags

Download all attachments as: .zip

Change History (15)

@jwz
14 years ago

#1 @nacin
14 years ago

  • Keywords has-patch 3.2-early added
  • Milestone changed from Awaiting Review to Future Release

#2 @jwz
13 years ago

  • Version changed from 3.0.2 to 3.3

Ping?

This is still a problem for me in WordPress 3.3.

If there's a problem with my patch, let's discuss it. Otherwise, I would really like to see this make it into the code!

#3 @SergeyBiryukov
13 years ago

  • Keywords needs-unit-tests added
  • Version changed from 3.3 to 3.0.2

Version number indicates when the bug was initially introduced/reported.

Looks like unit tests are needed here.
http://codex.wordpress.org/Automated_Testing#Writing_Tests

@trepmal
13 years ago

just wp_strip_all_tags

#4 follow-up: @trepmal
13 years ago

My very simple test plugin: http://cl.ly/Cdme

Possible problem/inconsistency with patched function: in both of the following instances, 2 line breaks appear between lorem and ipsum. I imagine this is not the desired outcome...?

lorem<br /> 
ipsum
lorem</p><p>ipsum

#5 @trepmal
13 years ago

  • Version changed from 3.0.2 to 2.9

#6 in reply to: ↑ 4 @jwz
13 years ago

Replying to trepmal:
Right you are, I think this is a better version (compress incoming whitespace, don't leave beginning- and end-of-line spaces when emitting newlines):

function wp_strip_all_tags_patched($string, $remove_breaks = false) {

	$string = preg_replace('/[\r\n\t ]+/', ' ', $string);

	$string = preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $string );

	$string = preg_replace( '@ *</?\s*(P|UL|OL|DL|BLOCKQUOTE)\b[^>]*?> *@si', "\n\n", $string );
	$string = preg_replace( '@ *<(BR|DIV|LI|DT|DD|TR|TD|H\d)\b[^>]*?> *@si', "\n", $string );
	$string = preg_replace( "@\n\n\n+@si", "\n\n", $string );

	$string = strip_tags( $string );

	if ( $remove_breaks )
		$string = preg_replace('/[\r\n\t ]+/', ' ', $string);

	return trim( $string );
}

#7 follow-up: @azaozz
13 years ago

'wp_strip_all_tags()' is intended as fix/replacement for PHP's strip_tags(). If we need to pre-process RSS feeds perhaps we should make another function specifically for that.

#8 in reply to: ↑ 7 @jwz
13 years ago

Replying to azaozz:

'wp_strip_all_tags()' is intended as fix/replacement for PHP's strip_tags(). If we need to pre-process RSS feeds perhaps we should make another function specifically for that.

wp_strip_all_tags isn't used solely for RSS and Atom feeds, it's used indirectly by anything that wants the_excerpt and is also used by other plugins, e.g., Simple Facebook Connect uses it when cross-posting to Facebook (since their API only allows plain-text).

It seems to me that in any context where you're converting multi-line HTML to plain-text, converting paragraphs to newlines is an eminently sensible thing to do. I can't imagine why you'd want the original PHP strip_tags behavior at all, frankly.

#9 @joehoyle
11 years ago

  • Resolution set to invalid
  • Status changed from new to closed

It seems to me this ticket is invalid, echoing the opinions of azaozz wp_strip_all_tags is a replacement for strip_tags so should not be handling adding spaces / new line breaks. Marking as closed, re-open if you disagree.

#10 @johnbillion
11 years ago

  • Keywords 3.2-early removed
  • Milestone Future Release deleted

#11 follow-up: @jwz
11 years ago

Wow, it's like you didn't actually read any of the words I said. Thanks.

#12 in reply to: ↑ 11 @joehoyle
11 years ago

Replying to jwz:

Wow, it's like you didn't actually read any of the words I said. Thanks.

I did read the words you wrote - and responded. I don't think the scope of this function should be to add line breaks etc, it's for stripping tags, not formatting data. If you do think some data should be formatted differently in a certain place then I think a new ticket would be best for that with specific examples and suggested fix. This function is not that place.

#13 @markoheijnen
11 years ago

jwz, it's what joe said. striptags means stripping all HTML stuff. It doesn't mean replacing it with something different.

Note: See TracTickets for help on using tickets.