WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 6 months ago

Last modified 6 months ago

#15657 closed defect (bug) (invalid)

wp_strip_all_tags causes paragraphs to run together

Reported by: jwz Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.9
Component: Formatting Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

If a post contains HTML like "foo<p>bar", the RSS feed ends up with the text "foobar" instead of the more appropriate "foo bar" or even "foo\n\nbar".

Here is a simple patch to wp-includes/formatting.php that fixes this: basically, convert <p>, <br> and certain similar tags to newlines before calling the PHP-builtin strip_tags() function.

Attachments (2)

patch.txt (1.2 KB) - added by jwz 3 years ago.
formatting-wp_strip_all_tags.diff (824 bytes) - added by trepmal 2 years ago.
just wp_strip_all_tags

Download all attachments as: .zip

Change History (15)

jwz3 years ago

comment:1 nacin3 years ago

  • Keywords has-patch 3.2-early added
  • Milestone changed from Awaiting Review to Future Release

comment:2 jwz2 years ago

  • Version changed from 3.0.2 to 3.3

Ping?

This is still a problem for me in WordPress 3.3.

If there's a problem with my patch, let's discuss it. Otherwise, I would really like to see this make it into the code!

comment:3 SergeyBiryukov2 years ago

  • Keywords needs-unit-tests added
  • Version changed from 3.3 to 3.0.2

Version number indicates when the bug was initially introduced/reported.

Looks like unit tests are needed here.
http://codex.wordpress.org/Automated_Testing#Writing_Tests

trepmal2 years ago

just wp_strip_all_tags

comment:4 follow-up: trepmal2 years ago

My very simple test plugin: http://cl.ly/Cdme

Possible problem/inconsistency with patched function: in both of the following instances, 2 line breaks appear between lorem and ipsum. I imagine this is not the desired outcome...?

lorem<br /> 
ipsum
lorem</p><p>ipsum

comment:5 trepmal2 years ago

  • Version changed from 3.0.2 to 2.9

comment:6 in reply to: ↑ 4 jwz2 years ago

Replying to trepmal:
Right you are, I think this is a better version (compress incoming whitespace, don't leave beginning- and end-of-line spaces when emitting newlines):

function wp_strip_all_tags_patched($string, $remove_breaks = false) {

	$string = preg_replace('/[\r\n\t ]+/', ' ', $string);

	$string = preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $string );

	$string = preg_replace( '@ *</?\s*(P|UL|OL|DL|BLOCKQUOTE)\b[^>]*?> *@si', "\n\n", $string );
	$string = preg_replace( '@ *<(BR|DIV|LI|DT|DD|TR|TD|H\d)\b[^>]*?> *@si', "\n", $string );
	$string = preg_replace( "@\n\n\n+@si", "\n\n", $string );

	$string = strip_tags( $string );

	if ( $remove_breaks )
		$string = preg_replace('/[\r\n\t ]+/', ' ', $string);

	return trim( $string );
}

comment:7 follow-up: azaozz2 years ago

'wp_strip_all_tags()' is intended as fix/replacement for PHP's strip_tags(). If we need to pre-process RSS feeds perhaps we should make another function specifically for that.

comment:8 in reply to: ↑ 7 jwz2 years ago

Replying to azaozz:

'wp_strip_all_tags()' is intended as fix/replacement for PHP's strip_tags(). If we need to pre-process RSS feeds perhaps we should make another function specifically for that.

wp_strip_all_tags isn't used solely for RSS and Atom feeds, it's used indirectly by anything that wants the_excerpt and is also used by other plugins, e.g., Simple Facebook Connect uses it when cross-posting to Facebook (since their API only allows plain-text).

It seems to me that in any context where you're converting multi-line HTML to plain-text, converting paragraphs to newlines is an eminently sensible thing to do. I can't imagine why you'd want the original PHP strip_tags behavior at all, frankly.

comment:9 joehoyle6 months ago

  • Resolution set to invalid
  • Status changed from new to closed

It seems to me this ticket is invalid, echoing the opinions of azaozz wp_strip_all_tags is a replacement for strip_tags so should not be handling adding spaces / new line breaks. Marking as closed, re-open if you disagree.

comment:10 johnbillion6 months ago

  • Keywords 3.2-early removed
  • Milestone Future Release deleted

comment:11 follow-up: jwz6 months ago

Wow, it's like you didn't actually read any of the words I said. Thanks.

comment:12 in reply to: ↑ 11 joehoyle6 months ago

Replying to jwz:

Wow, it's like you didn't actually read any of the words I said. Thanks.

I did read the words you wrote - and responded. I don't think the scope of this function should be to add line breaks etc, it's for stripping tags, not formatting data. If you do think some data should be formatted differently in a certain place then I think a new ticket would be best for that with specific examples and suggested fix. This function is not that place.

comment:13 markoheijnen6 months ago

jwz, it's what joe said. striptags means stripping all HTML stuff. It doesn't mean replacing it with something different.

Note: See TracTickets for help on using tickets.