#15657 closed defect (bug) (invalid)
wp_strip_all_tags causes paragraphs to run together
Reported by: | jwz | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | 2.9 |
Component: | Formatting | Keywords: | has-patch needs-unit-tests |
Focuses: | | Cc: | |
Description
If a post contains HTML like "foo<p>bar", the RSS feed ends up with the text "foobar" instead of the more appropriate "foo bar" or even "foo\n\nbar".
Here is a simple patch to wp-includes/formatting.php that fixes this: basically, convert <p>, <br> and certain similar tags to newlines before calling the PHP-builtin strip_tags() function.
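The reported behavior and the general idea of the fix can be reproduced outside WordPress with plain PHP. This is only a minimal sketch: `strip_tags_keep_breaks` is a hypothetical helper written for illustration, not the attached patch, and the list of tags it converts is an assumption.

```php
<?php
// Hypothetical helper, NOT the attached patch: turn a few block-level
// tags into newlines, then let strip_tags() remove whatever is left.
function strip_tags_keep_breaks( $html ) {
    $text = preg_replace( '@</?(p|br|div|li)\b[^>]*>@i', "\n", $html );
    return trim( strip_tags( $text ) );
}

// Plain strip_tags() drops the tag and joins the words.
var_dump( strip_tags( 'foo<p>bar' ) );             // string(6) "foobar"

// Converting the tag first keeps the words apart.
var_dump( strip_tags_keep_breaks( 'foo<p>bar' ) ); // "foo", a newline, then "bar"
```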
Attachments (2)
Change History (15)
#1
@
14 years ago
- Keywords has-patch 3.2-early added
- Milestone changed from Awaiting Review to Future Release
#3
@
13 years ago
- Keywords needs-unit-tests added
- Version changed from 3.3 to 3.0.2
Version number indicates when the bug was initially introduced/reported.
Looks like unit tests are needed here.
http://codex.wordpress.org/Automated_Testing#Writing_Tests
#4
follow-up:
↓ 6
@
13 years ago
My very simple test plugin: http://cl.ly/Cdme
Possible problem/inconsistency with patched function: in both of the following instances, 2 line breaks appear between lorem and ipsum. I imagine this is not the desired outcome...?
lorem<br /> ipsum
lorem</p><p>ipsum
#6
in reply to:
↑ 4
@
13 years ago
Replying to trepmal:
Right you are, I think this is a better version (compress incoming whitespace, don't leave beginning- and end-of-line spaces when emitting newlines):
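    function wp_strip_all_tags_patched($string, $remove_breaks = false) {
        // Compress incoming whitespace runs to a single space.
        $string = preg_replace('/[\r\n\t ]+/', ' ', $string);
        // Drop <script> and <style> elements together with their contents.
        $string = preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $string );
        // Paragraph-like tags become a blank line; surrounding spaces are eaten.
        $string = preg_replace( '@ *</?\s*(P|UL|OL|DL|BLOCKQUOTE)\b[^>]*?> *@si', "\n\n", $string );
        // Line-like opening tags become a single newline.
        $string = preg_replace( '@ *<(BR|DIV|LI|DT|DD|TR|TD|H\d)\b[^>]*?> *@si', "\n", $string );
        // Never emit more than one blank line in a row.
        $string = preg_replace( "@\n\n\n+@si", "\n\n", $string );
        $string = strip_tags( $string );
        if ( $remove_breaks )
            $string = preg_replace('/[\r\n\t ]+/', ' ', $string);
        return trim( $string );
    }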
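As a quick check of this version against the two inputs from comment 4, the following standalone script may help. The function body is repeated from the comment so the snippet runs on its own outside WordPress. The results in the comments assume stock PHP `preg_replace` and `strip_tags` behavior.

```php
<?php
// Copied from the comment above so this file is self-contained.
function wp_strip_all_tags_patched($string, $remove_breaks = false) {
    $string = preg_replace('/[\r\n\t ]+/', ' ', $string);
    $string = preg_replace( '@<(script|style)[^>]*?>.*?</\\1>@si', '', $string );
    $string = preg_replace( '@ *</?\s*(P|UL|OL|DL|BLOCKQUOTE)\b[^>]*?> *@si', "\n\n", $string );
    $string = preg_replace( '@ *<(BR|DIV|LI|DT|DD|TR|TD|H\d)\b[^>]*?> *@si', "\n", $string );
    $string = preg_replace( "@\n\n\n+@si", "\n\n", $string );
    $string = strip_tags( $string );
    if ( $remove_breaks )
        $string = preg_replace('/[\r\n\t ]+/', ' ', $string);
    return trim( $string );
}

// A <br /> followed by a space yields exactly one newline.
var_dump( wp_strip_all_tags_patched( 'lorem<br /> ipsum' ) === "lorem\nipsum" );   // bool(true)

// A closing and an opening <p> collapse to one blank line, not two.
var_dump( wp_strip_all_tags_patched( 'lorem</p><p>ipsum' ) === "lorem\n\nipsum" ); // bool(true)

// With $remove_breaks set, the breaks are flattened back to single spaces.
var_dump( wp_strip_all_tags_patched( 'foo<p>bar', true ) === 'foo bar' );          // bool(true)
```

Note that the second pattern only matches opening tags, so a closing `</div>` or `</li>` is removed by `strip_tags()` without adding a newline.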
#7
follow-up:
↓ 8
@
13 years ago
'wp_strip_all_tags()' is intended as fix/replacement for PHP's strip_tags(). If we need to pre-process RSS feeds perhaps we should make another function specifically for that.
#8
in reply to:
↑ 7
@
13 years ago
Replying to azaozz:
'wp_strip_all_tags()' is intended as fix/replacement for PHP's strip_tags(). If we need to pre-process RSS feeds perhaps we should make another function specifically for that.
wp_strip_all_tags isn't used solely for RSS and Atom feeds; it's used indirectly by anything that wants the_excerpt, and it is also used by other plugins. For example, Simple Facebook Connect uses it when cross-posting to Facebook (since their API only allows plain text).
It seems to me that in any context where you're converting multi-line HTML to plain text, converting paragraphs to newlines is an eminently sensible thing to do. I can't imagine why you'd want the original PHP strip_tags behavior at all, frankly.
#9
@
11 years ago
- Resolution set to invalid
- Status changed from new to closed
It seems to me this ticket is invalid, echoing the opinion of azaozz: wp_strip_all_tags is a replacement for strip_tags, so it should not be adding spaces or line breaks. Marking as closed; re-open if you disagree.
#11
follow-up:
↓ 12
@
11 years ago
Wow, it's like you didn't actually read any of the words I said. Thanks.
#12
in reply to:
↑ 11
@
11 years ago
Replying to jwz:
Wow, it's like you didn't actually read any of the words I said. Thanks.
I did read the words you wrote, and responded. I don't think the scope of this function should be to add line breaks etc.; it's for stripping tags, not formatting data. If you do think some data should be formatted differently in a certain place, then I think a new ticket would be best for that, with specific examples and a suggested fix. This function is not that place.
Ping?
This is still a problem for me in WordPress 3.3.
If there's a problem with my patch, let's discuss it. Otherwise, I would really like to see this make it into the code!