Ticket #10376 (closed enhancement: fixed)
White space should be collapsed after tags are stripped in wp_trim_excerpt
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | 3.0 |
| Component: | Formatting | Version: | 2.8.1 |
| Severity: | normal | Keywords: | has-patch wp_trim_excerpt |
| Cc: | scribu@…, ryanmccue@… |
Description
I've a minor edit I'd like to propose: When wp_trim_excerpt strips tags from the post content, it leaves the white space surrounding those tags, and then pulls the first 55 characters, which could be blank. So, for a lot of my posts that had a big set of HTML tags at the beginning, the excerpt would be empty. The work around is easy: collapse the white space (which the browser would do anyway). I hope it seems logical. Diff file attached.
Attachments
Change History
- Keywords has-patch, the_excerpt, wp_trim_excerpt added; has-patch removed
- Cc scribu@… added
- Milestone changed from Future Release to 2.8.2
Wouldn't a simple trim() do the job?
I agree, trim() would be much simpler and probably more efficient. Make sure it's after the strip_tags(), otherwise there could be whitespace in the middle being left there.
No. a trim would not work.
the issue here, is that
$in = "A<test> <test> <test> B"; $in = strip_tags($in); echo $in; //is now 3 spaces between a and b
when you explode by spaced, you end up with spaces being words, ie.
$in = " ";
$in = explode(' ', $in);
var_dump($in);
---
array
0 => string '' (length=0)
1 => string '' (length=0)
2 => string '' (length=0)
3 => string '' (length=0)
This is what i'd use myself to get the words:
$a = " This is a test";
preg_match_all('|\b([^\W]+?)\b|', $a, $words);
$words = $words[1];
result:
array
0 => string 'This' (length=4)
1 => string 'is' (length=2)
2 => string 'a' (length=1)
3 => string 'test' (length=4)
Considering that strip_tags is run after wpautop() and the other filters on 'the_content', the pseudo-excerpt will always be one line text. If we want to avoid normalizing white space for the whole post, perhaps can use preg_split:
$words = preg_split('/\s+/', $text, $excerpt_length + 1, PREG_SPLIT_NO_EMPTY);
Probably shouldn't count on wpautotop() running, as a lot of people (myself included) have it disabled, but preg_split would be just as good, just don't forget to strip tags from $text; otherwise you will probably end up with broken HTML. I've attached a diff file that uses it.
- Milestone changed from 2.8.2 to 2.9
@sojweb: why the move of the strip_tags call in the patch?
comment:10
follow-up:
↓ 11
sojweb — 3 years ago
@Denis-de-Bernardy: You mean why remove it from its own line? It makes for one less variable assignment... I suppose I did it more out of habit than anything else. It doesn't make any real difference.
comment:11
in reply to:
↑ 10
Denis-de-Bernardy — 3 years ago
Replying to sojweb:
@Denis-de-Bernardy: You mean why remove it from its own line? It makes for one less variable assignment... I suppose I did it more out of habit than anything else. It doesn't make any real difference.
It makes things less readable? :-)
D.
comment:12
sojweb — 3 years ago
Readable? l33t h4x0r5 do not need readable code ;-).
comment:13
follow-up:
↓ 14
Denis-de-Bernardy — 3 years ago
Yayaya, and "If it was hard for me to write it, it should be hard for them to understand it!" Still... :-)
comment:14
in reply to:
↑ 13
sojweb — 3 years ago
Replying to Denis-de-Bernardy:
Yayaya, and "If it was hard for me to write it, it should be hard for them to understand it!" Still... :-)
Ha, exactly!
No, I'm totally with you... readability is worth the extra line.
alternative implementation:
http://svn.wp-plugins.org/sem-fancy-excerpt/trunk/sem-fancy-excerpt.php
- Keywords needs-patch added; has-patch removed
btw, the current patch is not utf8-safe.
see http://core.trac.wordpress.org/ticket/11669#comment:17 for the above-mentioned utf8 problem.
- Milestone changed from 3.1 to 3.0
I've added a utf8-safe version of the suggested patch, in case anyone's willing to test this.
comment:23
dd32 — 23 months ago
- Status changed from new to closed
- Resolution set to fixed
comment:24
scribu — 23 months ago
I know it doesn't really matter, but in regex, isn't [\n\r\t ]+ equivalent to \s+ ?
comment:25
scribu — 23 months ago
Nevermind, I saw the comment above.

