Make WordPress Core

Opened 15 years ago

Closed 14 years ago

Last modified 14 years ago

#10376 closed enhancement (fixed)

White space should be collapsed after tags are stripped in wp_trim_excerpt

Reported by: sojweb's profile sojweb Owned by:
Milestone: 3.0 Priority: normal
Severity: normal Version: 2.8.1
Component: Formatting Keywords: has-patch wp_trim_excerpt
Focuses: Cc:

Description

I've a minor edit I'd like to propose: When wp_trim_excerpt strips tags from the post content, it leaves the white space surrounding those tags, and then pulls the first 55 characters, which could be blank. So, for a lot of my posts that had a big set of HTML tags at the beginning, the excerpt would be empty. The work around is easy: collapse the white space (which the browser would do anyway). I hope it seems logical. Diff file attached.

Attachments (3)

formatting.php.diff (580 bytes) - added by sojweb 15 years ago.
formatting-preg-split.php.diff (668 bytes) - added by sojweb 15 years ago.
10376.2.diff (613 bytes) - added by Denis-de-Bernardy 14 years ago.
untested

Download all attachments as: .zip

Change History (28)

#1 @sojweb
15 years ago

  • Keywords the_excerpt wp_trim_excerpt added

#2 @sojweb
15 years ago

*That should read: "first 55 words" in the description

#3 @scribu
15 years ago

  • Cc scribu@… added
  • Milestone changed from Future Release to 2.8.2

Wouldn't a simple trim() do the job?

#4 @rmccue
15 years ago

I agree, trim() would be much simpler and probably more efficient. Make sure it's after the strip_tags(), otherwise there could be whitespace in the middle being left there.

#5 @rmccue
15 years ago

  • Cc ryanmccue@… added

#6 @dd32
15 years ago

No. a trim would not work.

the issue here, is that

$in = "A<test> <test> <test> B";
$in = strip_tags($in);
echo $in; //is now 3 spaces between a and b

when you explode by spaced, you end up with spaces being words, ie.

$in = "   ";
$in = explode(' ', $in);
var_dump($in);
---
array
  0 => string '' (length=0)
  1 => string '' (length=0)
  2 => string '' (length=0)
  3 => string '' (length=0)

This is what i'd use myself to get the words:

$a = "          This is a test";
preg_match_all('|\b([^\W]+?)\b|', $a, $words);
$words = $words[1];

result:

array
  0 => string 'This' (length=4)
  1 => string 'is' (length=2)
  2 => string 'a' (length=1)
  3 => string 'test' (length=4)

#7 @azaozz
15 years ago

Considering that strip_tags is run after wpautop() and the other filters on 'the_content', the pseudo-excerpt will always be one line text. If we want to avoid normalizing white space for the whole post, perhaps can use preg_split:

$words = preg_split('/\s+/', $text, $excerpt_length + 1, PREG_SPLIT_NO_EMPTY);

#8 @sojweb
15 years ago

Probably shouldn't count on wpautotop() running, as a lot of people (myself included) have it disabled, but preg_split would be just as good, just don't forget to strip tags from $text; otherwise you will probably end up with broken HTML. I've attached a diff file that uses it.

#9 @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.8.2 to 2.9

@sojweb: why the move of the strip_tags call in the patch?

#10 follow-up: @sojweb
15 years ago

@Denis-de-Bernardy: You mean why remove it from its own line? It makes for one less variable assignment... I suppose I did it more out of habit than anything else. It doesn't make any real difference.

#11 in reply to: ↑ 10 @Denis-de-Bernardy
15 years ago

Replying to sojweb:

@Denis-de-Bernardy: You mean why remove it from its own line? It makes for one less variable assignment... I suppose I did it more out of habit than anything else. It doesn't make any real difference.

It makes things less readable? :-)

D.

#12 @sojweb
15 years ago

Readable? l33t h4x0r5 do not need readable code ;-).

#13 follow-up: @Denis-de-Bernardy
15 years ago

Yayaya, and "If it was hard for me to write it, it should be hard for them to understand it!" Still... :-)

#14 in reply to: ↑ 13 @sojweb
15 years ago

Replying to Denis-de-Bernardy:

Yayaya, and "If it was hard for me to write it, it should be hard for them to understand it!" Still... :-)

Ha, exactly!

No, I'm totally with you... readability is worth the extra line.

#15 @ryan
14 years ago

  • Milestone changed from 2.9 to 3.0

#16 @scribu
14 years ago

  • Keywords the_excerpt removed

#18 @Denis-de-Bernardy
14 years ago

  • Keywords needs-patch added; has-patch removed

btw, the current patch is not utf8-safe.

#20 @Denis-de-Bernardy
14 years ago

  • Milestone changed from 3.0 to 3.1

@Denis-de-Bernardy
14 years ago

untested

#21 @Denis-de-Bernardy
14 years ago

  • Milestone changed from 3.1 to 3.0

I've added a utf8-safe version of the suggested patch, in case anyone's willing to test this.

#22 @Denis-de-Bernardy
14 years ago

  • Keywords has-patch added; needs-patch removed

#23 @dd32
14 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [13942]) A better default except, Remove multiple white spaces from the except as well as splitting safely on UTF8 strings. Props Denis-de-Bernardy for the UTF8 split. Fixes #10376

#24 @scribu
14 years ago

I know it doesn't really matter, but in regex, isn't [\n\r\t ]+ equivalent to \s+ ?

#25 @scribu
14 years ago

Nevermind, I saw the comment above.

Note: See TracTickets for help on using tickets.