WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#10376 closed enhancement (fixed)

White space should be collapsed after tags are stripped in wp_trim_excerpt

Reported by: sojweb Owned by:
Milestone: 3.0 Priority: normal
Severity: normal Version: 2.8.1
Component: Formatting Keywords: has-patch wp_trim_excerpt
Focuses: Cc:

Description

I've a minor edit I'd like to propose: When wp_trim_excerpt strips tags from the post content, it leaves the white space surrounding those tags, and then pulls the first 55 characters, which could be blank. So, for a lot of my posts that had a big set of HTML tags at the beginning, the excerpt would be empty. The work around is easy: collapse the white space (which the browser would do anyway). I hope it seems logical. Diff file attached.

Attachments (3)

formatting.php.diff (580 bytes) - added by sojweb 6 years ago.
formatting-preg-split.php.diff (668 bytes) - added by sojweb 6 years ago.
10376.2.diff (613 bytes) - added by Denis-de-Bernardy 6 years ago.
untested

Download all attachments as: .zip

Change History (28)

comment:1 @sojweb6 years ago

  • Keywords the_excerpt wp_trim_excerpt added

@sojweb6 years ago

comment:2 @sojweb6 years ago

*That should read: "first 55 words" in the description

comment:3 @scribu6 years ago

  • Cc scribu@… added
  • Milestone changed from Future Release to 2.8.2

Wouldn't a simple trim() do the job?

comment:4 @rmccue6 years ago

I agree, trim() would be much simpler and probably more efficient. Make sure it's after the strip_tags(), otherwise there could be whitespace in the middle being left there.

comment:5 @rmccue6 years ago

  • Cc ryanmccue@… added

comment:6 @dd326 years ago

No. a trim would not work.

the issue here, is that

$in = "A<test> <test> <test> B";
$in = strip_tags($in);
echo $in; //is now 3 spaces between a and b

when you explode by spaced, you end up with spaces being words, ie.

$in = "   ";
$in = explode(' ', $in);
var_dump($in);
---
array
  0 => string '' (length=0)
  1 => string '' (length=0)
  2 => string '' (length=0)
  3 => string '' (length=0)

This is what i'd use myself to get the words:

$a = "          This is a test";
preg_match_all('|\b([^\W]+?)\b|', $a, $words);
$words = $words[1];

result:

array
  0 => string 'This' (length=4)
  1 => string 'is' (length=2)
  2 => string 'a' (length=1)
  3 => string 'test' (length=4)

comment:7 @azaozz6 years ago

Considering that strip_tags is run after wpautop() and the other filters on 'the_content', the pseudo-excerpt will always be one line text. If we want to avoid normalizing white space for the whole post, perhaps can use preg_split:

$words = preg_split('/\s+/', $text, $excerpt_length + 1, PREG_SPLIT_NO_EMPTY);

comment:8 @sojweb6 years ago

Probably shouldn't count on wpautotop() running, as a lot of people (myself included) have it disabled, but preg_split would be just as good, just don't forget to strip tags from $text; otherwise you will probably end up with broken HTML. I've attached a diff file that uses it.

comment:9 @Denis-de-Bernardy6 years ago

  • Milestone changed from 2.8.2 to 2.9

@sojweb: why the move of the strip_tags call in the patch?

comment:10 follow-up: @sojweb6 years ago

@Denis-de-Bernardy: You mean why remove it from its own line? It makes for one less variable assignment... I suppose I did it more out of habit than anything else. It doesn't make any real difference.

comment:11 in reply to: ↑ 10 @Denis-de-Bernardy6 years ago

Replying to sojweb:

@Denis-de-Bernardy: You mean why remove it from its own line? It makes for one less variable assignment... I suppose I did it more out of habit than anything else. It doesn't make any real difference.

It makes things less readable? :-)

D.

comment:12 @sojweb6 years ago

Readable? l33t h4x0r5 do not need readable code ;-).

comment:13 follow-up: @Denis-de-Bernardy6 years ago

Yayaya, and "If it was hard for me to write it, it should be hard for them to understand it!" Still... :-)

comment:14 in reply to: ↑ 13 @sojweb6 years ago

Replying to Denis-de-Bernardy:

Yayaya, and "If it was hard for me to write it, it should be hard for them to understand it!" Still... :-)

Ha, exactly!

No, I'm totally with you... readability is worth the extra line.

comment:15 @ryan6 years ago

  • Milestone changed from 2.9 to 3.0

comment:16 @scribu6 years ago

  • Keywords the_excerpt removed

comment:18 @Denis-de-Bernardy6 years ago

  • Keywords needs-patch added; has-patch removed

btw, the current patch is not utf8-safe.

comment:20 @Denis-de-Bernardy6 years ago

  • Milestone changed from 3.0 to 3.1

@Denis-de-Bernardy6 years ago

untested

comment:21 @Denis-de-Bernardy6 years ago

  • Milestone changed from 3.1 to 3.0

I've added a utf8-safe version of the suggested patch, in case anyone's willing to test this.

comment:22 @Denis-de-Bernardy6 years ago

  • Keywords has-patch added; needs-patch removed

comment:23 @dd325 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [13942]) A better default except, Remove multiple white spaces from the except as well as splitting safely on UTF8 strings. Props Denis-de-Bernardy for the UTF8 split. Fixes #10376

comment:24 @scribu5 years ago

I know it doesn't really matter, but in regex, isn't [\n\r\t ]+ equivalent to \s+ ?

comment:25 @scribu5 years ago

Nevermind, I saw the comment above.

Note: See TracTickets for help on using tickets.