Make WordPress Core

Opened 11 years ago

Closed 7 years ago

#23307 closed defect (bug) (fixed)

shortcode_parse_atts may return empty string

Reported by: garyj's profile GaryJ Owned by: miqrogroove's profile miqrogroove
Milestone: 4.3 Priority: normal
Severity: minor Version:
Component: Shortcodes Keywords:
Focuses: docs Cc:

Description

See http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/shortcodes.php#L243 .

If there are no attributes in a shortcode e.g. [foobar], then shortcode_parse_atts returns an empty string. This is not reflected in the long description or @return tag.

Ideally though, it should return an empty array, so that it's not only consistent, but writing a shortcode function can type hint the $atts argument to be an array e.g.

add_shortcode( 'foo', 'my_shortcode' );
function my_shortcode( array $atts ) {
   ...
}

This might also be combined with checking if the incoming $text is empty, and returning an empty array immediately.

Attachments (4)

23307.diff (593 bytes) - added by GaryJ 11 years ago.
Immediate return of empty array if no attributes given
23307.3.diff (1.9 KB) - added by aaroncampbell 9 years ago.
23307.2.diff (1.9 KB) - added by aaroncampbell 9 years ago.
23307.4.diff (577 bytes) - added by miqrogroove 8 years ago.

Download all attachments as: .zip

Change History (35)

@GaryJ
11 years ago

Immediate return of empty array if no attributes given

#1 @GaryJ
11 years ago

  • Keywords has-patch added

The patch captures when there is no space + characters after the shortcode tag, to return an empty array.

However, an opening tag, plus some characters / combination of characters that don't fall into the $pattern will still currently return as a string. The left-trimmed string could be set as the first element of an array, or cast to an array, but I'm unsure what BC might break from what is presumably an edge case?

#2 @mordauk
10 years ago

+1 for making the response consistent.

#3 follow-up: @nacin
10 years ago

  • Keywords needs-unit-tests added

This needs unit tests. I'd also be curious when this was introduced — what's the history here?

Patch doesn't look complete — if the pattern fails, it'd also return a string ( ltrim( $text ) ). We should make it as consistent as possible.

I don't think there would be a BC break of concern here. The function may also be able to return an empty array if given the right arguments. But I wouldn't expect someone to be asserting a string.

#4 in reply to: ↑ 3 @GaryJ
10 years ago

Replying to nacin:

I'd also be curious when this was introduced — what's the history here?

#5892

Patch doesn't look complete — if the pattern fails, it'd also return a string ( ltrim( $text ) ). We should make it as consistent as possible.

Looking at my comment above, I think this was indeed half a patch, as I wasn't sure what could / would be broken when potentially doing something with that left trim, so I just handled the simple case at the time.

#5 follow-up: @SergeyBiryukov
10 years ago

Related: #26927

#6 in reply to: ↑ 5 @TobiasBg
10 years ago

From #26927:

The problem: This was added in [6939] for #5892 six years ago. :-/
I'm not exactly sure for what cases this was necessary after [6911] for #5914 was already in, but presumably there's some edge case strings that the huge regex in shortcode_parse_atts() does not match.

#7 @TobiasBg
10 years ago

#26927 was marked as a duplicate.

#8 @SergeyBiryukov
10 years ago

  • Milestone changed from Awaiting Review to 3.9

#9 @GaryJ
10 years ago

  • Keywords needs-patch added; has-patch removed

Here's the latest tagged reference: https://core.trac.wordpress.org/browser/tags/3.8.1/src/wp-includes/shortcodes.php#L280

Are we happy with just doing:

$atts[] = ltrim($text);

so that the else returns an array with a single value?

#10 follow-up: @TobiasBg
10 years ago

An array with a single empty string in it feels wrong. Also, empty() checks (which plugins might be doing to catch the empty string that's returned currently) would fail.
The correct return value should be an empty array.

#11 in reply to: ↑ 10 ; follow-up: @GaryJ
10 years ago

Replying to TobiasBg:

An array with a single empty string in it feels wrong. Also, empty() checks (which plugins might be doing to catch the empty string that's returned currently) would fail.
The correct return value should be an empty array.

If the string is empty (i.e. [foobar]), then my attached patch already ensures an empty array is returned.

What that last suggestion covers is the edge case when some non-empty string fails the regex.

#12 in reply to: ↑ 11 ; follow-up: @TobiasBg
10 years ago

Replying to GaryJ:

If the string is empty (i.e. [foobar]), then my attached patch already ensures an empty array is returned.
What that last suggestion covers is the edge case when some non-empty string fails the regex.

Ah, ok. Thanks for the clarification.
Without having actually tested your patch: What happens for [foobar ] or [foobar /]?
And do you happen to have an example where the regex would fail for a non-empty string? I couldn't find one yet.

#13 in reply to: ↑ 12 @GaryJ
10 years ago

Replying to TobiasBg:

Without having actually tested your patch: What happens for [foobar ] or [foobar /]?
And do you happen to have an example where the regex would fail for a non-empty string? I couldn't find one yet.

From the regex in get_shortcode_regex(), both [foobar ] and [foobar /] give $m[3] as ' ' (string containing a single space). It's $m[3] that is provided as the argument to shortcode_parse_atts().

Here's the (slightly modified - see below) regex in a single string (with "foobar" added in place of the tag name variable):

\[(\[?)(foobar)(?![\w-])([^\]\/]*(?:\/(?!\])[^\]\/]*)*?)(?:(\/)\]|\](?:([^\[]*(?:\[(?!\/\2\])[^\[]*)*)\[\/\2\])?)(\]?)


The shortcode_parse_atts() regex allows shortcodes attributes of the form:

  • x=y
  • x =y (optional space before =)
  • x= y (optional space after =)
  • x="y" (use of double-quotes, can have optional spaces around =)
  • x='y' (use of single quotes, can have optional spaces around =)
  • "y" (just attribute value in double quotes)
  • y (non-whitespace character without quotes)

That means that a single or multiple consecutive spaces WILL fail the regex, and why there's a trim in the current else.

So, [foobar ] should act the same as [foobar].

One case where the string wouldn't be empty (if I'm reading everything right) would be something like [foobar "] (single double quote) or [foobar ''] (two single quotes) - they give a string of '"' and '''' which are then run through stripcslashes().


As a side question, the Rad Software Regular Expression Designer that I was testing in reported that the get_shortcode_regex() regex had nested quantifiers and wouldn't run without fixing them. Looking at the original regex, there does seem to be three instances of *+, which seems odd. Can anyone explain that logic?

#14 @nacin
10 years ago

  • Milestone changed from 3.9 to Future Release

Great research so far, GaryJ! Shouldn't be too tough to turn this into some unit tests, which, ultimately, this will need to be considered.

For *+, see http://www.regular-expressions.info/possessive.html.

This ticket was mentioned in Slack in #core by aaroncampbell. View the logs.


9 years ago

#16 @aaroncampbell
9 years ago

  • Keywords has-patch 2nd-opinion added; needs-unit-tests needs-patch removed

My thoughts:
1) I agree that it makes more sense to return an empty array for no attributes.
2) Most people are probably using empty( $atts ) but there might be some places where we break people's tests for empty attributes. I'm thinking this is a pretty minor breakage.
3) One thing we ARE breaking are our own unit tests. We check for ''. My patches switch our tests from assertEquals( '', $atts ) to assertEmpty( $atts ) instead.
4) And lastly, the most important thing, I can't seem to trigger the else in shortcode_parse_atts(). My first patch (23307.2.diff) returns an array in that case, but I couldn't come up with any content to test it with. My second patch (23307.3.diff) removes the else entirely. All unit tests still pass.

#17 @GaryJ
9 years ago

Spent some time checking my earlier research, and I agree with point 4 - once the early trim() is in place, there's nothing that triggers the else. Even the most complicated of values only needs to satisfy \S*.

#18 @GaryJ
9 years ago

Saying that, we have one or two other else / default: instances in core that are commented as "We should never hit this, but ...", so for the sake of one line, I think the else should probably be kept in (and suitably commented), in case there's some odd character with odd encoding that doesn't get removed in a default trim() but fails \S.

#19 @miqrogroove
9 years ago

  • Focuses docs added
  • Keywords needs-patch added; has-patch 2nd-opinion removed
  • Milestone changed from Future Release to 4.4

I am not in favor of changing this. It's baked in and had no traction here.

I would like to see the docblock corrected/updated in 4.4. It should describe any situation where the attributes regex failure causes a string return.

I am working on a roadmap where we might have a separate attribute parser for new shortcodes. It will be easier to make improvements in that future system.


#20 @DrewAPicture
9 years ago

  • Owner set to miqrogroove
  • Status changed from new to assigned

@miqrogroove: Can you please work on a patch describing the issue for the DocBlock change?

#21 follow-up: @miqrogroove
8 years ago

Currently researching the history of attribute name syntax. I had documented this as being restricted to "word" characters, but it looks like that changed to include hyphens at some point.

#22 in reply to: ↑ 21 @aaroncampbell
8 years ago

Replying to miqrogroove:

Currently researching the history of attribute name syntax. I had documented this as being restricted to "word" characters, but it looks like that changed to include hyphens at some point.

Yeah, that happened just a couple months ago in [33118] (ticket #9405). It brought them more in line with shortcode names (#17657) and made them feel a bit more like the HTML attributes that people are used to (data-something, aria-something, etc).

@miqrogroove
8 years ago

#23 @miqrogroove
8 years ago

  • Keywords has-patch added; needs-patch removed

23307.4.diff provides the needed information.

#24 @aaroncampbell
8 years ago

  • Keywords commit added

Just confirming that:
1) I ran some quick tests just to double check, and the documentation in 23307.4.diff is accurate (albeit somewhat confusing)
2) I'm in favor of not messing with this and instead making sure the new system is more consistent (always returning an array)

#25 @miqrogroove
8 years ago

The "separate attribute parser for new shortcodes" idea died with draft one of that roadmap. :) Since there is no longer a plan to have two separate shortcode systems, we need to either wontfix, or we can add this ticket to the roadmap as a planned change.

On the one hand, there were no objections to wontfix at the last meeting. On the other hand, I think the patch adds new information that we can consider. The return behavior did not match the old docs and also did not match the ticket description.

If we do decide to change the return type, I think we should also take the opportunity at the same time to disallow equal signs and quote chars from unnamed, unquoted values. It makes no sense to allow an attribute value like gar=bage"

#26 @wonderboymusic
8 years ago

In 34744:

Shortcodes: clarify the @return docs for shortcode_parse_atts().

Props miqrogroove.
See #23307.

#27 @DrewAPicture
8 years ago

  • Keywords commit removed
  • Milestone changed from 4.4 to Future Release

#28 @miqrogroove
8 years ago

  • Keywords has-patch removed
  • Milestone changed from Future Release to Awaiting Review

It's on the agenda for 7 October meeting at #feature-shortcode. The available patches did not cover all cases and we need to review plugin impact.

#29 @DrewAPicture
8 years ago

@miqrogroove What's the next step here? Are we closing as wontfix or updating the docs?

This ticket was mentioned in Slack in #docs by morganestes. View the logs.


7 years ago

#31 @DrewAPicture
7 years ago

  • Milestone changed from Awaiting Review to 4.3
  • Resolution set to fixed
  • Status changed from assigned to closed

Closing as fixed, citing [34744].

Note: See TracTickets for help on using tickets.