Make WordPress Core

Opened 5 years ago

Last modified 5 years ago

#47863 reopened defect (bug)

Fix odd, unexpected output from shortcode_parse_attts

Reported by: mauteri's profile mauteri Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Shortcodes Keywords: has-patch has-unit-tests dev-feedback
Focuses: Cc:

Description (last modified by SergeyBiryukov)

When passing a full shortcode to this function, I'm expecting only an array of key/value attributes of the shortcode to be returned, however this is not the case. Basic example (though my diff of unit tests have many), this shortcode:

[unittest title="unittest" link="https://unit.test/"]

will return an array that looks like this:

(
    0 => '[unittest'
    'title' => 'unittest'
    1 => 'link="https://unit.test/"]'
)

rather than one that looks like this

(
    'title' => 'unittest'
    'link' => 'https://unit.test/'
)

I've already created a patch for this. I will put it in two parts. First unit tests with a data provider of 6 unit test format examples. 5 out of 6 will fail. I will then include another patch that includes the unit tests plus the proposed fix to the function.

Attachments (10)

shortcode_parse_attts.patch (1.4 KB) - added by mauteri 5 years ago.
This patch is of failing unit tests
47863.patch (2.2 KB) - added by mauteri 5 years ago.
Patch for issue plus unit tests
47863.1.patch (3.1 KB) - added by mauteri 5 years ago.
Updates to patch per first round of reviews
47863-2.diff (4.1 KB) - added by birgire 5 years ago.
47863-3.patch (4.5 KB) - added by mauteri 5 years ago.
Additional tests and slight code adjustment.
47863-4.patch (4.5 KB) - added by mauteri 5 years ago.
Extra semi-colon in last patch.
47863-5.patch (4.5 KB) - added by mauteri 5 years ago.
47863-6.patch (4.5 KB) - added by mauteri 5 years ago.
47863-7.patch (4.7 KB) - added by mauteri 5 years ago.
Additional test of nested shortcode.
47863-8.patch (4.6 KB) - added by mauteri 5 years ago.

Download all attachments as: .zip

Change History (44)

@mauteri
5 years ago

This patch is of failing unit tests

@mauteri
5 years ago

Patch for issue plus unit tests

#1 @SergeyBiryukov
5 years ago

  • Description modified (diff)

#2 @SergeyBiryukov
5 years ago

  • Milestone changed from Awaiting Review to 5.3
  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing

#3 follow-up: @birgire
5 years ago

Thanks for the patch @mauteri

@SergeyBiryukov I almost posted some reviewing comments, but then noticed you're currently set for reviewing, so I will just wait for your initial review :-)

#4 in reply to: ↑ 3 @SergeyBiryukov
5 years ago

Replying to birgire:

I almost posted some reviewing comments, but then noticed you're currently set for reviewing, so I will just wait for your initial review :-)

I just wanted to note that based on the current usage in core, the function appears to only accept key-value pairs. Extending it to accept a full shortcode makes sense to me and seems like a quick win. That's all I have for now :)

#5 @birgire
5 years ago

  • Keywords has-unit-tests added

Thanks @SergeyBiryukov

Reading the docblock's description of shortcode_parse_atts(),

https://core.trac.wordpress.org/browser/tags/5.2/src/wp-includes/shortcodes.php#L483

it seems to indicate that this should be supported, e.g.:

 * Retrieve all attributes from the shortcodes tag.

So at a first glance, it seems like a good enhancement.

Maybe the docblock could be enhanced as well, e.g. adding a descripton to the * @param string $text part?

Just curious about the \w+ choice in the patch, as I noticed that in get_shortcode_atts_regex()

https://core.trac.wordpress.org/browser/tags/5.2/src/wp-includes/shortcodes.php#L480

we have [\w-]+. I guess these should match?

I didn't find any explicit tests for shortcode_parse_atts(), so it seems good to add them.

In the description of shortcode_parse_atts() we have:

* @return array|string List of attribute values.
*                      Returns empty array if trim( $text ) == '""'.
*                      Returns empty string if trim( $text ) == ''.
*                      All other matches are checked for not empty().

so I would suggest adding these test cases to the data provider in 47863.patch.

And also test cases with - in the shortcode's tag name.

ps:

According to https://make.wordpress.org/core/2019/07/12/php-coding-standards-changes/

Arrays must be declared using long array syntax in WordPress Core.

so I would assume it applying to the tests as well.

Then we would need to add a @ticket 47863 in the docblock of the test method.

Last edited 5 years ago by birgire (previous) (diff)

#6 follow-up: @mauteri
5 years ago

Thanks for the feedback, @birgire. All sounds very reasonable. Curious about the long array syntax since PHP minimum requirement was just raised from 5.2 to 5.6.20 only in last major version. Could that standard now be dated since shorthand arrays in PHP was introduced in PHP 5.4 and maybe the WordPress standard should be updated to reflect what is possible in the new minimum version of PHP?

Also, I found the $text parameter naming to be a bit odd. Just from seeing the name of the function shortcode_parse_atts seems it should ONLY accept shortcodes to have their atts parsed (which made it very odd to me when it messed that up in the first place). Seems a better name for that argument would be $shortcode to make it very clear that a shortcode should be passed to it to have its atts parsed. I'm just thinking of other improvements that could be made if you and others agree.

#7 in reply to: ↑ 6 @SergeyBiryukov
5 years ago

Replying to mauteri:

Could that standard now be dated since shorthand arrays in PHP was introduced in PHP 5.4 and maybe the WordPress standard should be updated to reflect what is possible in the new minimum version of PHP?

The PHP Coding Standards Changes post linked above explains the reasoning:

Short Array Syntax
A little less controversial, but still with varying opinions, was the proposal to require short array syntax ( [ 1, 2, 3 ] ) instead of long array syntax ( array( 1, 2, 3 ) ) for declaring arrays.

While I’m personally partial to short array syntax, there were two particularly convincing arguments for using long array syntax:

  • It’s easier to distinguish from other forms of braces, particularly for those with vision difficulties.
  • It’s much more descriptive for beginners.

So, this change to the coding standards is the opposite of what was originally proposed, but is ultimately the more inclusive option.

Coding Standards Change
Arrays must be declared using long array syntax in WordPress Core.

Also, I found the $text parameter naming to be a bit odd. Just from seeing the name of the function shortcode_parse_atts seems it should ONLY accept shortcodes to have their atts parsed (which made it very odd to me when it messed that up in the first place). Seems a better name for that argument would be $shortcode to make it very clear that a shortcode should be passed to it to have its atts parsed.

Based on the current usage in core (a string of shortcode attributes, rather than a full shortcode), I would keep the $text name as is and clarify in the documentation that it can also be a shortcode. That said, I wouldn't object too strongly to changing the name.

#8 @birgire
5 years ago

I agree with @SergeyBiryukov in keeping the $text, if the documentation is improved.

It also comes to mind that it might be worth mentioning that this is meant for a single shortcode as we would not be able to find what key/value belongs to each shortcode, from the output of e.g. shortcode_parse_atts( '[a b="c"][d e="f"]' ).

ps:

I updated my comment above regarding the @ticket 47863 annotation.

#9 @mauteri
5 years ago

Great, thanks to both of you for the additional info and reasoning.

#10 @mauteri
5 years ago

@birgire this doesn't appear to be true from what I can tell...

* Returns empty string if trim( $text ) == ''.

I wrote a test for this and it failed. I then removed my code changes to see if I introduced a bug and it still failed.

This was my dataProvider for the test:

array(
	'\'\'',
	'',
),
There was 1 failure:

1) Tests_Shortcode::test_shortcode_parse_atts with data set #0 ('''', '')
Failed asserting that Array &0 () is identical to ''.

/srv/www/wordpress-trunk/public_html/tests/phpunit/tests/shortcode.php:1054

Possible this was wrong for a while and no one noticed due to lack of tests?

Curious if this is the case, that this function now only returns arrays and not strings. I can spend more time and see what kind of coverage we have for the function with the tests I've already written.

Added 47863.1.patch with most recent updates, though more changes to DocBlocks will definitely need to be added.

@mauteri
5 years ago

Updates to patch per first round of reviews

@birgire
5 years ago

#11 @birgire
5 years ago

I adjusted the inline docs and some WPCS in 47863-2.diff.

Added two trivial test cases.

I also think I would expect the inputs '""' and '\'\'' to return an empty array, as the inline docs mentioned:

* Returns empty array if trim( $text ) == '""'.

The tests in 47863-2.diff run successfully on my install.

OK (12 tests, 12 assertions)

#12 follow-up: @mauteri
5 years ago

@birgire oh, I think I misread the docblock on the return for ''. This looks to work as expected as I read through your additional tests. Thanks.

#13 in reply to: ↑ 12 @birgire
5 years ago

Replying to mauteri:

@birgire oh, I think I misread the docblock on the return for ''. This looks to work as expected as I read through your additional tests. Thanks.

I think we should also consider cases like:

[unittest link=https://unit.test/ /]
[unittest title https://unit.test/ /]

but then the trimming might remove too much.

What do you think?

#14 @mauteri
5 years ago

Thanks @birgire, good points there. I've adjusted the trimming a bit. Mostly just pulling off any hanging / at the end of the string with a preg_replace before trimming whitespace. I also added a few more tests you mentioned above with what I think the expected output would be given the circumstance. Let me know of you agree or if you have any comments on the additional tests.

@mauteri
5 years ago

Additional tests and slight code adjustment.

@mauteri
5 years ago

Extra semi-colon in last patch.

#15 @birgire
5 years ago

I wonder if we could simplify with e.g.:

if ( preg_match( '#^\[[\w-]+([^\]]*?)\/?\]#', $text, $matches ) ) {
	$text = $matches[1];
}

i.e. use non-greedy *? check to allow for the latter \/? check.

This seems to run the tests successfully.

ps: here's though another kind of edge case:

array(
	'[unittest link=https://unit.test/]Lorem Ipsum[/unittest]',
	array(
		'link'  => 'https://unit.test/',
	),
),

that will not pass.

#16 @mauteri
5 years ago

@birgire yeah, i don't think we are going to win with that edge case :-/ as that can go either way. probably best to assume if it is right before the ] that it was intentionally part of the self closing shortcode. what do you think?

yeah, we can probably tighten this up a bit by extending the preg_match.

@mauteri
5 years ago

@mauteri
5 years ago

#17 @mauteri
5 years ago

@birgire let me know what you think of latest patch. updated regex per your suggestion and simplified to just a trim afterwards. thx!

@mauteri
5 years ago

Additional test of nested shortcode.

#18 @birgire
5 years ago

Thanks for the update @mauteri

I don't think we need the count check part, isn't $matches[1] always set?

The tests also seem to run successfully without the trim().

Personally I prefer to use a different pattern delimiter than /, when there are many slashes and backslashes in the pattern, to make it more readable.

Last edited 5 years ago by birgire (previous) (diff)

@mauteri
5 years ago

#19 @mauteri
5 years ago

@birgire Updated. I'm usually overly cautious with $matches, but I see your point that not needed in this case as it would never get that far without a count of 2. I also removed the superfluous trim and changed the delimiter. Thanks.

#20 @mauteri
5 years ago

@birgire @SergeyBiryukov let me know if there's anything else you'd like updated/changed for this ticket or if it is good now as-is. Thanks!

#21 @kadamwhite
5 years ago

  • Keywords commit added

This looks good to me, provisionally marking for commit (@SergeyBiryukov please override me if you disagree, though!)

#22 @whyisjake
5 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 46369:

Shortcodes: Improve handling from shortcode_parse_attts().

Ensure consistency between shortcode_parse_attts() when being used directly.

Props mauteri, birgire, SergeyBiryukov, kadamwhite, whyisjake.
Fixes #47863.

#23 @david.binda
5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I have run into an issue with the new implementation, where the code was either relying on the 0 key containing the shortcode's tag, or where the code was adjusted to the original behaviour and was blindly removing the 0 key.

In both cases, the implementation is broken with this change being in place. While I like this change, there might be more issues with shortcodes once 5.3 is relased, similar to what I've been seeing during my tests.

In case we are okay with such a breaking change, perhaps it might be worth writing a blog post on the breaking change, with some guidance on how to properly adjust the broken code.

Here are some of the example codes I've run into:

Example 1:

$shortcode = '[unittest attr1 attr2]';
$atts = shortcode_parse_atts( $shortcode );
// Drop '[unittest ' part.
array_shift( $atts ); // With r46369 being applied, this removes the attr1 instead of [unittest

Example 2:

$shortcode = '[unittest attr1 attr2]';
$atts = shortcode_parse_atts( $shortcode );
$tag = trim( $attrs[0], '[' ); // With r46369 being applied, the $tag holds attr1 instead of unittest

IMHO, the correct usage for for the both examples above would be following logic:

$shortcode = '[unittest attr1 attr2]';
if ( preg_match( '/'. get_shortcode_regex( [ 'unittest' ] ) .'/s', $shortcode, $matches ) ) {
    $tag = $matches[2];
    $atts = shortcode_parse_atts( $matches[3] );
}

But that's perhaps for a blog post.

#24 @SergeyBiryukov
5 years ago

In 46465:

Docs: Adjust @since note in shortcode_parse_atts() for consistency with similar notes.

See #47863.

#25 @davidbaumwald
5 years ago

  • Milestone changed from 5.3 to Future Release

With 5.3 RC1 releasing today and work still left here, this ticket is being moved to Future Release. If any committer feels this can be worked in to 5.3 before the deadline or can assume ownership in the 5.4 cycle, feel free to move it back up.

#26 @ocean90
5 years ago

  • Keywords dev-feedback added; commit removed
  • Milestone changed from Future Release to 5.3

Moving milestone back to 5.3 since there's already a commit which may cause problems as mentioned in comment:23. This needs to be verified and a decision made whether this is good or should to be reverted for now.

Version 0, edited 5 years ago by ocean90 (next)

This ticket was mentioned in Slack in #core by desrosj. View the logs.


5 years ago

#28 @SergeyBiryukov
5 years ago

In 46554:

Shortcodes: Revert [46369] for now to allow more time to investigate and prepare for backward compatibility changes.

Also reverts follow-up changes in [46370] and [46465].

See #47863.

#29 @SergeyBiryukov
5 years ago

  • Milestone changed from 5.3 to 5.4

A quick search shows that 464 plugins from the repo use the function. Not all of them would be affected, but let's play it safe and try again in 5.4.

#30 @mauteri
5 years ago

What is everyone's thoughts on introducing this code in a new function (something like maybe wp_parse_atts), updating core to use the new function, and deprecating shortcode_parse_atts. That way we can introduce a better, more reliable function and not break backwards compatibility. Thoughts?

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


5 years ago

#32 @davidbaumwald
5 years ago

  • Milestone changed from 5.4 to Future Release

With 5.4 Beta 3 landing tomorrow, this is being moved to Future Release. If any maintainer or committer feels this can be resolved in time or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

#33 follow-up: @joyously
5 years ago

It doesn't seem to improve anything, and does it even handle a shortcode that encloses content?
Why change what isn't broken? I thought shortcodes were being replaced by blocks, so it's best to just leave this alone.

#34 in reply to: ↑ 33 @mauteri
5 years ago

There's an example in the description of how it returns broken, unexpected results, though due to backwards compatibility and folks writing code to compensate, I proposed in my last comment introducing a new function (wp_parse_atts) to handle things as they are to be expected and deprecate this function.

Replying to joyously:

It doesn't seem to improve anything, and does it even handle a shortcode that encloses content?
Why change what isn't broken? I thought shortcodes were being replaced by blocks, so it's best to just leave this alone.

Note: See TracTickets for help on using tickets.