WordPress.org

Make WordPress Core

Opened 6 years ago

Last modified 2 weeks ago

#9405 reopened defect (bug)

Shortcode parameter names can't have dashes in them

Reported by: Viper007Bond Owned by:
Milestone: 4.3 Priority: lowest
Severity: minor Version: 2.8
Component: Shortcodes Keywords: has-patch
Focuses: Cc:

Description

I expected this to work, but it doesn't:

[tag my-param="value" foo="bar"]Hello world![/tag]

The passed array looks like this:

array( 
	0 => 'my-param="value"',
	'foo' => 'bar'
);

Attachments (4)

9405.diff (748 bytes) - added by izem 12 months ago.
Updated regex of shortcode_parse_atts function at src/wp-includes/shortcodes.php
9405.2.diff (768 bytes) - added by izem 12 months ago.
Updated regex for 3 types values (double quotes, single quotes, no quotes)
9405.3.diff (1.6 KB) - added by aaroncampbell 8 months ago.
9405.4.diff (1.8 KB) - added by tyxla 2 weeks ago.
Updating the patch by @aaroncampbell to add support for multiple dashes in the middle of the attribute, as well as multiple consecutive dashes.

Download all attachments as: .zip

Change History (21)

comment:1 @hakre6 years ago

function is shortcode_parse_atts() in wp-includes/shortcodes.php

the parsing is done by regular expressions.

word limiters are used, I guess dashes are not supported by them.

I wonder if it makes sense. This problem can for easier besolved by clarifying the documentation, the regular expression pattern is already quite complex.

comment:2 @Denis-de-Bernardy6 years ago

it's supposed to "look" like html attributes. curious to know if dash is a valid character for html attributes before closing as invalid...

comment:3 @hakre6 years ago

Some HTML Trivia:

  • attribute names are not case sensitive. (HTML 2.0)
  • The `NAMELEN' parameter in the SGML declaration (section SGML Declaration for HTML) limits the length of an attribute value to 1024 characters.
  • HTML allows CDATA attributes to be unquoted provided the attribute value contains only letters (a to z and A to Z), digits (0 to 9), hyphens (ASCII decimal 45) or, periods (ASCII decimal 46)
  • An attribute specification consists of a name, an "=" and a value specification. The name refers to an item in an ATTLIST declaration.

I see no attribute name in HTML (2.0, 3.2, 4.1 & xhtml 1.0) that conatins a dash or dot.

comment:4 @Denis-de-Bernardy6 years ago

  • Milestone Future Release deleted
  • Resolution set to invalid
  • Status changed from new to closed

comment:5 @sswells12 months ago

  • Component changed from General to Shortcodes
  • Resolution invalid deleted
  • Status changed from closed to reopened

Since the reasoning behind closing this ticket was the html attribute mirroring, can it be reconsidered now that data-uid and other data-? parameters are widely used?

Line 308 of shortcodes.php could be changed from:

$pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';

to:

$pattern = '/([\w\-]+)\s*=\s*"([^"]*)"(?:\s|$)|([\w\-]+)\s*=\s*\'([^\']*)\'(?:\s|$)|([\w\-]+)\s*=\s*([^\s\'"]+)(?:\s|$)|"([^"]*)"(?:\s|$)|(\S+)(?:\s|$)/';
Last edited 11 months ago by SergeyBiryukov (previous) (diff)

@izem12 months ago

Updated regex of shortcode_parse_atts function at src/wp-includes/shortcodes.php

comment:6 @izem12 months ago

  • Keywords has-patch added; needs-patch removed

Added a patch that update the regex pattern of shortcode_parse_atts function at src/wp-includes/shortcodes.php
The name part of the regex was: (\w+)
I've changed it to: (\w+(?:\-\w+)*)
Now it support names with hyphens (but not names that end with a hyphen).

// [sc_test name="val01" name-with-hyphens="val02" name-end-with-hyphen-="val03"]
atts Array:
(
    [name] => val01
    [name-with-hyphens] => val02
    [0] => name-end-with-hyphen-="val03"
)

@izem12 months ago

Updated regex for 3 types values (double quotes, single quotes, no quotes)

comment:7 @SergeyBiryukov11 months ago

  • Milestone set to Awaiting Review

@aaroncampbell8 months ago

comment:8 @aaroncampbell8 months ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to Future Release

In #17657 we moved to allowing hyphens in shortcode names. I don't actually see a reason to limit hyphens from being at the beginning of a parameter name, so in 9405.3.diff I went with a simpler regex change that allows [\w-] for all parameter name characters. I also added a unit test.

comment:9 @aaroncampbell2 weeks ago

Just updating to say that I just re-applied and tested the patch again. It still applies and works as expected. If anyone else could test it, it would be much appreciated.

comment:10 @slackbot2 weeks ago

This ticket was mentioned in Slack in #core by aaroncampbell. View the logs.

comment:11 @tyxla2 weeks ago

  • Keywords needs-unit-tests added

I believe some unit tests would be the best way to test this.

comment:12 @aaroncampbell2 weeks ago

  • Keywords needs-unit-tests removed

There's a unit test in the most recent patch. It covers all the parameters in this shortcode:
[test-shortcode-tag foo="foo" foo-bar="foo-bar" foo-bar-="foo-bar-" -foo-bar="-foo-bar" -foo-bar-="-foo-bar-" /]
Do you think there are more that need to be added?

comment:13 @tyxla2 weeks ago

It appears that the recent patch from @aaroncampbell has quite an extensive test coverage.
The unit test works fine for me, and manual testing also works great.

I believe we can add several more to the test, just to cover the cases with two dashes in the middle and two consecutive dashes.

Patch coming up.

@tyxla2 weeks ago

Updating the patch by @aaroncampbell to add support for multiple dashes in the middle of the attribute, as well as multiple consecutive dashes.

comment:14 @tyxla2 weeks ago

  • Keywords needs-testing removed

comment:15 follow-up: @azaozz2 weeks ago

I'm not sure this is a good idea. Why would we want dashes in shortcode attribute names, what (critical) problem is solved with that?

The shortcode regexps are pretty big/slow already, making them even more complex is undesirable imho. Also, imagine the billions of times this slower regexp will run every day. Is it really worth the extra kWh of electricity for the servers so we can have slightly fancier attribute names (that few would ever use)?

Last edited 2 weeks ago by azaozz (previous) (diff)

comment:16 in reply to: ↑ 15 @aaroncampbell2 weeks ago

Replying to azaozz:

I'm not sure this is a good idea. Why would we want dashes in shortcode attribute names, what (critical) problem is solved with that?

I'm not sure it's "critical" but I do think it's a problem that should be solved. When you create a shortcode, try to use a hyphenated parameter name, and it doesn't work...it seems broke. These are only getting more popular now with modern HTML using things like data-*, aria-*, etc. There are certainly characters we don't need to support here, but I think that [\w-] is not only understandable, but expected.


The shortcode regexps are pretty big/slow already, making them even more complex is undesirable imho. Also, imagine the billions of times this slower regexp will run every day. Is it really worth the extra kWh of electricity for the servers so we can have slightly fancier attribute names (that few would ever use)?

This actually doesn't affect the main shortcode regex (the one we've struggled with keeping memory-friendly). That one actually got support for hyphens almost three years ago in #17657. It only affects the shortcode attributes regex, which is only ever run on the content inside the opening shortcode tag (match 3 from the shortcode regex). Since the change is one that affects only the attributes regex, and since the only change to that regex it to use [\w-] instead of \w for attribute names, I think it's a reasonable move.

Last edited 2 weeks ago by aaroncampbell (previous) (diff)

comment:17 @johnbillion2 weeks ago

  • Milestone changed from Future Release to 4.3

Hyphens should definitely be allowed in shortcode attribute names.

I bet the kWhs of electricity saved by reducing the time developers spend at their computers scratching their heads will more than offset the extra kWhs of electricity spent on parsing the updated regex.

Note: See TracTickets for help on using tickets.