#28564 closed defect (bug) (fixed)
Shortcode Attributes with HTML Tags no longer working
Reported by: | baden03 | Owned by: | wonderboymusic |
---|---|---|---|
Milestone: | 4.0 | Priority: | high |
Severity: | major | Version: | 4.0 |
Component: | Formatting | Keywords: | has-patch commit wptexturize |
Focuses: | Cc: |
Description
At the following URLs are two test using a shortcode attribute that has html in the value.
On WordPress version 4.0-alpha-20140616 any greater/less-than HTML tags are causing strange behaviour.
http://spacedonkey.de/1176/collapse-o-matic-attribute-test/
On WordPress version 3.9.1 the shortcode attributes render correctly.
http://woo.spacedonkey.de/collapse-o-matic-image-in-title-test/
Is this a bug, or will HTML no longer be accepted within a shortcode's attribute value.
Attachments (6)
Change History (40)
#1
follow-up:
↓ 15
@
10 years ago
- Milestone changed from Awaiting Review to 4.0
- Priority changed from normal to high
- Severity changed from normal to major
#3
follow-up:
↓ 6
@
10 years ago
We will have to revert [28727] if the above behavior is desired. The original problem was that wptexturize was corrupting HTML when it contained what appeared to be shortcode patterns. If we want shortcodes to contain HTML, then we have to go back to corrupting HTML. There really isn't any middle ground because we are either giving one language priority over the other, or we are accepting tag soup with merely okay results.
The ultimate example is what happens if this "expand" shortcode contains HTML with square brackets?
[expand title="<a href='http://example.com/?a[]=12345'>Hello World</a>"]
There is simply no way to parse that expression without recursion or object modeling.
#4
@
10 years ago
Also note, while the double quotes in the examples appear to offer a reliable syntax, those quotes are not parsed by the main regexp in shortcodes.php. In my previous example, the existing shortcode engine (with or without [28727]) parses that phrase into the following shortcode text:
[expand title="<a href='http://example.com/?a[]
This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov_. View the logs.
10 years ago
#6
in reply to:
↑ 3
;
follow-up:
↓ 7
@
10 years ago
Replying to miqrogroove:
The ultimate example is what happens if this "expand" shortcode contains HTML with square brackets?
[expand title="<a href='http://example.com/?a[]=12345'>Hello World</a>"]
I think square brackets should be encoded in this case:
[expand title="<a href='http://example.com/?a%5B%5D=12345'>Hello World</a>"]
#7
in reply to:
↑ 6
@
10 years ago
Replying to SergeyBiryukov:
I think square brackets should be encoded in this case:
[expand title="<a href='http://example.com/?a%5B%5D=12345'>Hello World</a>"]
I see your encoding and raise you a CDATA example:
[expand title="Hello [World] Fail"]
#9
@
10 years ago
- Keywords has-patch added; needs-patch removed
Good news. My regexp was more resilient at separating HTML and shortcodes on a left-to-right basis than the old one. We can allow HTML within shortcodes. Shortcodes within shortcodes was the main problem. As azaozz mentioned on #15694, the [
and ]
characters must be encoded as HTML entities any time they appear inside of any shortcode. This is almost never accomplished, which is why it's so tricky to separate the bugs.
In miqro-28564.patch:
- Allow HTML inside of shortcode attributes for wptexturize.
- Continue to ignore the [ char if it appears in any HTML attribute.
- Allow the ] char to corrupt any HTML element that appears after a [ char, as in 3.9.
- Update related patterns.
- Update unit tests.
#11
@
10 years ago
In miqro-28564.3.patch:
- Allow well-formed HTML inside of shortcode attributes for wptexturize.
- Do not recurse any other part of wptexturize. HTML is allowed but ignored.
- Do not allow exotic HTML comments in shortcode attributes.
- Continue to ignore the [ and ] chars if they appear in any HTML attribute.
- Update related patterns.
- Update unit tests.
#13
@
10 years ago
- Owner set to wonderboymusic
- Resolution set to fixed
- Status changed from new to closed
In 28773:
#14
@
10 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Per IRC (yesterday) — this absolutely needs a security review.
#15
in reply to:
↑ 1
;
follow-ups:
↓ 16
↓ 18
@
10 years ago
The test case in Sergey Biryukov's comment:1 still fails.
Broken in [28727].
To reproduce:
- Register a shortcode:
function footag_func( $atts ) { return "title = {$atts['title']}"; } add_shortcode( 'expand', 'footag_func' );- Create a post with this content:
[expand title="<strong>Strong Test</strong>"][/expand]
.Output:
My updated results.
Output:
- Wordpress 3.9.1:
title = <strong>Strong Test</strong>
. - Trunk after [28773]:
title = <strong>Strong Test</strong>
.
I would also like to consider "well-formed HTML," mentioned in comment:11 and comment:13. The author of the popular jQuery slide show plugin Cycle has released a substantially different plugin called Cycle2. It moves most of the parameters from JS to HTML data attributes. This has brought grief to simple HTML parsers because it encourages the following type of HTML, as seen in the Cycle2 manual, http://jquery.malsup.com/cycle2/demo/non-image.php :
<div class="cycle-slideshow" data-cycle-slides="> div">
While I deplore it personally, everyone seems to agree it is proper HTML. (The data is indicating the CSS selector for a slide show.) I know of plugins at wordpress.org/extend using Cycle2, and at least one that could probably be coerced into executing a shortcode with that type of HTML. Anyway it's a change in behavior and could use a Unit test for whatever the expected result is, like comment:1.
Here's my test for Cycle2 inspired HTML:
...
- Create a post with this content:
[expand title="div > blockquote"][/expand]
.
Output:
- Wordpress 3.9.1:
title = div > blockquote
. - Trunk after [28773]:
title = ”div
.
#17
@
10 years ago
My apologies, you are correct. The comment:1 test case is fixed. (I used the visual editor instead of the text editor.)
- Wordpress 3.9.1:
title = <strong>Strong Test</strong>
. - Trunk after [28773]:
title = <strong>Strong Test</strong>
.
The deplorable HTML test case I wrote is still broken though if we want to allow that sort of thing.
- Wordpress 3.9.1:
title = div > blockquote
. - Trunk after [28773]:
title = ”div
.
#18
in reply to:
↑ 15
;
follow-up:
↓ 20
@
10 years ago
Replying to kitchin:
<div class="cycle-slideshow" data-cycle-slides="> div">
wptexturize() has never supported this syntax. You're welcome to open a new ticket for that issue, but it has nothing to do with the bug reported above.
- Create a post with this content:
[expand title="div > blockquote"][/expand]
.
I'm going to read between the lines here and assume something like this will happen:
[expand title='<div class="cycle-slideshow" data-cycle-slides="> div">'][/expand]
We could find a way to allow this by ignoring extra > chars inside of shortcode attributes, which are not texturized anyway. I don't think it's a good idea at all, but it could be done.
#19
@
10 years ago
Eventually someone will attempt this too:
[expand title='<div class="cycle-slideshow" data-cycle-slides="> div[]">'][/expand]
That's why we have to draw a line on shortcode usage, somewhere.
#20
in reply to:
↑ 18
;
follow-up:
↓ 21
@
10 years ago
Replying to miqrogroove:
wptexturize() has never supported this syntax. You're welcome to open a new ticket for that issue, but it has nothing to do with the bug reported above.
I'm just reporting the change in behavior, I don't know how important it is. "Well-formed HTML" is tricky and I'm not sure can really be handled by lookaheads in regexes, as opposed to tokenizing and switch statements, but I'm probably over my head here.
#21
in reply to:
↑ 20
@
10 years ago
Replying to kitchin:
I'm just reporting the change in behavior, I don't know how important it is. "Well-formed HTML" is tricky and I'm not sure can really be handled by lookaheads in regexes, as opposed to tokenizing and switch statements, but I'm probably over my head here.
So the change in behavior is that HTML that was never valid in wptexturize is now also invalid inside of shortcodes? Can we write this off as undefined behavior? Or is someone actually using shortcodes to avoid wptexturize from handling their invalid HTML?
#22
follow-up:
↓ 24
@
10 years ago
I'm not sure why we're texturizing shortcode attributes at all. The problem in comment:3 is solved by searching left to right for the next ]
, since we later agreed [
and ]
must be escaped within a shortcode.
Real world example... I tested a plugin that uses jQuery Cycle2. It did break, though I doubt anyone is using it in the way that breaks it. But here goes. The shortcode for the Testimonial Rotator plugin is like this:
[testimonial_rotator id="99"]
There is an attribute "div_selector" with default value "> div.slide". So the shortcode above should be equivalent to:
[testimonial_rotator id="99" div_selector="> div.slide"]
In WP 3.9.1, the $content passed to do_shortcode() is exactly that.
In trunk, $content is:
[testimonial_rotator id=”99″ div_selector=”> div.slide”]
Valid html... The jQuery Cycle2 plugin is pretty standard. Its examples use this type of html:
<!DOCTYPE html> <html><head><meta charset="UTF-8"><title>test</title></head><body> <div id="testimonial_rotator_99" data-cycletwo-slides="> div.slide"></div> </body></html>
You can paste that into http://validator.w3.org/#validate_by_input and see it is valid. I don't like it, and I think that Cycle2 should allow an escaped value but it does not. If I recall correctly, the author told me, hey it's valid HTML, and did not understand sometimes people like to simple quick parsing.
What is the issue... Don't have a clear spec for what we're trying to do. Don't know how to address comment:14, Nacin's "needs a security review."
#23
@
10 years ago
Another use case, besides css selectors, is the comparison operator ">". Since the WP API functions allow it as a parameter when searching for posts (via the WP_Meta_Query parameter "meta_compare"), it is conceivable a plugin author would allow it in a search shortcode attribute. Perhaps the plugin could handle it escaped, but if we're escaping [
, ]
, why escape <
, >
?
#24
in reply to:
↑ 22
@
10 years ago
Replying to kitchin:
I'm not sure why we're texturizing shortcode attributes at all. The problem in comment:3 is solved by searching left to right for the next
]
, since we later agreed[
and]
must be escaped within a shortcode.
Um yeah, we didn't agree on that, because in practice that never actually happens. Even most internal uses of shortcodes won't escape attribute text.
There is an attribute "div_selector" with default value "> div.slide". So the shortcode above should be equivalent to:
Okay I got that, but I have to go back to my last point. You can't put that anywhere else in a post. Ever. It's a matter of chance that the shortcode parser even allows this.
#25
follow-up:
↓ 26
@
10 years ago
Here's a different issue, not necessarily well-defined or well-covered by tests. The new regex is more greedy than it looks at first glance:
formatting.php line 206
. '(?:' . '[^\[\]<>]' // Shortcodes do not contain other shortcodes. . '|' . '<.+?>' // HTML elements permitted. Prevents matching ] before >. . ')+'
and the corresponding line 232:
} elseif ( '[' === $first && 1 === preg_match( '/^\[\[?(?:[^\[\]<>]|<.+?>)+\]\]?$/', $curl ) ) { // This is an escaped shortcode delimeter. // Do not texturize.
The intent is surely to match brackets only within html tags. However it also matches brackets when any html tag is present. That's becuase <.+?>
will go back and try to match as if <.+>
.
So all the following tests pass in trunk, including the unintended third one:
array( '[ do texturize "[quote]" here ]', '[ do texturize “[quote]” here ]', ), array( '[ regex catches this <a href="[quote]">here</a> ]', '[ regex catches this <a href="[quote]">here</a> ]', ), array( '[ but also catches the <b>styled "[quote]" here</b> ]', '[ but also catches the <b>styled "[quote]" here</b> ]', ),
You can drop them into tests/phpunit/tests/formatting/WPTexturize.php line 1609 to see, though I tested without the full harness.
(Quick edit: in my opinion the first test should fail, but I've already argued that.)
#26
in reply to:
↑ 25
@
10 years ago
Replying to kitchin:
The new regex is more greedy than it looks at first glance
Patch attached for that. Easy tweak.
#27
@
10 years ago
It's worth summarizing the big picture, I hope accurately.
We parse for shortcodes several times.
- The first parse is when texturizing. To fix longstanding bugs, [28727] improved finding shortcodes within the context of the surrounding html.
- The second parse is to expand the shortcodes. The regex pattern here has not changed. It is somewhat different than the regex in step 1, but sane text should work in both.
The pattern in step 1 is aware of <
, >
, but unaware of /
, and unaware of registered shortcode names. The pattern in step 2 has the opposite qualities.
The step 2 pattern is actually used twice, first to unwrap <p>...</p>
around standalone shortcodes.
This ticket was mentioned in IRC in #wordpress-dev by wonderboymusic. View the logs.
10 years ago
#30
@
10 years ago
Failing tests:
1) Tests_Formatting_WPTexturize::test_translate with data set #36 ('[ do texturize "[quote]" here ]', '[ do texturize “[quote]” here ]') Failed asserting that two strings are equal. --- Expected +++ Actual @@ @@ -'[ do texturize “[quote]” here ]' +'[ do texturize !openq2![quote]!closeq2! here ]' /Users/205410/Sites/wordpress-core-develop/tests/phpunit/tests/formatting/WPTexturize.php:1430 2) Tests_Formatting_WPTexturize::test_translate with data set #38 ('[ but also catches the <b>styled "[quote]" here</b> ]', '[ but also catches the <b>styled “[quote]” here</b> ]') Failed asserting that two strings are equal. --- Expected +++ Actual @@ @@ -'[ but also catches the <b>styled “[quote]” here</b> ]' +'[ but also catches the <b>styled !openq2![quote]!closeq2! here</b> ]' /Users/205410/Sites/wordpress-core-develop/tests/phpunit/tests/formatting/WPTexturize.php:1430 3) Tests_Formatting_WPTexturize::test_translate with data set #39 ('[Let\'s get crazy<input>[plugin code="<a href=\'?a[]=100\'>hello</a>"]</input>world]', '[Let’s get crazy<input>[plugin code="<a href=\'?a[]=100\'>hello</a>"]</input>world]') Failed asserting that two strings are equal. --- Expected +++ Actual @@ @@ -'[Let’s get crazy<input>[plugin code="<a href='?a[]=100'>hello</a>"]</input>world]' +'[Let!apos!s get crazy<input>[plugin code="<a href='?a[]=100'>hello</a>"]</input>world]' /Users/205410/Sites/wordpress-core-develop/tests/phpunit/tests/formatting/WPTexturize.php:1430
Broken in [28727].
To reproduce:
[expand title="<strong>Strong Test</strong>"][/expand]
.Output:
title = <strong>Strong Test</strong>
.title = »<strong>Strong
.