Make WordPress Core

Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#28564 closed defect (bug) (fixed)

Shortcode Attributes with HTML Tags no longer working

Reported by: baden03's profile baden03 Owned by: wonderboymusic's profile wonderboymusic
Milestone: 4.0 Priority: high
Severity: major Version: 4.0
Component: Formatting Keywords: has-patch commit wptexturize
Focuses: Cc:

Description

At the following URLs are two test using a shortcode attribute that has html in the value.

On WordPress version 4.0-alpha-20140616 any greater/less-than HTML tags are causing strange behaviour.
http://spacedonkey.de/1176/collapse-o-matic-attribute-test/

On WordPress version 3.9.1 the shortcode attributes render correctly.
http://woo.spacedonkey.de/collapse-o-matic-image-in-title-test/

Is this a bug, or will HTML no longer be accepted within a shortcode's attribute value.

Attachments (6)

miqro-28564.patch (2.5 KB) - added by miqrogroove 10 years ago.
miqro-28564.2.patch (2.9 KB) - added by miqrogroove 10 years ago.
Adds one more unit test.
miqro-28564.3.patch (2.7 KB) - added by miqrogroove 10 years ago.
Allow shortcodes and HTML to alternate semi-recursively.
miqro-28564-part2.patch (1.9 KB) - added by miqrogroove 10 years ago.
Remove lazy quantifiers. Add more unit tests.
miqro-28564-part2.2.patch (2.6 KB) - added by miqrogroove 10 years ago.
miqro-28564-part3.patch (865 bytes) - added by miqrogroove 10 years ago.
Fix copypasta errors in unit tests.

Download all attachments as: .zip

Change History (40)

#1 follow-up: @SergeyBiryukov
10 years ago

  • Milestone changed from Awaiting Review to 4.0
  • Priority changed from normal to high
  • Severity changed from normal to major

Broken in [28727].

To reproduce:

  1. Register a shortcode:
    function footag_func( $atts ) {
         return "foo = {$atts['foo']}";
    }
    add_shortcode('footag', 'footag_func');
    
  2. Create a post with this content: [expand title="<strong>Strong Test</strong>"][/expand].

Output:

  • Before [28727]: title = <strong>Strong Test</strong>.
  • After [28727]: title = &#187;<strong>Strong.
Version 0, edited 10 years ago by SergeyBiryukov (next)

#2 @SergeyBiryukov
10 years ago

  • Component changed from Shortcodes to Formatting
  • Keywords needs-patch added

#3 follow-up: @miqrogroove
10 years ago

We will have to revert [28727] if the above behavior is desired. The original problem was that wptexturize was corrupting HTML when it contained what appeared to be shortcode patterns. If we want shortcodes to contain HTML, then we have to go back to corrupting HTML. There really isn't any middle ground because we are either giving one language priority over the other, or we are accepting tag soup with merely okay results.

The ultimate example is what happens if this "expand" shortcode contains HTML with square brackets?

[expand title="<a href='http://example.com/?a[]=12345'>Hello World</a>"]

There is simply no way to parse that expression without recursion or object modeling.

#4 @miqrogroove
10 years ago

Also note, while the double quotes in the examples appear to offer a reliable syntax, those quotes are not parsed by the main regexp in shortcodes.php. In my previous example, the existing shortcode engine (with or without [28727]) parses that phrase into the following shortcode text:

[expand title="<a href='http://example.com/?a[]

Last edited 10 years ago by miqrogroove (previous) (diff)

This ticket was mentioned in IRC in #wordpress-dev by SergeyBiryukov_. View the logs.


10 years ago

#6 in reply to: ↑ 3 ; follow-up: @SergeyBiryukov
10 years ago

Replying to miqrogroove:

The ultimate example is what happens if this "expand" shortcode contains HTML with square brackets?

[expand title="<a href='http://example.com/?a[]=12345'>Hello World</a>"]

I think square brackets should be encoded in this case:

[expand title="<a href='http://example.com/?a%5B%5D=12345'>Hello World</a>"]

#7 in reply to: ↑ 6 @miqrogroove
10 years ago

Replying to SergeyBiryukov:

I think square brackets should be encoded in this case:

[expand title="<a href='http://example.com/?a%5B%5D=12345'>Hello World</a>"]

I see your encoding and raise you a CDATA example:

[expand title="Hello [World] Fail"]

#8 @miqrogroove
10 years ago

That's #15694 by the way.

#9 @miqrogroove
10 years ago

  • Keywords has-patch added; needs-patch removed

Good news. My regexp was more resilient at separating HTML and shortcodes on a left-to-right basis than the old one. We can allow HTML within shortcodes. Shortcodes within shortcodes was the main problem. As azaozz mentioned on #15694, the [ and ] characters must be encoded as HTML entities any time they appear inside of any shortcode. This is almost never accomplished, which is why it's so tricky to separate the bugs.

In miqro-28564.patch:

  • Allow HTML inside of shortcode attributes for wptexturize.
  • Continue to ignore the [ char if it appears in any HTML attribute.
  • Allow the ] char to corrupt any HTML element that appears after a [ char, as in 3.9.
  • Update related patterns.
  • Update unit tests.

#10 @SergeyBiryukov
10 years ago

  • Keywords commit added

@miqrogroove
10 years ago

Adds one more unit test.

@miqrogroove
10 years ago

Allow shortcodes and HTML to alternate semi-recursively.

#11 @miqrogroove
10 years ago

In miqro-28564.3.patch:

  • Allow well-formed HTML inside of shortcode attributes for wptexturize.
  • Do not recurse any other part of wptexturize. HTML is allowed but ignored.
  • Do not allow exotic HTML comments in shortcode attributes.
  • Continue to ignore the [ and ] chars if they appear in any HTML attribute.
  • Update related patterns.
  • Update unit tests.

#12 @miqrogroove
10 years ago

  • Keywords wptexturize added

#13 @wonderboymusic
10 years ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from new to closed

In 28773:

In wptexturize() + tests:

  • Allow well-formed HTML inside of shortcode attributes
  • Restrict recursion. HTML is allowed but ignored.
  • Do not allow exotic HTML comments in shortcode attributes.
  • Continue to ignore the [ and ] chars if they appear in any HTML attribute.
  • Update related regex patterns.
  • Update unit tests.

Props miqrogroove.
Fixes #28564.

#14 @nacin
10 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Per IRC (yesterday) — this absolutely needs a security review.

#15 in reply to: ↑ 1 ; follow-ups: @kitchin
10 years ago

The test case in Sergey Biryukov's comment:1 still fails.

Broken in [28727].

To reproduce:

  1. Register a shortcode:
    function footag_func( $atts ) {
         return "title = {$atts['title']}";
    }
    add_shortcode( 'expand', 'footag_func' );
    
  2. Create a post with this content: [expand title="<strong>Strong Test</strong>"][/expand].

Output:

  • Before [28727]: title = <strong>Strong Test</strong>.
  • After [28727]: title = &#187;<strong>Strong.

My updated results.
Output:

  • Wordpress 3.9.1: title = <strong>Strong Test</strong>.
  • Trunk after [28773]: title = &lt;strong&gt;Strong Test&lt;/strong&gt;.

I would also like to consider "well-formed HTML," mentioned in comment:11 and comment:13. The author of the popular jQuery slide show plugin Cycle has released a substantially different plugin called Cycle2. It moves most of the parameters from JS to HTML data attributes. This has brought grief to simple HTML parsers because it encourages the following type of HTML, as seen in the Cycle2 manual, http://jquery.malsup.com/cycle2/demo/non-image.php :

<div class="cycle-slideshow" data-cycle-slides="> div">

While I deplore it personally, everyone seems to agree it is proper HTML. (The data is indicating the CSS selector for a slide show.) I know of plugins at wordpress.org/extend using Cycle2, and at least one that could probably be coerced into executing a shortcode with that type of HTML. Anyway it's a change in behavior and could use a Unit test for whatever the expected result is, like comment:1.

Here's my test for Cycle2 inspired HTML:

...

  1. Create a post with this content: [expand title="div > blockquote"][/expand].

Output:

  • Wordpress 3.9.1: title = div > blockquote.
  • Trunk after [28773]: title = &#8221;div.

#16 in reply to: ↑ 15 @miqrogroove
10 years ago

Replying to kitchin:

My updated results.
Output:

  • Wordpress 3.9.1: title = <strong>Strong Test</strong>.
  • Trunk after [28773]: title = &lt;strong&gt;Strong Test&lt;/strong&gt;.

Could you double check that? The code involved with 28773 doesn't strip elements.

#17 @kitchin
10 years ago

My apologies, you are correct. The comment:1 test case is fixed. (I used the visual editor instead of the text editor.)

  • Wordpress 3.9.1: title = <strong>Strong Test</strong>.
  • Trunk after [28773]: title = <strong>Strong Test</strong>.

The deplorable HTML test case I wrote is still broken though if we want to allow that sort of thing.

  • Wordpress 3.9.1: title = div > blockquote.
  • Trunk after [28773]: title = &#8221;div.

#18 in reply to: ↑ 15 ; follow-up: @miqrogroove
10 years ago

Replying to kitchin:

<div class="cycle-slideshow" data-cycle-slides="> div">

wptexturize() has never supported this syntax. You're welcome to open a new ticket for that issue, but it has nothing to do with the bug reported above.

  1. Create a post with this content: [expand title="div > blockquote"][/expand].

I'm going to read between the lines here and assume something like this will happen:

[expand title='<div class="cycle-slideshow" data-cycle-slides="> div">'][/expand]

We could find a way to allow this by ignoring extra > chars inside of shortcode attributes, which are not texturized anyway. I don't think it's a good idea at all, but it could be done.

#19 @miqrogroove
10 years ago

Eventually someone will attempt this too:

[expand title='<div class="cycle-slideshow" data-cycle-slides="> div[]">'][/expand]

That's why we have to draw a line on shortcode usage, somewhere.

#20 in reply to: ↑ 18 ; follow-up: @kitchin
10 years ago

Replying to miqrogroove:

wptexturize() has never supported this syntax. You're welcome to open a new ticket for that issue, but it has nothing to do with the bug reported above.

I'm just reporting the change in behavior, I don't know how important it is. "Well-formed HTML" is tricky and I'm not sure can really be handled by lookaheads in regexes, as opposed to tokenizing and switch statements, but I'm probably over my head here.

#21 in reply to: ↑ 20 @miqrogroove
10 years ago

Replying to kitchin:

I'm just reporting the change in behavior, I don't know how important it is. "Well-formed HTML" is tricky and I'm not sure can really be handled by lookaheads in regexes, as opposed to tokenizing and switch statements, but I'm probably over my head here.

So the change in behavior is that HTML that was never valid in wptexturize is now also invalid inside of shortcodes? Can we write this off as undefined behavior? Or is someone actually using shortcodes to avoid wptexturize from handling their invalid HTML?

#22 follow-up: @kitchin
10 years ago

I'm not sure why we're texturizing shortcode attributes at all. The problem in comment:3 is solved by searching left to right for the next ], since we later agreed [ and ] must be escaped within a shortcode.

Real world example... I tested a plugin that uses jQuery Cycle2. It did break, though I doubt anyone is using it in the way that breaks it. But here goes. The shortcode for the Testimonial Rotator plugin is like this:
[testimonial_rotator id="99"]
There is an attribute "div_selector" with default value "> div.slide". So the shortcode above should be equivalent to:
[testimonial_rotator id="99" div_selector="> div.slide"]
In WP 3.9.1, the $content passed to do_shortcode() is exactly that.
In trunk, $content is:
[testimonial_rotator id=&#8221;99&#8243; div_selector=&#8221;> div.slide&#8221;]

Valid html... The jQuery Cycle2 plugin is pretty standard. Its examples use this type of html:

<!DOCTYPE html>
<html><head><meta charset="UTF-8"><title>test</title></head><body>
<div id="testimonial_rotator_99" data-cycletwo-slides="> div.slide"></div>
</body></html>

You can paste that into http://validator.w3.org/#validate_by_input and see it is valid. I don't like it, and I think that Cycle2 should allow an escaped value but it does not. If I recall correctly, the author told me, hey it's valid HTML, and did not understand sometimes people like to simple quick parsing.

What is the issue... Don't have a clear spec for what we're trying to do. Don't know how to address comment:14, Nacin's "needs a security review."

Last edited 10 years ago by kitchin (previous) (diff)

#23 @kitchin
10 years ago

Another use case, besides css selectors, is the comparison operator ">". Since the WP API functions allow it as a parameter when searching for posts (via the WP_Meta_Query parameter "meta_compare"), it is conceivable a plugin author would allow it in a search shortcode attribute. Perhaps the plugin could handle it escaped, but if we're escaping [, ], why escape <, > ?

#24 in reply to: ↑ 22 @miqrogroove
10 years ago

Replying to kitchin:

I'm not sure why we're texturizing shortcode attributes at all. The problem in comment:3 is solved by searching left to right for the next ], since we later agreed [ and ] must be escaped within a shortcode.

Um yeah, we didn't agree on that, because in practice that never actually happens. Even most internal uses of shortcodes won't escape attribute text.

There is an attribute "div_selector" with default value "> div.slide". So the shortcode above should be equivalent to:

Okay I got that, but I have to go back to my last point. You can't put that anywhere else in a post. Ever. It's a matter of chance that the shortcode parser even allows this.

#25 follow-up: @kitchin
10 years ago

Here's a different issue, not necessarily well-defined or well-covered by tests. The new regex is more greedy than it looks at first glance:

formatting.php line 206

.	'(?:'
.		'[^\[\]<>]'	// Shortcodes do not contain other shortcodes.
.	'|'
.		'<.+?>' // HTML elements permitted. Prevents matching ] before >.
.	')+'

and the corresponding line 232:

} elseif ( '[' === $first && 1 === preg_match( '/^\[\[?(?:[^\[\]<>]|<.+?>)+\]\]?$/', $curl ) ) {
	// This is an escaped shortcode delimeter.

	// Do not texturize.

The intent is surely to match brackets only within html tags. However it also matches brackets when any html tag is present. That's becuase <.+?> will go back and try to match as if <.+>.

So all the following tests pass in trunk, including the unintended third one:

array(
	'[ do texturize "[quote]" here ]',
	'[ do texturize &#8220;[quote]&#8221; here ]',
),
array(
	'[ regex catches this <a href="[quote]">here</a> ]',
	'[ regex catches this <a href="[quote]">here</a> ]',
),
			
array(
	'[ but also catches the <b>styled "[quote]" here</b> ]',
	'[ but also catches the <b>styled "[quote]" here</b> ]',
),

You can drop them into tests/phpunit/tests/formatting/WPTexturize.php line 1609 to see, though I tested without the full harness.

(Quick edit: in my opinion the first test should fail, but I've already argued that.)

Last edited 10 years ago by kitchin (previous) (diff)

@miqrogroove
10 years ago

Remove lazy quantifiers. Add more unit tests.

#26 in reply to: ↑ 25 @miqrogroove
10 years ago

Replying to kitchin:

The new regex is more greedy than it looks at first glance

Patch attached for that. Easy tweak.

#27 @kitchin
10 years ago

It's worth summarizing the big picture, I hope accurately.

We parse for shortcodes several times.

  1. The first parse is when texturizing. To fix longstanding bugs, [28727] improved finding shortcodes within the context of the surrounding html.
  1. The second parse is to expand the shortcodes. The regex pattern here has not changed. It is somewhat different than the regex in step 1, but sane text should work in both.

The pattern in step 1 is aware of <, >, but unaware of /, and unaware of registered shortcode names. The pattern in step 2 has the opposite qualities.

The step 2 pattern is actually used twice, first to unwrap <p>...</p> around standalone shortcodes.

This ticket was mentioned in IRC in #wordpress-dev by wonderboymusic. View the logs.


10 years ago

#29 @wonderboymusic
10 years ago

In 28852:

Use less greedy regex in wptexturize(). Adds unit tests.

Props miqrogroove.
See #28564.

#30 @wonderboymusic
10 years ago

Failing tests:

1) Tests_Formatting_WPTexturize::test_translate with data set #36 ('[ do texturize "[quote]" here ]', '[ do texturize &#8220;[quote]&#8221; here ]')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'[ do texturize &#8220;[quote]&#8221; here ]'
+'[ do texturize !openq2![quote]!closeq2! here ]'

/Users/205410/Sites/wordpress-core-develop/tests/phpunit/tests/formatting/WPTexturize.php:1430

2) Tests_Formatting_WPTexturize::test_translate with data set #38 ('[ but also catches the <b>styled "[quote]" here</b> ]', '[ but also catches the <b>styled &#8220;[quote]&#8221; here</b> ]')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'[ but also catches the <b>styled &#8220;[quote]&#8221; here</b> ]'
+'[ but also catches the <b>styled !openq2![quote]!closeq2! here</b> ]'

/Users/205410/Sites/wordpress-core-develop/tests/phpunit/tests/formatting/WPTexturize.php:1430

3) Tests_Formatting_WPTexturize::test_translate with data set #39 ('[Let\'s get crazy<input>[plugin code="<a href=\'?a[]=100\'>hello</a>"]</input>world]', '[Let&#8217;s get crazy<input>[plugin code="<a href=\'?a[]=100\'>hello</a>"]</input>world]')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'[Let&#8217;s get crazy<input>[plugin code="<a href='?a[]=100'>hello</a>"]</input>world]'
+'[Let!apos!s get crazy<input>[plugin code="<a href='?a[]=100'>hello</a>"]</input>world]'

/Users/205410/Sites/wordpress-core-develop/tests/phpunit/tests/formatting/WPTexturize.php:1430

#31 @miqrogroove
10 years ago

Some kind of unit bug? ugh

@miqrogroove
10 years ago

Fix copypasta errors in unit tests.

#32 @SergeyBiryukov
10 years ago

In 28860:

Use correct data providers.

props miqrogroove.
see #28564.

#33 @wonderboymusic
10 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Re-open if anything breaks, but this appears done for 3 weeks now.

#34 @miqrogroove
10 years ago

I added a ticket for extra discussion at #29661. Aside from bugs, we have some underlying problems with documentation and tests for shortcodes.

Note: See TracTickets for help on using tickets.