WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 4 months ago

#33134 new defect (bug)

Complex Nested Shortcodes Inside of Attributes Are Not Processed Left-to-Right

Reported by: miqrogroove Owned by:
Milestone: Priority: normal
Severity: normal Version: 4.2.3
Component: Shortcodes Keywords:
Focuses: Cc:
PR Number:

Description (last modified by miqrogroove)

Given the following input pattern, the 4.2.3 update alters the flow of control very slightly, which could give the appearance that shortcodes are "not working".

[myloop]<html attr="[dependent]">[/myloop]

The preferred input pattern is the one in which the shortcodes output their own HTML, and this is unchanged by the update to 4.2.3:

[myloop][dependent][/myloop]

Additional considerations:

The Shortcode API does not process nested shortcodes. This only happens when plugins run do_shortcode() recursively.

Core devs have not yet considered whether this is a bug, nor to which Milestone it might be assigned.

Algorithm required for do_shortcode to partially restore left-to-right processing:

  1. Parse HTML.
  2. Replace all shortcodes found in attributes with unique, non-shortcode, non-HTML placeholders.
  3. Implode HTML.
  4. Run shortcode regexp.
  5. Before calling each custom handler, the placeholders would have to be undone inside of each wrapping shortcode.
  6. Call custom handler. At this point, the custom handler is able to get left-to-right recursion.
  7. Escape all shortcodes and placeholders in shortcode output (callback abstraction needed?)
  8. Undo all remaining placeholders.
  9. Parse HTML (again, from the top)
  10. Do shortcodes in attributes. (Not inside wrapping shortcodes).
  11. KSES filters the shortcode output.
  12. Implode HTML.

Attachments (3)

33134_shortcodes.diff (14.7 KB) - added by gitlost 4 years ago.
Demonstration of the shortcode placeholder pre-processor.
33134_test.diff (1.1 KB) - added by gitlost 4 years ago.
Unit test.
33134.patch (2.9 KB) - added by gitlost 3 years ago.
Pre-check for unfiltered_htmlness.

Download all attachments as: .zip

Change History (11)

#1 @miqrogroove
4 years ago

  • Description modified (diff)

#2 @minderdl
4 years ago

Since I don't know about the security hole you fixed with changeset [33359] (if someone can enlighten me send me a PM) I don't know if the following would re-introduce the hole:

It would be possible to distinguish between a non-closing shortcode, a self-closing shortcode and a wrapping (or nested) shortcode. All parts are already there in the regex returned by get_shortcode_regex().

So, IMO it would no be a problem finding the wrapping/nested shortcodes first and handling them, and in a second step the other shortcodes with all the checks you are doing now (i.e. quotation nesting).

I agree that something like this would be possible then:

<sometag attr="[shortcode]">
...<someothertags>...
[/shortcode]

which probably returns a completely wrong output - but as long as it is not a security issue we should not prevent all stupidity of users :-)

#3 follow-up: @miqrogroove
4 years ago

#33275 was marked as a duplicate.

#4 in reply to: ↑ 3 @chrisl27
4 years ago

Replying to miqrogroove:

#33275 was marked as a duplicate.

Adding here for followups that #33275 includes code to demonstrate that attribute and non-attribute shortcodes run differently in 4.2.3 and why this matters to shortcodes that change output depending on state.

@gitlost
4 years ago

Demonstration of the shortcode placeholder pre-processor.

@gitlost
4 years ago

Unit test.

#5 @gitlost
4 years ago

The attached is a prototype of the pre-processor filter for numbered placeholders that I mentioned on the Shortcode Roadmap Extended Discussion blog (https://make.wordpress.org/core/2015/09/04/shortcode-roadmap-extended-discussion/#comment-27653). It's intended as a demonstration really, not as a bug fix, though it does address this issue.

#6 @miqrogroove
4 years ago

The new function wp_placeholder_shortcodes() doesn't seem to do much differently from what we already have. It's calling get_shortcode_regex() and then replacing the shortcodes. So, this is much closer to the old parser of 4.1, with most of the same problems, compared to the algorithm I've outlined (which has its own problems too). Tomorrow at the meeting we will discuss the future of HTML vs. shortcodes. I'm not convinced it will be worth fixing this issue in the meantime, but I'm interested in other opinions.

Last edited 4 years ago by miqrogroove (previous) (diff)

#7 @senica@…
3 years ago

There was a request for followups on this, so I just wanted to add my issue.

Several tickets that have been marked as duplicates express my frustration with the shortcode behavior as currently is.

A shortcode's inner content should not be parsed for shortcodes by an outer shortcode. Currently, shortcodes in html tag's are parsed by the outer shortcode, which gives really conflicting results.

See here for example:
http://stackoverflow.com/questions/40390223/wordpress-outer-shortcode-parsing-html-tag-of-inner-shortcodes-content#
Halfway there is some reproducible code.

This can't be the expected result, can it?

I'm using 4.6.1 btw.

Last edited 3 years ago by senica@… (previous) (diff)

@gitlost
3 years ago

Pre-check for unfiltered_htmlness.

#8 @gitlost
3 years ago

Thought I might as well update and post this patch (which dates from the time over a year ago but wasn't posted for various reasons) that ameliorates the out-of-band behaviour by pre-checking for unfiltered_html indicators, only processing the shortcodes out of sequence if none found, and as a workaround allows one to signal one's unfilteredness by using dummy attributes if necessary.

Note: See TracTickets for help on using tickets.