Make WordPress Core

Opened 14 years ago

Closed 9 years ago

#14481 closed enhancement (maybelater)

Shortcode Enhancements

Reported by: deadowl's profile deadowl Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.0
Component: Shortcodes Keywords: needs-unit-tests
Focuses: Cc:

Description (last modified by nacin)

Somewhat of a copy of a post to wp-hackers: I wrote my own implementation of shortcodes. It does things a bit differently, has nested evaluation, and allows self-nesting. Since there are some significant differences to the existing implementation,

A lot of the changes are borrowed from definitions in the HTML specification (particularly name and attribute matching). The following are the comments at the top of my shortcode file. I tried to keep track of all the differences (and questions) there.

From Test Cases

(http://svn.automattic.com/wordpress-tests/wp-testcase/test_shortcode.php)

Shortcode Statuses (to implement, or not to implement?)
        enabled = the shortcode works as normal (default)
        strip = the shortcode will be parsed and removed.  e.g.
'[shortcode foo="bar"]' produces ''.  '[shortcode]foo[/shortcode]'
produces 'foo'.
        faux = the shortcode will be abbreviated.  e.g. '[shortcode
foo="bar"]' products '[shortcode]'.  '[shortocde]foo[/shortcode]'
produces '[shortcode]'
        disabled = the shortcode is not parsed at all.  e.g.
'[shortcode foo="bar"]' products '[shortcode foo="bar"]'

Major Differences/Improvements

  1. Addressing http://codex.wordpress.org/Shortcode_API#Limitations
  1. You can nest any tag at any depth regardless of ancestors (#10702)
  1. Tag and attribute names may match: /[A-Za-z][-A-Za-z0-9_:.]*//* (trialing /* because that comment ends), with case-insensitive interpretation
  1. Interpretation doesn't get tripped up by things like hyphens

II. Addressing Ticket #12760,

  1. Changed from fix in #6518. Reasoning: balancing double-square-brackets can have misleading results with nesting
  1. Shortcodes escapable by using [[, ]]
  1. ]] is escaped "right to left", so [shortcode]]] would evaluate to result]
  1. '' is escaped left to right `[[[shortcode?] => [result]`

III. Enhancements

  1. Only matches valid shortcode for registered hooks, everything else will appear as text
  1. Can register multiple hooks to single shortcode, uses priority (default: 10)

IV. Conflicting Design Changes

  1. Quoted literals are escaped by entities rather than cslashes
  1. Inline (self-closing) shortcodes get passed content to accomodate multiple callbacks
  1. No equivalent to shortcode_parse_atts function (Not marked private in function reference, but not documented in shortcode API page)
  1. Boolean attributes take the place of positional attributes [foo bar] gets attributes array('bar' => 'bar') instead of array('0' => 'bar')
  1. Disallows attribute and tag names that don't match /[A-Za-z][-A-Za-z0-9_:.]*/
  1. Disallows unquoted attribute values (also boolean attributes), unless they match /[-A-Za-z0-9_:.]+/

Basic Interpretation Method

  1. If an open tag is encountered, it is added to the stack
  1. If a close tag is encountered and there is no matching open tag on the stack the close tag is ignored
  1. If a close tag is encountered and there is a matching open tag on the stack all opened tags on the stack before the matched tag will be implicitly self-closed
  1. If text or an inline tag is encountered, it will be evaluated to its parent's content immediately
  1. If tags are not closed by the end of the interpretation, they will be implicitly self-closed

Issues

  1. Haven't written new unit tests to reflect new functionality added, functionality differences
  1. Documentation is not as good (though I hope most of the code is self-explanatory)
  1. Not 100% backwards compatible

Attachments (3)

shortcode.patch (28.4 KB) - added by deadowl 14 years ago.
shortcode patch
tagcodes.php (12.4 KB) - added by deadowl 14 years ago.
Took the suggestion of using filters as an interface, fixed some parsing bugs, put a leading underscore in front of internal functions, changed function names to use "tagcode" instead of "shortcode"
content_tags.php (14.0 KB) - added by deadowl 14 years ago.
Refactored previous content_tags.php, in particular the eval-geared functions. Also restructured the 'node' structure used for parsing and interpreting

Download all attachments as: .zip

Change History (34)

@deadowl
14 years ago

shortcode patch

#1 follow-up: @nacin
14 years ago

  • Description modified (diff)

#2 in reply to: ↑ 1 @deadowl
14 years ago

Replying to nacin:
Thank you, I was using markdown and couldn't figure out how to go back in to edit that.

#3 @Denis-de-Bernardy
14 years ago

  • Keywords has-patch added
  • Version set to 3.0

#4 @deadowl
14 years ago

  • Cc deadowl added

Doesn't have a problem with example-post.txt from ticket either, though I think there might be some compatibility issues with long-post.zip.

#5 @deadowl
14 years ago

Ticket 8553* (I believe this would resolve a lot of backtracking issues)

#6 follow-ups: @scribu
14 years ago

Just took a quick look at your patch. Some nitpicking:

You should use call_user_func() instead of $callback().

The patch should not contain all the notes about differences/improvements.

What exactly are the backwards incompatible changes?

#7 in reply to: ↑ 6 @deadowl
14 years ago

Replying to scribu:

Just took a quick look at your patch. Some nitpicking:

Ready.

You should use call_user_func() instead of $callback().

Okay, I'll switch to using call_user_func.
There was also a comment bug, and I kind of messed up shortcode_atts

The patch should not contain all the notes about differences/improvements.

Okay.

What exactly are the backwards incompatible changes?

  1. Uses entities/character references for escaping rather than C slashes.
  2. Standalone attributes (ex. [sc hello]) would evaluate to hello => hello rather than 0 => hello. Additionally, hello is treated as a name rather than a value.
  3. Escaping a shortcode isn't a nested evaluation. An opening [[ does not look for a closing ]] on a matching tag. I consider this an improvement that avoids backtracking.
  4. Testing for $content == null no longer differentiates a shortcode as inline. This is to accommodate the ability to add multiple callbacks to a shortcode.
  5. No shortcode_parse_atts() function (which isn't mentioned in Shortcode API doc, but is in function reference).
  6. Limits attribute and tag names to matching /[A-Za-z][-A-Za-z0-9_:.]*/
  7. Limits unquoted attribute values to /[-A-Za-z0-9_:.]+/

#8 follow-ups: @scribu
14 years ago

Ok, I don't think backwards compatibility will be a problem.

Can register multiple hooks to single shortcode, uses priority (default: 10)

Why do you need that? Please provide a usage example.

#9 in reply to: ↑ 6 @deadowl
14 years ago

Replying to scribu:

You should use call_user_func() instead of $callback().

call_user_func() won't pass by reference the stack and refs from shortcode_eval(). Because of this, I have modified the code to statically call the tag_inline, tag_open, tag_close, text, esc_lsqbr, esc_rsqbr evaluation methods because they all need a reference to at least the stack. Meanwhile, it appears I have broken something with building the parse tree, so I will be fixing that before I post a new diff.

#10 @mikeschinkel
14 years ago

  • Cc mikeschinkel@… added

#11 in reply to: ↑ 8 ; follow-up: @aaroncampbell
14 years ago

Replying to scribu:

Ok, I don't think backwards compatibility will be a problem.

Can register multiple hooks to single shortcode, uses priority (default: 10)

Why do you need that? Please provide a usage example.

I haven't tested the proposed patch yet, but here's a thought: Say there was a popular plugin that uses an address shortcode and drops in a Google Map. Another plugin could be made that uses that same shortcode to drop in the proper micro-formatting for an address.

It really will only help if plugin developers attempt to standardize certain shortcodes. However, all the cases I can think of could also be managed pretty easily of the original plugin author added the necessary hooks to their plugin...but we all know that doesn't happen.

#12 @aaroncampbell
14 years ago

  • Cc aaron@… added

#13 in reply to: ↑ 11 @deadowl
14 years ago

Replying to aaroncampbell:

I haven't tested the proposed patch yet, but here's a thought: Say there was a popular plugin that uses an address shortcode and drops in a Google Map. Another plugin could be made that uses that same shortcode to drop in the proper micro-formatting for an address.

Yea, the proposed patch is accidentally broken-ish atm. A syntax error at a comment, misimplemented shortcode_atts, and an incorrect parse tree.

It really will only help if plugin developers attempt to standardize certain shortcodes. However, all the cases I can think of could also be managed pretty easily of the original plugin author added the necessary hooks to their plugin...but we all know that doesn't happen.

I had started to work on some compat expressions that could be parsed and then throw a deprecation warning when evaluated. I don't know if the same would be possible for accessing positional attributes.

#14 in reply to: ↑ 8 @deadowl
14 years ago

Replying to scribu:

Can register multiple hooks to single shortcode, uses priority (default: 10)

Why do you need that? Please provide a usage example.

I am going to distribute a plugin that provides a set of shortcodes. There will be different variations in the needs of the users who use my plugin. Instead of maintaining one monolithic plugin that meets all of my users needs, I can provide a basic framework that the users can extend to meet their needs. They can further distribute their own extensions that apply to existing posts without requiring edits to a large backlog of posts to implement in full.

See http://en.wikipedia.org/wiki/Extensibility

Additionally, the add/remove/has/etc functions are consistent with the filter and action core functions of Wordpress.

#15 follow-up: @scribu
14 years ago

Ok, so why not use do_action() and add_action() internally?

#16 in reply to: ↑ 15 @aaroncampbell
14 years ago

Replying to scribu:

Ok, so why not use do_action() and add_action() internally?

I think I agree with Scribu (assuming I'm understanding what he's saying). It would be nice to be able to use filters, but it would require that the parameters be changed so the content parameter comes first...which REALLY messes with backwards compatibility. The only way I see to be able to rearrange parameters would be to create new functions (like wp_add_shortcode, or something similar) and use a function with the old functions that rearranges the parameters before calling their handler. It would be messy, but the end result would be pretty nice:

Basically, when someone did add_shortcode('myshortcode', 'myhandler', 10); internally it would do something like add_filter('shortcode-myshortcode', 'myhandler', 10, 3); and then when it processed shortcodes it would simply use apply_filters('shortcode-myshortcode', $content, $attributes, $tag);

You would obviously also be able to use has_filter, remove_filter, and even remove_all_filters.

#17 @scribu
14 years ago

Yeah, that's what I meant. :)

#18 follow-ups: @aaroncampbell
14 years ago

I'm assuming we'd still need to keep a stack of registered shortcodes so we can use those names in the regex? I've not a regular expressions guru by any means (I'm fine at writing them to meet my needs, but it ends there), so how advantageous is it to create the regex using the list of registered shortcode names? Also, at what point (if any) does it become a problem (what if there are 1,000 shortcodes? 10,000?)? I know that the problems it may create are probably edge cases, but I'm curious.

#19 in reply to: ↑ 18 @deadowl
14 years ago

Replying to aaroncampbell:

I'm assuming we'd still need to keep a stack of registered shortcodes so we can use those names in the regex? I've not a regular expressions guru by any means (I'm fine at writing them to meet my needs, but it ends there), so how advantageous is it to create the regex using the list of registered shortcode names? Also, at what point (if any) does it become a problem (what if there are 1,000 shortcodes? 10,000?)? I know that the problems it may create are probably edge cases, but I'm curious.

The regular expression doesn't have to use the registered shortcode names if the interpreter does the checking. The interpreter can use an associative array, so it would generally be faster for the interpreter to check. Using the shortcode in the regex probably takes about the same amount of time as an in_array call as it's interpreted.

Keeping track of valid shortcodes would still be needed in interpretation so something like [not-a-shortcode] would remain content and not be thrown out.

#20 @scribu
14 years ago

Related: #14557

@deadowl
14 years ago

Took the suggestion of using filters as an interface, fixed some parsing bugs, put a leading underscore in front of internal functions, changed function names to use "tagcode" instead of "shortcode"

#21 in reply to: ↑ 18 @deadowl
14 years ago

Replying to aaroncampbell:

I'm assuming we'd still need to keep a stack of registered shortcodes so we can use those names in the regex? I've not a regular expressions guru by any means (I'm fine at writing them to meet my needs, but it ends there), so how advantageous is it to create the regex using the list of registered shortcode names? Also, at what point (if any) does it become a problem (what if there are 1,000 shortcodes? 10,000?)? I know that the problems it may create are probably edge cases, but I'm curious.

I factored out most added look-around assertions (since your original response), which results in one (not worrisome) change in intended behavior. For the expression "[tag]]", the shortcode will be evaluated followed by a ]. The expression "]]" is now evaluated in a left to right manner, with tag expressions taking precedence. Trying to do assertions that a tag ends followed by an even number of right-square-brackets was hurting. I can't think of any instances in which extensive backtracking may occur now except for the assertion of the tagname being registered.

With enough elbow grease, that can probably get factored out without implying any change in behavior. The problem would be preserving the original input that created the tag while still separating the name from the attribute list. I think this is more of a goal than tracking whether a tag name is valid when it's hooked to remove only the look-around.

I had originally changed name from shortcode to tagcode because I want to call callbacks in a way that's more fluid with apply_filter, and that would just about destroy backwards compatibility. I changed to content_tag because it's a lot more intuitive and descriptive.

#22 @deadowl
14 years ago

I updated the content_tags.php file to a version that factors out the global variable, as well as the last remaining look-around assertion.

The method employed will prevent certain patterns from being parsed, but they wouldn't likely be widespread. What happens is that it will parse something that looks like a valid shortcode, ignoring the name, and then if the name doesn't have a hook, it will just output the original expression.

"so if [this content tag] is invalid" would then translate to "so if is [this content tag] is invalid" given the tag "this" not having a hook.

The implementation prevents certain syntax variations.

ex 1.
[invalid-tag hello="[valid-tag/]"], the inline-expression, [valid-tag/], is not interpreted

ex 2.
[invalid-tag]], the escaped right-square-bracket is not interpreted

Another option would be to swallow tags that don't have a callback. My main concern here is keeping the syntax intuitive. Any feedback?

@deadowl
14 years ago

Refactored previous content_tags.php, in particular the eval-geared functions. Also restructured the 'node' structure used for parsing and interpreting

#23 @deadowl
14 years ago

I realized that in the future, I might want to enable nesting to be context-sensitive. The current callback features don't scale to this idea.

e.g. {{{
[faculty id="jmaloi"]

<dl>

<dt>Name</dt>
<dd>[faculty-name]</dd>
<dt>Phone</dt>
<dd>[faculty-phone]</dd>

</dl>

faculty
}}}

I wonder if it would really have to go that far, though, because then you might as well want to make a template for the content type.

#24 @scribu
14 years ago

Indeed it would be overkill. We don't want to create an entire templating language.

#25 @daveagp
13 years ago

  • Cc daveagp added

I encounter a bug when using closing square brackets in quoted attribute values for shortcodes. This place is the closest-to-appropriate place I could find to report it, I would appreciate any redirection if somewhere else is better. It happened in wordpress 3.0.1 and does not seem to be explicitly addressed above.

The bug: when a page source contains

[mytag myatt="red]green"]contentmytag

the $content erroneously is processed as

green"]content

Notes:

  • opening brackets [ don't cause this problem
  • changing the double quotes to single quotes does not help

It would be great if the proposed fix handles this case - I have not tried to figure out how to use the patch and so I have not checked.

Last edited 10 years ago by daveagp (previous) (diff)

#26 @scribu
13 years ago

  • Keywords needs-unit-tests added

#27 @scribu
13 years ago

  • Milestone changed from Awaiting Review to Future Release

Too late for 3.1.

#28 @scribu
13 years ago

I remembered this ticket because I've encountered another bug:

WP doesn't like self closing shortcodes mixed with ones that have content:

[tag /] some words in between [tag]content[/tag]

The first [tag] will match /] some words in between [tag]content.

#29 @helen
10 years ago

So... this sounds super cool, but kind of a monster ticket that isn't easily picked back up if somebody wants to get into shortcodes (ha, ha). Worth splitting up / opening new ones? Picking one poison for this ticket? Or is it all tightly stuck together?

#30 @daveagp
10 years ago

A long time passed since my earlier comment here, so I can tell you what was my main problem personally, and how I worked around it.

Because I was making a website to teach programming and using shortcodes to create exercises, it was very important for me that shortcode argument values (at least quoted ones) would be able to contain any string, not just a limited set of characters. (I didn't care about shortcode argument names.)

Problem 1. WordPress will not allow any of the shortcode arg values to contain
a closing square bracket ] under any circumstances, even if it is quoted.

Problem 2. Each shortcode arg value may either be (a) unquoted but contain no spaces,
(b) single-quoted but contain no single quotes, or (c) double-quoted
but contain no double quotes. This restriction means you can't pass both kinds at once. It's a weird restriction because the values are later run through
stripcslashes anyway.

To get around it, I wrote some modified version of the shortcode infrastructure that would recognize backslashes as escapes and wait for a closing quote rather than stopping when it hit a ]. Once I figured out that this was what I wanted to do, the rest was pretty straightforward:

https://github.com/cemc/cscircles-wp-content/blob/master/plugins/pybox/sweetcodes.php

Sorry that this file is part of a monolithic plugin but if it would help for me to explain or extract the part of it that's relevant (mostly the giant regex) let me know.

If it is likely that these character classes could become kosher for quoted shortcode arguments, I would be happy to do some work to incorporate it!

Many of the other improvements sure seem like good ideas, like being able to self-nest the same shortcode, even though it's never affected me directly.

#31 @miqrogroove
9 years ago

  • Keywords has-patch removed
  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from new to closed

No traction.

Note: See TracTickets for help on using tickets.