Make WordPress Core

Opened 14 years ago

Closed 8 years ago

#10702 closed enhancement (wontfix)

Support for complex nested shorttags

Reported by: pepijndevos's profile pepijndevos Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.9
Component: Shortcodes Keywords: needs-testing needs-refresh close
Focuses: Cc:

Description

The documentations is misleading on this point.
It is said that you can have nested shorttags by calling do_shorttag on the content.

However the shorttag parser isn't really a parser, it's just a non-greedy regex of some sort. It does work for simple tags without content, but when we define a [div] tag, look at the following code:

[div]
    [div]
        test1
    [/div]
    [div]
        test2
    [/div]
[/div]

Results is:

<div>
    <div>(tag without content)
    test1
</div>
<div>
    test2
    [/div](not parsed at all)
</div>

I think that if you want to stop people from writing their own parsers you should include some decent parser with Wordpress, not a regex I could do myself.

Workaround: Assign div2 and probably div3 and 4 to the same function, and use these for the nested tags. <--- very ugly!

Attachments (3)

10702.diff (3.2 KB) - added by kovshenin 11 years ago.
10702.unit-tests.diff (1.9 KB) - added by kovshenin 11 years ago.
10702.performance.csv (1.1 KB) - added by kovshenin 11 years ago.

Download all attachments as: .zip

Change History (22)

#1 @scribu
14 years ago

  • Component changed from Editor to Shortcodes
  • Keywords needs-patch added
  • Milestone changed from Unassigned to Future Release
  • Version set to 2.9

#2 @scribu
14 years ago

Well, if you're so dissatisfied, why don't you contribute with a patch?

#3 @pepijndevos
14 years ago

Because I can't.

I've never written a complex non-regex parser(and have no idea how to do it*) and never contributed a patch to Wordpress.

It'd be an interesting learning experience, but I doubt if it will work, be Wordpress coding style, backwards compatible, etc...
Looking at shortcode.php I'm not sure which parts are mandatory an what I can throw out to write something new(with all those comments).

  • If I had to write a parser I think I would do it with a lot of foreach loops starting a new loop for every new tag and ending a loop for every end tag(to create some sort of 'levels'). I don't know exactly how, else I would have written it.

#4 @scribu
14 years ago

Perhaps the code for bbCode Lite plugin for bbPress will be of help.

Shortcodes and bbCode are very similar.

#5 @pepijndevos
14 years ago

Maybe...

They stated in shortcode.php that it's based on textpatern:

<?php
/**
 * WordPress API for creating bbcode like tags or what WordPress calls
 * "shortcodes." The tag and attribute parsing or regular expression code is
 * based on the Textpattern tag parser.

On the Textpattern site I found this article saying that before version X they also did not support nesting of the same tags, but now they do. Maybe that code is usable?

#6 follow-up: @pepijndevos
14 years ago

Forgot to say:
url: http://textpattern.com/weblog/318/tag-parser-part-1

No, BBCode Lite is not usable, it's based on regex also.

#7 in reply to: ↑ 6 @scribu
14 years ago

Replying to pepijndevos:

No, BBCode Lite is not usable, it's based on regex also.

So what if it's based on regex? It gets the job done: nested tags.

#8 @pepijndevos
14 years ago

No, it has the same problem as Wordpress has.
A regex either looks for the widest match or the smallest, looking at the code above you can see that both cause trouble.
Explanation: http://kore-nordmann.de/blog/do_NOT_parse_using_regexp.html

You somehow need to match tags on the same nesting level. But as I said, I never wrote a parser so I don't know how.

#9 @sirzooro
14 years ago

Hints:

  • you can use regular expressions to split text into tokens;
  • use stack to store open tags (or write everything using recursion - IMO more tricky way).

#10 @pepijndevos
14 years ago

Ok, I think I understand now, but there ar still some blurry parts in my mind on the point where I need to put the content in the matched tags. But I think I'll get around that...

Another problem is that plugins now call do_shortcode recursively, and that I don't know which functions are private or can be changed. This means I can probably write the parser by now, but I have no clue yet how I can get it to fit the current situation.

Maybe it is easier to leave do_shortcode alone and make a new do_recursive_shortcode... or something like that?
It would not brake anyones code and leave the faster alternative for simple tags.

#11 @scribu
14 years ago

Or you could add a 'recursive' parameter to do_shortcode and call _do_shortcode() and _do_recursive_shortcode() internally.

#12 @Viper007Bond
11 years ago

Still a valid bug even with the new shortcode regex although the output has changed.

add_shortcode( 'div', 'shortcode_div' );
function shortcode_div( $atts, $content ) {
	return '<div>' . do_shortcode( $content ) . '</div>';
}
[div][div]test1[/div][div]test2[/div][/div]
<div><div></div>test1</div><div>test2</div>[/div]

The first [/div] found gets matched with the first [div] found, resulting in [div]test1 being the $content and then that stand-alone shortcode getting expanded.

@kovshenin
11 years ago

#13 @kovshenin
11 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

Hi there! My name is Konstantin and I love regular expressions (not!). Everybody says we should not parse HTML using a regular expression, and they're right. However, nobody said anything about using two (or more) regular expressions to parse HTML, so let's just do a bunch of loops :)

10702.diff is an attempt to allow proper nested shortcodes in WordPress. The regex pattern has changed to use a positive lookahead and match the opening shortcode tag only if it's followed by its closing tag, meaning the search will match the inner tags first, which is why a loop has been introduced in do_shortcode and strip_shortcodes which will loop from the inner tags to the outer tags. This means that preg_replace_callback will run at least twice if a shortcode is present.

10702.unit-tests.diff adds some test coverage.

10702.performance.csv shows that there's a slight performance hit when the content has a lot of shortcodes. The "no shortcodes" and "some shortcodes" gains are probably due to the strstr( $content, '[' ) short circuit in do_shortcode. The plugin that ran these tests can be found here.

This is a draft and has room for improvement. Whatever we end up doing in do_shortcode should be equally done in strip_shortcodes. Also, escaping shortcodes might look a little weird with that string replacement, but since we're now replacing in a loop several times, replacing [[ with [ will no longer work. If you can think of a smarter way, I'm all ears!

Your feedback is highly appreciated :)

#14 follow-up: @Viper007Bond
11 years ago

FYI:

If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.

From http://php.net/strstr

I can't comment on the actual patch though -- regex is scary for me. ;)

#15 in reply to: ↑ 14 @kovshenin
11 years ago

Replying to Viper007Bond:

Cool, thanks for the heads up! Will address this in the next patch!

#16 follow-up: @aaroncampbell
9 years ago

  • Keywords needs-refresh added; has-patch removed

Looking at this, it seems that the thing making it more difficult is allowing the self-closing tags to be identical to an opening tag. It means that our simpler (certainly not simple) regex isn't pairing the right open/close tags. The current behavior is as follows:

[test num="1" expected="matches #6" actual="matches #3"]
  [test num="2" expected="matches #3" actual="seen as self-closing"]test1[/test num="3" expected="matches #2" actual="matches #1"]
  [test num="4" expected="matches #5" actual="matches #5"]test2[/test num="5" expected="matches #4" actual="matches #4"]
[/test num="6" expected="matches #1" actual="ignored completely"]

After the patch, tags 1 & 6 from above were completely missed, but the internal pairs seemed to match correctly. Having said that, the patch is also 2 years old and I had to apply the changes manually.

#17 in reply to: ↑ 16 @kovshenin
9 years ago

Replying to aaroncampbell: Yeah, I don't think I like my patch anymore.

#18 @chriscct7
8 years ago

  • Keywords close added

#19 @miqrogroove
8 years ago

  • Milestone Future Release deleted
  • Resolution set to wontfix
  • Status changed from new to closed

At the 9 September meeting, participants agreed to close this ticket because fixing it would require unwanted changes to the shortcode syntax, or to parser performance.

For details, please see https://make.wordpress.org/core/2015/09/29/shortcode-roadmap-draft-two/

Note: See TracTickets for help on using tickets.