Opened 14 years ago
Closed 8 years ago
#10702 closed enhancement (wontfix)
Support for complex nested shorttags
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 2.9 |
Component: | Shortcodes | Keywords: | needs-testing needs-refresh close |
Focuses: | Cc: |
Description
The documentations is misleading on this point.
It is said that you can have nested shorttags by calling do_shorttag on the content.
However the shorttag parser isn't really a parser, it's just a non-greedy regex of some sort. It does work for simple tags without content, but when we define a [div] tag, look at the following code:
[div] [div] test1 [/div] [div] test2 [/div] [/div]
Results is:
<div> <div>(tag without content) test1 </div> <div> test2 [/div](not parsed at all) </div>
I think that if you want to stop people from writing their own parsers you should include some decent parser with Wordpress, not a regex I could do myself.
Workaround: Assign div2 and probably div3 and 4 to the same function, and use these for the nested tags. <--- very ugly!
Attachments (3)
Change History (22)
#1
@
14 years ago
- Component changed from Editor to Shortcodes
- Keywords needs-patch added
- Milestone changed from Unassigned to Future Release
- Version set to 2.9
#3
@
14 years ago
Because I can't.
I've never written a complex non-regex parser(and have no idea how to do it*) and never contributed a patch to Wordpress.
It'd be an interesting learning experience, but I doubt if it will work, be Wordpress coding style, backwards compatible, etc...
Looking at shortcode.php I'm not sure which parts are mandatory an what I can throw out to write something new(with all those comments).
- If I had to write a parser I think I would do it with a lot of foreach loops starting a new loop for every new tag and ending a loop for every end tag(to create some sort of 'levels'). I don't know exactly how, else I would have written it.
#4
@
14 years ago
Perhaps the code for bbCode Lite plugin for bbPress will be of help.
Shortcodes and bbCode are very similar.
#5
@
14 years ago
Maybe...
They stated in shortcode.php that it's based on textpatern:
<?php /** * WordPress API for creating bbcode like tags or what WordPress calls * "shortcodes." The tag and attribute parsing or regular expression code is * based on the Textpattern tag parser.
On the Textpattern site I found this article saying that before version X they also did not support nesting of the same tags, but now they do. Maybe that code is usable?
#6
follow-up:
↓ 7
@
14 years ago
Forgot to say:
url: http://textpattern.com/weblog/318/tag-parser-part-1
No, BBCode Lite is not usable, it's based on regex also.
#7
in reply to:
↑ 6
@
14 years ago
Replying to pepijndevos:
No, BBCode Lite is not usable, it's based on regex also.
So what if it's based on regex? It gets the job done: nested tags.
#8
@
14 years ago
No, it has the same problem as Wordpress has.
A regex either looks for the widest match or the smallest, looking at the code above you can see that both cause trouble.
Explanation: http://kore-nordmann.de/blog/do_NOT_parse_using_regexp.html
You somehow need to match tags on the same nesting level. But as I said, I never wrote a parser so I don't know how.
#9
@
14 years ago
Hints:
- you can use regular expressions to split text into tokens;
- use stack to store open tags (or write everything using recursion - IMO more tricky way).
#10
@
14 years ago
Ok, I think I understand now, but there ar still some blurry parts in my mind on the point where I need to put the content in the matched tags. But I think I'll get around that...
Another problem is that plugins now call do_shortcode recursively, and that I don't know which functions are private or can be changed. This means I can probably write the parser by now, but I have no clue yet how I can get it to fit the current situation.
Maybe it is easier to leave do_shortcode alone and make a new do_recursive_shortcode... or something like that?
It would not brake anyones code and leave the faster alternative for simple tags.
#11
@
14 years ago
Or you could add a 'recursive' parameter to do_shortcode and call _do_shortcode() and _do_recursive_shortcode() internally.
#12
@
11 years ago
Still a valid bug even with the new shortcode regex although the output has changed.
add_shortcode( 'div', 'shortcode_div' ); function shortcode_div( $atts, $content ) { return '<div>' . do_shortcode( $content ) . '</div>'; }
[div][div]test1[/div][div]test2[/div][/div]
<div><div></div>test1</div><div>test2</div>[/div]
The first [/div]
found gets matched with the first [div]
found, resulting in [div]test1
being the $content
and then that stand-alone shortcode getting expanded.
#13
@
11 years ago
- Keywords has-patch needs-testing added; needs-patch removed
Hi there! My name is Konstantin and I love regular expressions (not!). Everybody says we should not parse HTML using a regular expression, and they're right. However, nobody said anything about using two (or more) regular expressions to parse HTML, so let's just do a bunch of loops :)
10702.diff is an attempt to allow proper nested shortcodes in WordPress. The regex pattern has changed to use a positive lookahead and match the opening shortcode tag only if it's followed by its closing tag, meaning the search will match the inner tags first, which is why a loop has been introduced in do_shortcode
and strip_shortcodes
which will loop from the inner tags to the outer tags. This means that preg_replace_callback
will run at least twice if a shortcode is present.
10702.unit-tests.diff adds some test coverage.
10702.performance.csv shows that there's a slight performance hit when the content has a lot of shortcodes. The "no shortcodes" and "some shortcodes" gains are probably due to the strstr( $content, '[' )
short circuit in do_shortcode
. The plugin that ran these tests can be found here.
This is a draft and has room for improvement. Whatever we end up doing in do_shortcode
should be equally done in strip_shortcodes
. Also, escaping shortcodes might look a little weird with that string replacement, but since we're now replacing in a loop several times, replacing [[
with [
will no longer work. If you can think of a smarter way, I'm all ears!
Your feedback is highly appreciated :)
#14
follow-up:
↓ 15
@
11 years ago
FYI:
If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.
I can't comment on the actual patch though -- regex is scary for me. ;)
#15
in reply to:
↑ 14
@
11 years ago
Replying to Viper007Bond:
Cool, thanks for the heads up! Will address this in the next patch!
#16
follow-up:
↓ 17
@
9 years ago
- Keywords needs-refresh added; has-patch removed
Looking at this, it seems that the thing making it more difficult is allowing the self-closing tags to be identical to an opening tag. It means that our simpler (certainly not simple) regex isn't pairing the right open/close tags. The current behavior is as follows:
[test num="1" expected="matches #6" actual="matches #3"] [test num="2" expected="matches #3" actual="seen as self-closing"]test1[/test num="3" expected="matches #2" actual="matches #1"] [test num="4" expected="matches #5" actual="matches #5"]test2[/test num="5" expected="matches #4" actual="matches #4"] [/test num="6" expected="matches #1" actual="ignored completely"]
After the patch, tags 1 & 6 from above were completely missed, but the internal pairs seemed to match correctly. Having said that, the patch is also 2 years old and I had to apply the changes manually.
#17
in reply to:
↑ 16
@
9 years ago
Replying to aaroncampbell: Yeah, I don't think I like my patch anymore.
#19
@
8 years ago
- Milestone Future Release deleted
- Resolution set to wontfix
- Status changed from new to closed
At the 9 September meeting, participants agreed to close this ticket because fixing it would require unwanted changes to the shortcode syntax, or to parser performance.
For details, please see https://make.wordpress.org/core/2015/09/29/shortcode-roadmap-draft-two/
Well, if you're so dissatisfied, why don't you contribute with a patch?