Make WordPress Core

Opened 13 years ago

Closed 13 years ago

#10473 closed defect (bug) (duplicate)

Shortcode parser in is flawed

Reported by: godfreykfc Owned by:
Milestone: Priority: normal
Severity: major Version: 2.8.1
Component: Shortcodes Keywords:
Focuses: Cc:


There are several critical flaws in the shortcode parser. I'm using 2.8.2. This probably deserves multiple tickets but I think grouping them all here will be easier for the maintainers to follow up (because a single patch that fixes one of the following might break the others):

Problem 1: It doesn't work properly, at all!

Plugin code:

function some_handler($atts, $content){
    return "**BEGIN** Content: '{$content}' **END**";

function my_init() {
    add_shortcode('sc', 'some_handler');


Post content:

2[sc]with content[/sc]2
3[sc /]3
4[sc]with content[/sc]4
5[sc with="attr"][/sc]5
6[sc with="attr"]and content[/sc]6
8multiple line content8

I apologize if this is a bit hard to read, but the idea is every line should be a self contained shortcode (except 7-9 which is one big fat chunk of short code). I'm adding the line number before and after the tag so that you can see clearly what's going on. (I'm not even testing the single line shortcode fail as described in #10082)

Expected HTML output:

1**BEGIN** Content: '' **END**1<br />
2**BEGIN** Content: 'with content' **END**2<br />
3**BEGIN** Content: '' **END**3<br />
4**BEGIN** Content: 'with content' **END**4<br />
5**BEGIN** Content: '' **END**5<br />
4**BEGIN** Content: 'and content' **END**4<br />
7**BEGIN** Content: '7
8multiple line content8
9' **END**9

Should be fairly straight forward. But the ACTUAL HTML output blew my mind...

1**BEGIN** Content: '1<br />
2[sc]with content' **END**2<br />
3**BEGIN** Content: '3<br />
4[sc]with content' **END**4<br />
5**BEGIN** Content: '[/sc]5<br />
6[sc with="attr"]and content' **END**6<br />
7**BEGIN** Content: '7<br />
8multiple line content8<br />
9' **END**9

Problem 1.1:
New lines inside a shortcode should not be converted to <br /> tags, at least when they are entered in the HTML (non-visual) editor. This is perhaps more of an enhancement than defect, but it makes it pretty much impossible to create shortcodes that would take multiple lines content. If it's a desire behavior for that particular shortcode, it should be up to the author to call a function like nl2br to convert those new lines.

Problem 1.2:
Using the same shortcode back-to-back would make the parse spill, even if they are on different lines. For example, the parser picked up the opening tag from the first line and the closing tag from the second line. What is even more weird is that it happens even when the shortcode tag is self-closed (see line 3-4).

Problem 2:
Incorrect types of arguments being passed into the handler function.

function some_handler($atts, $content = null){

function my_init() {
    add_shortcode('sc', 'some_handler');


Post content:



string '' (length=0)
string '' (length=0)

Problem 2.1:
According to the documentation, $content should be null when the tag is self-closed and it can be checked by is_null($content). In reality, it doesn't matter if you use [sc] or [sc /] or [sc][/sc], they would all pass in an empty String instead of null. If this is the desired behavior, the documentation should be updated. (But I believe there are benefits when you can actually distinguish between empty content and no content.)

Problem 2.2:
What makes me stretch my head even more is the behavior when no attributes are given. Intuitively $atts should be an empty array when no arguments are passed in, so that you can loop through it using iterators like foreach() without the fear of it throwing an error. (Or at least it should be null..) However, when no arguments are there an empty string is passed in. This does not make a lot of sense, but if for some reason that is the intended behavior then the documentation should be updated too.

Some of you may argue that it doesn't matter because you are suppose to normalize it using shortcode_atts() before you do anything else, and it seems to handle empty strings just fine. However, this is not always possible. For instance, according to the API, unnamed attributes are supported and they will be passed in as items with numeric keys. However, there aren't any good ways to make shortcode_atts() keep the numeric-keyed items in the array except naming all of them explicitly in the default, i.e. shortcode_atts(array( 0=>'', 1=>'', 2=>''....),$atts);. (And even if you're fine with doing that, you can't really pass that normalized array into extract() because it'd just skip them. This might be considered as a defect of shortcode_atts(), but I am not going to touch that part. So, in order to extract your unnamed attributes, you would have to loop through the array and do your stuff before passing it into shortcode_atts(). So the differences between an empty array and an empty string DOES matter.

Problem 3:

This is by far the most critical flaw of the parser and it does not require much explanation:

function sc_handler($atts, $content = null){
	return "A [sc] shortcode.";

function scalar_handler($atts, $content = null){
	return "A [scalar] shortcode.";

function sc_extend_handler($atts, $content = null){
	return "A [sc-extend] shortcode.";

function my_init() {
  add_shortcode('sc', 'sc_handler');
  add_shortcode('scalar', 'sc_handler');
	add_shortcode('sc-extend', 'sc_extend_handler');


Post content:


HTML output:

A [sc] shortcode.
A [scalar] shortcode.
A [sc] shortcode.

If you make it do a var_dump() on $atts you will have a better idea of what's going on:


string '' (length=0)


string '' (length=0)


  0 => string '-extend' (length=7)

This is obviously very bad. For example, if someone registered a shortcode 'rss', anything that starts with 'rss' and includes a dash ('rss-fetch' or something) would fail. Throughout my testing, shortcode names with dashes seems to be causing a lot of issues (but according to the documentation it's fully supported, it's even used in oen of the examples). I haven't looked into the code yet, but are we even escaping those dashes before sticking it into the pattern?

Change History (5)

#1 @navjotjsingh
13 years ago

I don't know howcome they are still rendering on your site but here on my live site every shortcode is coming unparsed.

#2 @dd32
13 years ago

  • Keywords needs-patch needs-testing added; regex shortcode parser removed
  • Milestone changed from Unassigned to Future Release
  • Priority changed from high to normal
  • Severity changed from critical to major

This would've been far easier managed as separate tickets. They are all separate issues and some easier fixed than others. (the latter 2 for example are easy, the first is very hard due to regular expressions being used and backtrace limits)

I can confirm your first test data example, It might be relitivly easy to ensure that the start-tag of the current shortcode is not in the matched data. As for the new-lines-within-shortcodes issue, I wouldnt hold your breath there, Texturize is applied to the inside of the shortcodes too.

#3 @Denis-de-Bernardy
13 years ago

see #10082, and related tickets

#4 @dd32
13 years ago

yep, #10082 contains many of the issues here.. And some are fixed.. See comment on that ticket on my testing of it shortly.

#5 @dd32
13 years ago

  • Keywords needs-patch needs-testing removed
  • Milestone Future Release deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Issue 1 & 2 are covered by #10082 (even though it has a few bugs)

Issue 3 related to dashes is caused by the use of \b in the regular expression, Which is fixed in a edit to a patch on that ticket i'm about to submit. Closing as a Duplicate of that.

If you've still got any issues with the shortcodes after testing MY patch which i'll add to #10082 shortly, open a new ticket with just that issue.. Its just too hard to see if every fine detail you've mentioned is matched.

Note: See TracTickets for help on using tickets.