WordPress.org

Make WordPress Core

Opened 12 years ago

Closed 6 years ago

Last modified 6 years ago

#6969 closed enhancement (wontfix)

Don't apply wptexturize() to the insides of shortcode tags

Reported by: Viper007Bond Owned by: tellyworth
Milestone: Priority: normal
Severity: normal Version: 2.5.1
Component: Formatting Keywords: needs-patch wptexturize
Focuses: Cc:
PR Number:

Description

I have this for my post contents:

[code lang="php"]$foo = 'bar';[/code]

The problem is my shortcode function gets this passed to it for the content string:

$foo = ‘bar’;

wptexturize() should be smart enough to not format the contents of registered shortcodes. If plugins want their contents formatted, they can pass the contents to the wptexturize() function itself.

Attachments (5)

6969.diff (1.3 KB) - added by DD32 11 years ago.
wp6969_b.diff (2.9 KB) - added by mzizka 11 years ago.
Patch to prevent texturizing of shortcode content
wp6969_c.diff (1.9 KB) - added by mzizka 11 years ago.
Updated patch that covers last mentioned case scenario
test_shortcode_texturize.php (1.3 KB) - added by mzizka 11 years ago.
Updated test case
6969.tests.diff (1.9 KB) - added by ericmann 6 years ago.
Added a test to verify the bug.

Download all attachments as: .zip

Change History (48)

#1 @ryan
12 years ago

  • Owner changed from anonymous to tellyworth

@DD32
11 years ago

#3 @DD32
11 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

attachment 6969.diff added.

  • Needs testing
  • skips inards of registered shortcodes
  • Splitting regex might need more attention.. not too sure.

Test Code:

<?php
$in = '[code lang="php"]
$foo = \'bar\';
[/code]
<code lang="php">$foo = \'bar\'; $foo = \'bar\'</code>';

$GLOBALS['shortcode_tags']['code'] = '';

var_dump(time());
var_dump(wptexturize($in));

Without the 'code' registered:

[code lang="php"]
$foo = &#8216;bar&#8217;;
[/code]
<code lang="php">$foo = 'bar'; $foo = 'bar'</code>

with 'code' registered:

[code lang="php"]
$foo = 'bar';
[/code]
<code lang="php">$foo = 'bar'; $foo = 'bar'</code>

#4 @ryan
11 years ago

  • Milestone changed from 2.7 to 2.8

#5 @JakaJancar
11 years ago

Don't shortcodes use [] only? Why don't you texturize text inside <>?

#6 @DD32
11 years ago

Don't shortcodes use [] only? Why don't you texturize text inside <>?

<...> are HTML tags, So applying texturize to those would break all HTML inside a post (inc. formatting).

If you were to enter < "blah blah" > inside a post in the visual mode, you would have &gt; "blah.. when texturize came to it, So the content inside them IS texturized if entered via the visual mode.

#7 follow-up: @JakaJancar
11 years ago

Right, sorry my bad. I'm still a bit puzzled by this change though.

  • $textarr = preg_split('/(<.*>|\[.*\])/Us', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

+ $textarr = preg_split('/([<\[].*[>\]]|\[.*\])/Us', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

So instead of <> tags and [] tags, now also <] and [> tags are possible? What's the point of that? And why is the second part of the new regexp even needed ( \[.*\] ), shouldn't these tags all be matched by the first part anyways ( [<\[].*[>\]] ).

I'm not looking at the bigger picture here, so if I'm not making any sense at all, just ignore me =)

#8 @JakaJancar
11 years ago

Gah... please click Reply and look at the text there, unformatted.

#9 in reply to: ↑ 7 @DD32
11 years ago

  • Keywords needs-patch added; has-patch removed

Replying to JakaJancar:

So instead of <> tags and [] tags, now also <] and [> tags are possible? What's the point of that? And why is the second part of the new regexp even needed ( \[.*\] ), shouldn't these tags all be matched by the first part anyways ( [<\[].*[>\]] ).

Hm.. Odd.. Why didnt i notice that at that the time.

Eitherway, This is currently broken then, Because while the code seems to read that it was to exclude shortcodes, In reality, it doesnt work. (Probably because splitting like that doesnt work as expected)

#10 @mzizka
11 years ago

I am working on a patch for this. I am at the point where I have the texturizing fixed, and am now working on wpautop (which is totally dysfunctional with any but the most basic use of shortcodes).

Excluding the shortcode content is fairly complex work and will likely slow things down a bit.

This leads me to wonder why these filters are applied before the shortcodes are handled. There could be security reasons -- I admit I didn't really dig into this. From my point of view, it seems we are making things needlessly complicated. Shortcodes are meant to alter or produce output, and it is that on that output that filters should be applied -- not on input. No?

@mzizka
11 years ago

Patch to prevent texturizing of shortcode content

#11 @mzizka
11 years ago

The above patch does not include wpautop, but it's a start.

#12 @mzizka
11 years ago

The above test case is pretty basic, but should do the job. It is not integrated into the WordPress tests, as I do not have time at the moment to get familiar with those.

#13 @mzizka
11 years ago

  • Component changed from General to Formatting
  • Keywords has-patch added; needs-patch removed

#14 follow-up: @Denis-de-Bernardy
11 years ago

A few thoughts:

  • I'm scratching my head on the extra parameter in the get_shortcode_regexp(), when using the original one might be just as good.
  • the start regexp is not the same as the beginning of the real regex
  • the idea of changing the actual regexp using str_replace() seems a little weird. why not make it the first element and re-assign things as needed later on?

also, I foresee potential bugs or tough choices due to the fact that shortcodes can now be escaped like this:

[[shortcode]]

because of the escaping regex, the current patch seems potentially buggy:

foo[shortcode]
"bar"
[/shortcode]foo

#16 in reply to: ↑ 14 ; follow-up: @mzizka
11 years ago

Replying to Denis-de-Bernardy:

  • I'm scratching my head on the extra parameter in the get_shortcode_regexp(), when using the original one might be just as good.

This is just an optimization. Feel free to ditch it.

  • the start regexp is not the same as the beginning of the real regex

Not sure what you mean exactly.

  • the idea of changing the actual regexp using str_replace() seems a little weird. why not make it the first element and re-assign things as needed later on?

I agree, but I could not find another decent way to do it.

also, I foresee potential bugs or tough choices due to the fact that shortcodes can now be escaped like this:

[[shortcode]]

because of the escaping regex, the current patch seems potentially buggy:

foo[shortcode]
"bar"
[/shortcode]foo

We need add test cases for this and fix it.

I continue to think that shortcodes should be treated before wptexturize and wpautop. In my mind, it makes no sense to do all that formatting work before shortcodes are treated. It's presumptuous -- why should we presume shortcode content is meant to be displayed? In fact, it most likely isn't -- it's meant to be interpreted, transformed, etc. The output of the shortcode is what should be formatted.

#17 @mzizka
11 years ago

Ah man... should have previous that. Sorry.

#18 @mzizka
11 years ago

Previewed... d'oh! Too much coffee this morning.

#19 in reply to: ↑ 16 ; follow-up: @Denis-de-Bernardy
11 years ago

Replying to mzizka:

Replying to Denis-de-Bernardy:

  • the start regexp is not the same as the beginning of the real regex

Not sure what you mean exactly.

your start regexp doesn't start with (.?) like the real one. it's potentially problematic in the second use-case I outlined.

#20 in reply to: ↑ 19 @mzizka
11 years ago

your start regexp doesn't start with (.?) like the real one. it's potentially problematic in the second use-case I outlined.

I was not familiar with the shortcode escaping feature. I should have read the comments more closely. I do see your point now. This needs a fix, but the existing code is already broken in this regard anyway. All the full regexp will do is capture an extra character on either side of the shortcode. In our case we would want to NOT consider escaped shortcodes when splitting the text. This should not be difficult, but require a different regular expression from the "standard" one.

Regarding the str_replace on the regexp, I'd like to bring a little more precision. The reason why I do this is because the entire regular expression has to be captured in parentheses (so that the full delimiters are all captured by preg_split). This will increment by 1 any capturing index in the original shortcode regular expression, so I have to fix the backreferencing used therein. Short of refactoring the entire function, this was the only fix I could come up with. The other solution is to not call the function, and construct another regular expression locally. I think this is the path I will opt for.

I will change my patch and add some test cases for the case scenarios you outlined (as well as a few others) when time allows.

(However, I would still like to know why formatting is applied before processing of shortcodes. I know I'm annoying with this, but I'm having a hard time with the idea.)

#21 @mzizka
11 years ago

I've updated a new and somewhat simplified patch (though it does not call the shortcode regexp function, so no code reuse). I've updated my test case with the escaping. I can see it failing on fringe cases like

[[shortcode]

(i.e., two opening brackets and only one closing bracket), but I personally could live with that.

@mzizka
11 years ago

Updated patch that covers last mentioned case scenario

@mzizka
11 years ago

Updated test case

#22 @mzizka
11 years ago

Previous patch version did not take into account the following scenario:

foo[shortcode]
"bar"
[/shortcode]foo

This is fixed now (test added).

#23 @Denis-de-Bernardy
11 years ago

haven't tested, since there are unit tests, but +1 to the current patch.

#24 @Denis-de-Bernardy
11 years ago

  • Keywords reporter-feedback 2nd-opinion dev-feedback added

Another possibility, for this and #6984, could be to make shortcodes occur before autop and texturize. I take it there were reasons for shortcodes to occur after them, but I can't seem to remember what it is... Maybe someone else does?

#25 @Viper007Bond
11 years ago

I don't remember the exact reasoning, but it was discussed at extreme length. There was even the ability to chose when your shortcode ran (before or after wpautop).

In short, it's how it is now for a reason and it should not be moved, especially since so many plugins expect it to be where it is now.

#26 @Denis-de-Bernardy
11 years ago

  • Keywords reporter-feedback 2nd-opinion dev-feedback removed

#27 @Denis-de-Bernardy
11 years ago

  • Keywords needs-patch 2nd-opinion dev-feedback added; has-patch needs-testing removed
  • Milestone changed from 2.8 to Future Release

@mzizka: here are a few nasty tests case to play with:

foo[caption id="attachment_31" align="alignleft" width="215" caption="&quot;foo&quot;"]<img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />"bar&quot;[/caption]bar

currently outputs (with your patch):

<p>foo<div id="attachment_31" class="wp-caption alignleft" style="width: 225px"><img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />"bar&quot;<p class="wp-caption-text">&quot;foo&quot;</p></div>bar</p>

expected, either of:

<p>foo<div id="attachment_31" class="wp-caption alignleft" style="width: 225px"><img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />&#8221;bar&#8243;<p class="wp-caption-text">&quot;foo&quot;</p></div>bar</p>

<p>foo<div id="attachment_31" class="wp-caption alignleft" style="width: 225px"><img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />"bar&quot;<p class="wp-caption-text">&quot;foo&quot;</p></div>bar</p>

the same shortcode, but this time escaped:

[[caption id="attachment_31" align="alignleft" width="215" caption="&quot;foo&quot;"]<img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />"bar&quot;[/caption]]

currently outputs:

[caption id=&#8221;attachment_31&#8243; align=&#8221;alignleft&#8221; width=&#8221;215&#8243; caption=&#8221;&quot;foo&quot;&#8221;]<img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />&#8220;bar&#8221;[/caption]

expected, either of:

[caption id=&quot;attachment_31&quot; align=&quot;alignleft&quot; width=&quot;215&quot; caption=&quot;&quot;foo&quot;&quot;]<img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />&quot;bar&amp;quot;[/caption]

[caption id=&quot;attachment_31&quot; align=&quot;alignleft&quot; width=&quot;215&quot; caption=&quot;&quot;foo&quot;&quot;]<img class="size-full wp-image-31" title="test2" src="http://localhost/~denis/wp/wp-content/uploads/2009/04/test2.jpg" alt="&quot;foo&quot;" width="215" height="300" />"bar&quot;[/caption]

itching to close this as invalid or wontfix, myself. or then, we'd want to add some kind of special attribute that *tells* texturize what to do.

the reason is, in above, the outlined expected results could all be expected. I used the caption shortcode from lack of a plugin around that used shortcodes, but:

  • on occasion, you'll actually want the texturizing
  • on occasion, you're writing php code and don't want it to change a thing
  • on occasion, you're writing about shortcodes, using the escape method that was recently implemented, and you expect *some* texturizing to happen (as is done in code block). in this case the shortcode arguably gets nested into a <pre> or <code> block (haven't tried that)

moving this to future pending dev feedback

#28 @Denis-de-Bernardy
11 years ago

oh, and there might arguably be related bugs in the escape method that needs to be fixed, too. specifically, if I write about shortcodes, in something like:

<pre>
[[php foo="bar"]
echo "foo" . 'bar';
[/php]]
</pre>

then the ultimate, expected, final result arguably is:

<pre>
[php foo=&quot;bar&quot;]
echo &quot;foo&quot; . 'bar';
[/php]
</pre>

#30 @Denis-de-Bernardy
11 years ago

  • Keywords needs-unit-tests added; 2nd-opinion dev-feedback removed

#31 @ryan
11 years ago

(In [11345]) Handle nested tag in wptexturize(). Props nbachiyski. fixes #7056 see #6969

#32 @Denis-de-Bernardy
11 years ago

  • Milestone changed from Future Release to 2.9

#33 @dd32
10 years ago

Closed #10548 as duplicate of this

#34 @ryan
10 years ago

  • Milestone changed from 2.9 to 3.0

#35 @nacin
10 years ago

  • Milestone changed from 3.0 to Future Release

No patch, unit tests, agreement or traction.

#37 @norbertm
9 years ago

Added test for

[code lang="php"]$foo = 'bar';[/code]

to #4539.

#38 in reply to: ↑ 36 @ciantic
9 years ago

Replying to Viper007Bond:

Work-around for plugin authors BTW:

http://www.viper007bond.com/2009/11/22/wordpress-code-earlier-shortcodes/

This workaround seems really solid, though I'd like to chip in of the idea including third attribute for priority in add_shortcode like this:

  add_shortcode('foo', 'foo_shortcode_func', 7);

Where "7" is the priority as mentioned in viper's blog, check the workaround to understand.

#39 @nacin
6 years ago

  • Keywords wptexturize added

@ericmann
6 years ago

Added a test to verify the bug.

#40 @ericmann
6 years ago

  • Keywords needs-unit-tests removed

Added a robust, easy-to-read unit test at WordCamp Phoenix

This ticket was mentioned in IRC in #wordpress-dev by ericmann. View the logs.


6 years ago

#42 @miqrogroove
6 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

As of version 2.8.0, there is a no_texturize_shortcodes filter that prevents wptexturize() from touching the text between specified code tags. Several patches are being added to 4.0 that will also fix bugs in the wptexturize() parsing of shortcodes.

It is therefore up to the author of the shortcode plugin whether or not to allow wptexturize() to modify shortcode content. There is no reason to change the default behavior for all shortcodes.

#43 @DrewAPicture
6 years ago

  • Milestone Future Release deleted
Note: See TracTickets for help on using tickets.