Make WordPress Core

Opened 14 years ago

Closed 13 years ago

#15600 closed defect (bug) (fixed)

shortcode_unautop returns emtpy string

Reported by: mdbitz's profile mdbitz Owned by: westi's profile westi
Milestone: 3.3 Priority: high
Severity: major Version: 3.0.1
Component: Formatting Keywords: needs-testing has-patch needs-unit-tests
Focuses: Cc:

Description (last modified by ocean90)

In some cases when a website uses plugins with ShortCode the shortcode_autop method will return an empty string.

Below is sample content that gets an empty string returned when passed through shortcode_autop.


<p>Writers are a diverse bunch. Quirky. Independent. Opinionated. In other words, hard to buy for. If you are trapped in a relationship with a writer,<em> get out</em>. If you can&#8217;t, attempt to tame your writer with the liberal application of gifts they will appreciate. Here I have gathered a few ideas that range from classic to new-fangled. Each is sure to melt almost any writer&#8217;s dark, cold heart.</p>
<p>[amazon_link id="0547055323" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/417-fWjfIYL._SL160_.jpg" alt="The Best American Short Stories 2010 (The Best American Series (R))" />[/amazon_link] What writer doesn&#8217;t want to have their work end up in a &#8220;best&#8221; collection? Yes, even the ones who scoff at commercialism and popular opinion do. The Best American Short Stories series has been running for a good long time but the collections are kept fresh with yearly guest editors. My favorite feature of these books are the writer&#8217;s notes &#8212; insightful comments from the writers themselves about their stories. Though the notes<br />

were more essential in the days before the Internet provided instantaneous transmission of a writer&#8217;s every 140-character-long thought, they still provide a worthwhile virtual mini-panel on the state of the art in short fiction. If your writer leans more toward genre fiction, check out other collections such as the venerable [amazon_link id="0312608977" target="_blank" ] Year&#8217;s Best Science Fiction[/amazon_link] .</p>
<p>[amazon_link id="B002FQJT3Q" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/417XQ0XwQuL._SL160_.jpg" alt="Kindle 3G Wireless Reading Device, Free 3G + Wi-Fi, 6&quot; Display, Graphite, 3G Works Globally - Latest Generation" />[/amazon_link] Digital publishing is an unstoppable freight train charging towards us. Get on board or get squashed to jelly beneath the steel wheels of progress. The Kindle is probably the cheapest and most hassle-free way to hop on board and see what all the fuss is about. The screen is a pleasure to read in bright light, and the smaller size Kindle adds nothing to the weight or bulk of your messenger bag. There are a few quirks. You can&#8217;t read in bad light (just like a book) &#8212; the screen does not produce any light on it&#8217;s own. The smaller Kindle is really too small to effectively display PDFs. If your writer spends a majority of their time squinting at screenplay PDFs on their laptop, then go for the [amazon_link id="B002GYWHSQ" target="_blank" ]Kindle DX[/amazon_link] .</p>
<p>[amazon_link id="B00138CX30" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/51lwiSZcWcL._SL160_.jpg" alt="Time Out Of Mind" />[/amazon_link] Music to write by. Normally, writing music should be instrumental only &#8212; can&#8217;t have too many words competing for brainspace. But there is something very atmospheric about the tracks on Bob Dylan&#8217;s [amazon_link id="B00138CX30" target="_blank" ] Time Out of Mind[/amazon_link] and the vocals are subdued if plaintive. The track &#8220;Not Dark Yet&#8221; was featured in the ultimate writer movie, [amazon_link id="B001ZS5CD6" target="_blank" ]Wonder Boys[/amazon_link] If your writer&#8217;s work is filled with angst and wounded characters, this is the record to spin. For a more cinematic, epic, sweeping soundtrack, try [amazon_link id="B003DUA56S" target="_blank" ]Clash of the Titans[/amazon_link] . Terrible movie, good generic action-movie score.</p>

<p>[amazon_link id="8883701127" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/41nEt%2BZMYsL._SL160_.jpg" alt="Moleskine Ruled Notebook Large" />[/amazon_link] Unassuming and built to last &#8212; we may live in the digital age, but when paper is called for the Moleskine notebook provides hundreds of durably bound, creamy off-white pages just asking to be filled with Great Ideas. The spine can be bent clear back on itself without breaking the binding or loosening a single page. The covers are stiff but flexible cardboard and covered with water-resistant oilcloth. Not flashy, but definitely classy. There&#8217;s a reason these things are almost a writer&#8217;s cliche.</p>
<p>[amazon_link id="B000GTR2F6" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/41GSZCYWKYL._SL160_.jpg" alt="Keurig B-70 B70 Platinum Single-Cup Home Brewing System" />[/amazon_link] Most writers juggle multiple hats. Writing often has to wait to the very end of the day. That, or it has to come first thing in the morning, long before any reasonable person would think of getting out of bed. It&#8217;s no wonder many writers, especially those first starting out, require a constant energy boost. I keep the [amazon_link id="B000GTR2F6" target="_blank" ]Keurig Platinum [/amazon_link] on my desk. It&#8217;s an essential piece of equipment, right alongside my printer and external hard drive. Making a fresh cup of coffee takes about one minute. Thanks to the self-contained filter cups, there is no mess to clean up. At all. Also good for tea, iced coffee, and provides instant hot water for hot cocoa or soup mix.</p>
<p>[amazon_link id="B001TIJWKQ" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/41nsx4792FL._SL160_.jpg" alt="Crumpler The Considerable Embarrassment Laptop Bag, Brown/Red" />[/amazon_link] And where is your writer supposed to put all the new awesome stuff you buy them? A briefcase is far too corporate and a backpack is far too Kindergarten. A writer has one foot in both these worlds and so does Crumpler. Witness the honestly named [amazon_link id="B001TIJWKQ" target="_blank" ]Considerable Embarrassment Laptop Bag[/amazon_link] . This bag holds a 15&#8243; laptop with room left over for books, magazines, moleskins and other writerly essentials. Or step up to the even roomier [amazon_link id="B001TN6S8K" target="_blank" ]Dreadful Embarassment[/amazon_link] , meant<br />

for 17&#8243; laptops (always buy one size larger bag than you think you need). I&#8217;ve had a Crumpler camera bag for a couple of years and it&#8217;s proven to be tough, well-built and rock-solid.</p>
<p>Of course, if you really love your writer you might consider gifting them with a Crumpler bag stuffed with <em>all</em> the items mentioned here. Your writer may be confused by this strange new emotion called <em>joy</em>, but it would only be a matter of time before the Great American Novel/Screenplay/Haiku is produced.</p>
<p>Of course, if you really love your writer you might consider gifting them with a Crumpler bag stuffed with <em>all</em> the items mentioned here. Your writer may be confused by this strange new emotion called <em>joy</em>, but it would only be a matter of time before the Great American Novel/Screenplay/Haiku is produced.</p>

Attachments (8)

15600.diff (643 bytes) - added by mdawaffe 14 years ago.
15600.2.diff (5.1 KB) - added by mdawaffe 14 years ago.
shortcodes-tests.tar.gz (69.8 KB) - added by mdawaffe 14 years ago.
Test case for 15600.2.diff
shortcode.patch.output (5.5 KB) - added by westi 14 years ago.
Output of the tests with the patch applied
shortcode.nopatch.output (1.9 KB) - added by westi 14 years ago.
Output of the tests without the patch applied
15600.3.diff (6.2 KB) - added by mdawaffe 14 years ago.
15600.5.diff (6.3 KB) - added by mdawaffe 13 years ago.
back compat
15600.6.diff (5.8 KB) - added by mdawaffe 13 years ago.

Download all attachments as: .zip

Change History (47)

#1 @mdbitz
14 years ago

  • Summary changed from shortcode_autop returns emtpy string to shortcode_unautop returns emtpy string

Should have copy pasted the name, i mean the shortcode_unautop function.

#2 @mdbitz
14 years ago

Also it may be useful to know that in the above sample I am using the WordPress-Amazon-Associate plugin. If i remove the filter shortcode_unautop all ShortCode is rendered correctly, and if not the empty string is returned for content.

#3 @ocean90
14 years ago

  • Description modified (diff)
  • Severity changed from major to normal

#4 @westi
14 years ago

  • Cc westi added
  • Milestone changed from Awaiting Review to 3.1
  • Severity changed from normal to major

This should be investigated.

We shouldn't be eating all the content like that.

#5 @markjaquith
14 years ago

  • Keywords reporter-feedback added

Cannot duplicate. Installed the WordPress Amazon Associate plugin and configured it. Pasted exactly what you said you had into the HTML editor and published. Post shows up fine, with Amazon images.

#6 follow-up: @mdawaffe
14 years ago

I can reproduce this on a site with 90 shortcodes installed a large post with 62 captioned images (using the caption shortcode) with interstitial HTML between each image.

In fact, if I remove all but the caption shortcode from shortcode_unautop(), I can still reproduce.

I get PREG_BACKTRACK_LIMIT_ERROR.

Making preg operator matching the shortcode's content possesive solves this for my test case. Patch attached.

@mdawaffe
14 years ago

#7 @mdawaffe
14 years ago

  • Keywords needs-testing has-patch needs-unit-tests added

I don't see any tests for shortcode_unautop(), and my testing was pretty limited.

#8 in reply to: ↑ 6 @mdawaffe
14 years ago

Replying to mdawaffe:

I can reproduce this

I can reproduce the bug, but apparently I can't write. Please excuse the typos and horrible grammar :)

#9 @markjaquith
14 years ago

  • Milestone changed from 3.1 to Future Release
  • Priority changed from normal to high

Nice work mdawaffe! But I think we're going to have to punt this to 3.2. It's not new, it's late in the cycle, and we need time to make a test for this. It would be helpful if we had an example post and ini_set() combo (for setting the backtrack limit to an agreed minimum supported setting). mdawaffe — can you ini_get() your backtrack limit and provide a Lipsum version of the problem post?

#10 @prettyboymp
14 years ago

I've been able to reproduce the same error if a shortcode is the first non-whitespace on a new line non-shortcode content on the same line. In this case, if the post exceeds the backtrack limit, the result is NULL due to the PREG_BACKTRACK_LIMIT_ERROR. Example:

[foo]shortcode content[/foo] the rest of the paragraph.

//40,000+ characters of lorem ipsum text.

#11 @dshafik
14 years ago

  • Keywords changed from shortcode_autop, ShortCode reporter-feedback needs-testing has-patch needs-unit-tests to shortcode_autop ShortCode reporter-feedback needs-testing has-patch needs-unit-tests

The problem with this setting is that it's not based on the length of the input, it's based on how the pattern is written against the input. Having used it for shortcode-type parsing, there is no one-true-answer for this problem.

You can chunk the content, but you might split a tag pair over two chunks.

The only solution I found was to parse the string as a stream of text using strpos/substring.

#12 @ocean90
14 years ago

  • Keywords shortcode_autop ShortCode reporter-feedback removed

#13 @aaroncampbell
14 years ago

Related: #8553

The site where I first ran into the problem was using a lot of shortcodes with no content and no parameters, so on the longer posts I just prevented shortcodes from processing and processed them with str_replace instead. Obviously not something that works everywhere, but I've been able to use the same basic approach on several sites now.

#14 @masterleep
14 years ago

This bug is serious. It should be renamed "wordpress randomly gobbles your post if you use lots of shortcodes" or "shortcodes are useless if you need more than a few in a post".

I ran into it on a relatively short post (6600 bytes) that contained 16 shortcodes where the start of the post content was a shortcode expression.

#15 @masterleep
14 years ago

Although not a complete solution, it would be far better to return the original unfiltered text from shortcode_unautop if preg_replace errors out, instead of the empty string.

Last edited 14 years ago by masterleep (previous) (diff)

#16 @masterleep
14 years ago

Changing shortcode_unautop to the following fixes the empty post problem, although some unwanted <p> tags may be left in that case.

function shortcode_unautop($pee) {
	global $shortcode_tags;

	$pee_orig = $pee;
	if ( !empty($shortcode_tags) && is_array($shortcode_tags) ) {
		$tagnames = array_keys($shortcode_tags);
		$tagregexp = join( '|', array_map('preg_quote', $tagnames) );
		$pee = preg_replace('/<p>\\s*?(\\[(' . $tagregexp . ')\\b.*?\\/?\\](?:.+?\\[\\/\\2\\])?)\\s*<\\/p>/s', '$1', $pee);
		if ($pee == NULL)
			return $pee_orig;
	}

	return $pee;
}

#17 @mdawaffe
14 years ago

  • Cc mdawaffe added

Here's a much better patch than my original.

Both do_shortcode() and shortcode_unautop() can run into PREG_BACKTRACK_LIMIT_ERRORs for large inputs.

Attached drastically reduces the execution time and backtracing for both functions.

  1. Constant prefix in get_shortcode_regex().
  2. Unroll the Loop to more efficiently grab the shortcode attributes.
  3. Only look for a closing shortcode tag when the shortcode is not self closing.
  4. Comments for the regular expression.

We could also similarly "Unroll the Loop" for the content between the opening and closing shortcode tags, but I *did not test* this patch against shortcodes with inner content, so I don't know if the patch even works, let alone if unrolling the loop there would help.

I've also attached a some test scripts for the diff.

Untar the tests into WordPress' ABSPATH, then run: ./shortcodes-tests.sh 15600.2.diff

WARNING: the test script reverts your checkout!

My results for the tests:

$ ./shortcodes-tests.sh 15600.2.diff 
Reverted 'wp-includes/formatting.php'
Reverted 'wp-includes/shortcodes.php'
shortcode_unautop() Tests: original
SHORT TIME: 0.0088322162628174
LONG  TIME: 1.3537223339081
SHORT BACKTRACE REQUIREMENT: 232
LONG  BACKTRACE REQUIREMENT: 135670

do_shortcode() Tests: original
SHORT TIME: 0.027733564376831
LONG  TIME: 2.1369636058807
SHORT BACKTRACE REQUIREMENT: 420
LONG  BACKTRACE REQUIREMENT: 135858

patching file wp-includes/shortcodes.php
patching file wp-includes/formatting.php
shortcode_unautop() Tests: patched
SHORT TIME: 0.0067722797393799
LONG  TIME: 0.13501048088074
SHORT BACKTRACE REQUIREMENT: 15
LONG  BACKTRACE REQUIREMENT: 15

do_shortcode() Tests: patched
SHORT TIME: 0.021802425384521
LONG  TIME: 0.14546251296997
SHORT BACKTRACE REQUIREMENT: 16
LONG  BACKTRACE REQUIREMENT: 16

DIFFS:

The lack of any content after the "DIFFS:" line is good: it means there's no change in the outputs of the functions.

Notice that, with respect to the input length, the time required scales much better, and the amount of backtracing is constant.

My tests are pretty limited, though: just a couple simple cases embedded in content of varying length.

@mdawaffe
14 years ago

@mdawaffe
14 years ago

Test case for 15600.2.diff

#18 @westi
14 years ago

Nice patch.

But it increases the number of failures in the shortcode tests here: http://unit-tests.trac.wordpress.org/browser/wp-testcase/test_shortcode.php

Before: Tests: 25, Assertions: 49, Failures: 2, Skipped: 4.

After: Tests: 25, Assertions: 38, Failures: 12, Skipped: 4.

Attaching output here too.

@westi
14 years ago

Output of the tests with the patch applied

@westi
14 years ago

Output of the tests without the patch applied

#19 @mdawaffe
14 years ago

Thanks - I'll take a look.

#20 @mdawaffe
14 years ago

Thanks for pointing me at the test cases.

15600.3.diff now passes the same tests as r17834.

I tested with

php wp-test.php -t WPTestShortcodeOutput1,WPTestShortcodeOutputParagraph,TestShortcode,TestShortcodeStripping

Still a significant speedup.

$ ./shortcodes-tests.sh 15600.3.diff
Reverted 'wp-includes/formatting.php'
Reverted 'wp-includes/shortcodes.php'
shortcode_unautop() Tests: original
SHORT TIME: 0.0092761516571045
LONG  TIME: 1.3933248519897
SHORT BACKTRACE REQUIREMENT: 232
LONG  BACKTRACE REQUIREMENT: 135670

do_shortcode() Tests: original
SHORT TIME: 0.029265403747559
LONG  TIME: 2.1743712425232
SHORT BACKTRACE REQUIREMENT: 420
LONG  BACKTRACE REQUIREMENT: 135858

patching file wp-includes/shortcodes.php
patching file wp-includes/formatting.php
shortcode_unautop() Tests: patched
SHORT TIME: 0.0078203678131104
LONG  TIME: 0.38498711585999
SHORT BACKTRACE REQUIREMENT: 42
LONG  BACKTRACE REQUIREMENT: 46

do_shortcode() Tests: patched
SHORT TIME: 0.024780035018921
LONG  TIME: 0.39810061454773
SHORT BACKTRACE REQUIREMENT: 47
LONG  BACKTRACE REQUIREMENT: 51

DIFFS:
Last edited 14 years ago by mdawaffe (previous) (diff)

@mdawaffe
14 years ago

#21 @ocean90
14 years ago

  • Milestone changed from Future Release to 3.2

This should be a 3.2 candidate. Nice work mdawaffe.

#22 @westi
14 years ago

  • Owner set to westi
  • Status changed from new to reviewing

Thanks mike!

#23 @westi
14 years ago

  • Keywords 3.3-early westi-likes added
  • Milestone changed from 3.2 to Future Release

It is too late to go into 3.2 now.

Marking for 3.3

#24 @masterleep
14 years ago

If it's too late for the more complete solution, could you take the patch in comment 16 for 3.2? It is low risk. Essentially it just checks for a NULL return result and returns the original post content rather than erasing it.

This problem requires me to hand patch Wordpress to keep my posts from being randomly replaced with empty content, so it's really a nuisance.

#25 @masterleep
13 years ago

This bug has gone silent but it's still a serious problem with two patches available to be applied. Is it going to be fixed in 3.3?

#26 @ocean90
13 years ago

  • Keywords 3.3-early westi-likes removed
  • Milestone changed from Future Release to 3.3

#27 @masterleep
13 years ago

Does this bug deserve the needs-testing and needs-unit-test keywords since shortcodes have unit tests?

#28 follow-up: @nacin
13 years ago

mdawaffe, westi, or ryan, are you comfortable with 15600.3.diff on WP.com?

#29 in reply to: ↑ 28 ; follow-up: @mdawaffe
13 years ago

Replying to nacin:

mdawaffe, westi, or ryan, are you comfortable with 15600.3.diff on WP.com?

It's not running on WP.com yet. I thought it was. I'm looking for my test cases (so far I've completely lost them), and I'll hook it up.

#30 follow-up: @mdawaffe
13 years ago

So far, no problems on WP.com.

The patch changes the match grouping, and some plugins are using the function: http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=get_shortcode_regex+site%3Aplugins.svn.wordpress.org

Could someone grep through the plugins repo and see what files are affected? My guess is that few if any of the plugins will actually be affected.

#31 in reply to: ↑ 30 @duck_
13 years ago

Replying to mdawaffe:

Could someone grep through the plugins repo and see what files are affected? My guess is that few if any of the plugins will actually be affected.

http://pastie.org/pastes/2522893/text

I quickly split it into two parts. The bottom shows plugins that seem to re-implement the function themselves or include it via backpress.

@mdawaffe
13 years ago

back compat

#32 @mdawaffe
13 years ago

OK - I lied :) There were a few issues on WP.com with back compat. Also, a bunch of the plugins in http://pastie.org/pastes/2522893/text (thanks, duck_) use the matching groups in ways I didn't expect.

So here's a new patch.

It's the same as 15600.3.diff with the following differences:

  • All matching groups are the same as in the current code.
  • The regex no longer requires /x to compile.

I believe the patch is now 100% backward compatible.

#33 @Viper007Bond
13 years ago

I'm one of the ones calling get_shortcode_regex() because my plugin needs to use do_shortcode() with a different callback function (one that keeps double brackets). If only do_shortcode() accepted a callback function parameter... ;)

#34 in reply to: ↑ 29 ; follow-up: @westi
13 years ago

Replying to mdawaffe:

Replying to nacin:

mdawaffe, westi, or ryan, are you comfortable with 15600.3.diff on WP.com?

It's not running on WP.com yet. I thought it was. I'm looking for my test cases (so far I've completely lost them), and I'll hook it up.

You attached your tests to this ticket I believe.

#35 in reply to: ↑ 34 ; follow-up: @mdawaffe
13 years ago

Replying to westi:

You attached your tests to this ticket I believe.

Yeah - I reran those tests. I used to have some real-world data as well for which the current code failed.

15600.5.diff has been running on WP.com since I uploaded the patch here with no (known to me) problems.

@mdawaffe
13 years ago

#36 in reply to: ↑ 35 @mdawaffe
13 years ago

Replying to mdawaffe:

15600.5.diff has been running on WP.com since I uploaded the patch here with no (known to me) problems.

We found a problem :)

15600.6.diff is a more forgiving regex - it's more generous in what can match as the shortcode attributes. It reproduces the behavior of the current code, in that regard: allow shortcode_parse_atts() to do the attribute parsing.

It's very slightly slower than 15600.5.diff, but has better backtracing performance. It's still much faster than the current code.

$ ./shortcodes-tests.sh ~/15600.6.diff 
Reverted 'wp-includes/formatting.php'
Reverted 'wp-includes/shortcodes.php'
shortcode_unautop() Tests: original
SHORT TIME: 0.0095028877258301
LONG  TIME: 3.7883765697479
SHORT BACKTRACE REQUIREMENT: 232
LONG  BACKTRACE REQUIREMENT: 135670

do_shortcode() Tests: original
SHORT TIME: 0.029194116592407
LONG  TIME: 19.19673538208
SHORT BACKTRACE REQUIREMENT: 420
LONG  BACKTRACE REQUIREMENT: 135858

patching file wp-includes/shortcodes.php
patching file wp-includes/formatting.php
shortcode_unautop() Tests: patched
SHORT TIME: 0.0085659027099609
LONG  TIME: 0.39086127281189
SHORT BACKTRACE REQUIREMENT: 33
LONG  BACKTRACE REQUIREMENT: 37

do_shortcode() Tests: patched
SHORT TIME: 0.026137113571167
LONG  TIME: 0.41313719749451
SHORT BACKTRACE REQUIREMENT: 42
LONG  BACKTRACE REQUIREMENT: 46

DIFFS:

#37 @masterleep
13 years ago

Did this make it into the 3.3 beta?

#38 @dd32
13 years ago

Did this make it into the 3.3 beta?

No commits have been made in this ticket, so no.

#39 @ryan
13 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In [18952]:

Avoid preg backtrack limit errors with large posts when processing shortcodes. Props mdawaffe. fixes #15600

Note: See TracTickets for help on using tickets.