Opened 14 years ago
Closed 13 years ago
#15600 closed defect (bug) (fixed)
shortcode_unautop returns emtpy string
Reported by: | mdbitz | Owned by: | westi |
---|---|---|---|
Milestone: | 3.3 | Priority: | high |
Severity: | major | Version: | 3.0.1 |
Component: | Formatting | Keywords: | needs-testing has-patch needs-unit-tests |
Focuses: | Cc: |
Description (last modified by )
In some cases when a website uses plugins with ShortCode the shortcode_autop
method will return an empty string.
Below is sample content that gets an empty string returned when passed through shortcode_autop
.
<p>Writers are a diverse bunch. Quirky. Independent. Opinionated. In other words, hard to buy for. If you are trapped in a relationship with a writer,<em> get out</em>. If you can’t, attempt to tame your writer with the liberal application of gifts they will appreciate. Here I have gathered a few ideas that range from classic to new-fangled. Each is sure to melt almost any writer’s dark, cold heart.</p> <p>[amazon_link id="0547055323" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/417-fWjfIYL._SL160_.jpg" alt="The Best American Short Stories 2010 (The Best American Series (R))" />[/amazon_link] What writer doesn’t want to have their work end up in a “best” collection? Yes, even the ones who scoff at commercialism and popular opinion do. The Best American Short Stories series has been running for a good long time but the collections are kept fresh with yearly guest editors. My favorite feature of these books are the writer’s notes — insightful comments from the writers themselves about their stories. Though the notes<br /> were more essential in the days before the Internet provided instantaneous transmission of a writer’s every 140-character-long thought, they still provide a worthwhile virtual mini-panel on the state of the art in short fiction. If your writer leans more toward genre fiction, check out other collections such as the venerable [amazon_link id="0312608977" target="_blank" ] Year’s Best Science Fiction[/amazon_link] .</p> <p>[amazon_link id="B002FQJT3Q" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/417XQ0XwQuL._SL160_.jpg" alt="Kindle 3G Wireless Reading Device, Free 3G + Wi-Fi, 6" Display, Graphite, 3G Works Globally - Latest Generation" />[/amazon_link] Digital publishing is an unstoppable freight train charging towards us. Get on board or get squashed to jelly beneath the steel wheels of progress. The Kindle is probably the cheapest and most hassle-free way to hop on board and see what all the fuss is about. The screen is a pleasure to read in bright light, and the smaller size Kindle adds nothing to the weight or bulk of your messenger bag. There are a few quirks. You can’t read in bad light (just like a book) — the screen does not produce any light on it’s own. The smaller Kindle is really too small to effectively display PDFs. If your writer spends a majority of their time squinting at screenplay PDFs on their laptop, then go for the [amazon_link id="B002GYWHSQ" target="_blank" ]Kindle DX[/amazon_link] .</p> <p>[amazon_link id="B00138CX30" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/51lwiSZcWcL._SL160_.jpg" alt="Time Out Of Mind" />[/amazon_link] Music to write by. Normally, writing music should be instrumental only — can’t have too many words competing for brainspace. But there is something very atmospheric about the tracks on Bob Dylan’s [amazon_link id="B00138CX30" target="_blank" ] Time Out of Mind[/amazon_link] and the vocals are subdued if plaintive. The track “Not Dark Yet” was featured in the ultimate writer movie, [amazon_link id="B001ZS5CD6" target="_blank" ]Wonder Boys[/amazon_link] If your writer’s work is filled with angst and wounded characters, this is the record to spin. For a more cinematic, epic, sweeping soundtrack, try [amazon_link id="B003DUA56S" target="_blank" ]Clash of the Titans[/amazon_link] . Terrible movie, good generic action-movie score.</p> <p>[amazon_link id="8883701127" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/41nEt%2BZMYsL._SL160_.jpg" alt="Moleskine Ruled Notebook Large" />[/amazon_link] Unassuming and built to last — we may live in the digital age, but when paper is called for the Moleskine notebook provides hundreds of durably bound, creamy off-white pages just asking to be filled with Great Ideas. The spine can be bent clear back on itself without breaking the binding or loosening a single page. The covers are stiff but flexible cardboard and covered with water-resistant oilcloth. Not flashy, but definitely classy. There’s a reason these things are almost a writer’s cliche.</p> <p>[amazon_link id="B000GTR2F6" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/41GSZCYWKYL._SL160_.jpg" alt="Keurig B-70 B70 Platinum Single-Cup Home Brewing System" />[/amazon_link] Most writers juggle multiple hats. Writing often has to wait to the very end of the day. That, or it has to come first thing in the morning, long before any reasonable person would think of getting out of bed. It’s no wonder many writers, especially those first starting out, require a constant energy boost. I keep the [amazon_link id="B000GTR2F6" target="_blank" ]Keurig Platinum [/amazon_link] on my desk. It’s an essential piece of equipment, right alongside my printer and external hard drive. Making a fresh cup of coffee takes about one minute. Thanks to the self-contained filter cups, there is no mess to clean up. At all. Also good for tea, iced coffee, and provides instant hot water for hot cocoa or soup mix.</p> <p>[amazon_link id="B001TIJWKQ" target="_blank" ]<img src="http://ecx.images-amazon.com/images/I/41nsx4792FL._SL160_.jpg" alt="Crumpler The Considerable Embarrassment Laptop Bag, Brown/Red" />[/amazon_link] And where is your writer supposed to put all the new awesome stuff you buy them? A briefcase is far too corporate and a backpack is far too Kindergarten. A writer has one foot in both these worlds and so does Crumpler. Witness the honestly named [amazon_link id="B001TIJWKQ" target="_blank" ]Considerable Embarrassment Laptop Bag[/amazon_link] . This bag holds a 15″ laptop with room left over for books, magazines, moleskins and other writerly essentials. Or step up to the even roomier [amazon_link id="B001TN6S8K" target="_blank" ]Dreadful Embarassment[/amazon_link] , meant<br /> for 17″ laptops (always buy one size larger bag than you think you need). I’ve had a Crumpler camera bag for a couple of years and it’s proven to be tough, well-built and rock-solid.</p> <p>Of course, if you really love your writer you might consider gifting them with a Crumpler bag stuffed with <em>all</em> the items mentioned here. Your writer may be confused by this strange new emotion called <em>joy</em>, but it would only be a matter of time before the Great American Novel/Screenplay/Haiku is produced.</p> <p>Of course, if you really love your writer you might consider gifting them with a Crumpler bag stuffed with <em>all</em> the items mentioned here. Your writer may be confused by this strange new emotion called <em>joy</em>, but it would only be a matter of time before the Great American Novel/Screenplay/Haiku is produced.</p>
Attachments (8)
Change History (47)
#1
@
14 years ago
- Summary changed from shortcode_autop returns emtpy string to shortcode_unautop returns emtpy string
#2
@
14 years ago
Also it may be useful to know that in the above sample I am using the WordPress-Amazon-Associate plugin. If i remove the filter shortcode_unautop
all ShortCode is rendered correctly, and if not the empty string is returned for content.
#4
@
14 years ago
- Cc westi added
- Milestone changed from Awaiting Review to 3.1
- Severity changed from normal to major
This should be investigated.
We shouldn't be eating all the content like that.
#5
@
14 years ago
- Keywords reporter-feedback added
Cannot duplicate. Installed the WordPress Amazon Associate plugin and configured it. Pasted exactly what you said you had into the HTML editor and published. Post shows up fine, with Amazon images.
#6
follow-up:
↓ 8
@
14 years ago
I can reproduce this on a site with 90 shortcodes installed a large post with 62 captioned images (using the caption shortcode) with interstitial HTML between each image.
In fact, if I remove all but the caption shortcode from shortcode_unautop()
, I can still reproduce.
I get PREG_BACKTRACK_LIMIT_ERROR.
Making preg operator matching the shortcode's content possesive solves this for my test case. Patch attached.
#7
@
14 years ago
- Keywords needs-testing has-patch needs-unit-tests added
I don't see any tests for shortcode_unautop()
, and my testing was pretty limited.
#8
in reply to:
↑ 6
@
14 years ago
Replying to mdawaffe:
I can reproduce this
I can reproduce the bug, but apparently I can't write. Please excuse the typos and horrible grammar :)
#9
@
14 years ago
- Milestone changed from 3.1 to Future Release
- Priority changed from normal to high
Nice work mdawaffe! But I think we're going to have to punt this to 3.2. It's not new, it's late in the cycle, and we need time to make a test for this. It would be helpful if we had an example post and ini_set() combo (for setting the backtrack limit to an agreed minimum supported setting). mdawaffe — can you ini_get() your backtrack limit and provide a Lipsum version of the problem post?
#10
@
14 years ago
I've been able to reproduce the same error if a shortcode is the first non-whitespace on a new line non-shortcode content on the same line. In this case, if the post exceeds the backtrack limit, the result is NULL due to the PREG_BACKTRACK_LIMIT_ERROR. Example:
[foo]shortcode content[/foo] the rest of the paragraph. //40,000+ characters of lorem ipsum text.
#11
@
13 years ago
- Keywords changed from shortcode_autop, ShortCode reporter-feedback needs-testing has-patch needs-unit-tests to shortcode_autop ShortCode reporter-feedback needs-testing has-patch needs-unit-tests
The problem with this setting is that it's not based on the length of the input, it's based on how the pattern is written against the input. Having used it for shortcode-type parsing, there is no one-true-answer for this problem.
You can chunk the content, but you might split a tag pair over two chunks.
The only solution I found was to parse the string as a stream of text using strpos/substring.
#13
@
13 years ago
Related: #8553
The site where I first ran into the problem was using a lot of shortcodes with no content and no parameters, so on the longer posts I just prevented shortcodes from processing and processed them with str_replace instead. Obviously not something that works everywhere, but I've been able to use the same basic approach on several sites now.
#14
@
13 years ago
This bug is serious. It should be renamed "wordpress randomly gobbles your post if you use lots of shortcodes" or "shortcodes are useless if you need more than a few in a post".
I ran into it on a relatively short post (6600 bytes) that contained 16 shortcodes where the start of the post content was a shortcode expression.
#15
@
13 years ago
Although not a complete solution, it would be far better to return the original unfiltered text from shortcode_unautop if preg_replace errors out, instead of the empty string.
#16
@
13 years ago
Changing shortcode_unautop to the following fixes the empty post problem, although some unwanted <p> tags may be left in that case.
function shortcode_unautop($pee) { global $shortcode_tags; $pee_orig = $pee; if ( !empty($shortcode_tags) && is_array($shortcode_tags) ) { $tagnames = array_keys($shortcode_tags); $tagregexp = join( '|', array_map('preg_quote', $tagnames) ); $pee = preg_replace('/<p>\\s*?(\\[(' . $tagregexp . ')\\b.*?\\/?\\](?:.+?\\[\\/\\2\\])?)\\s*<\\/p>/s', '$1', $pee); if ($pee == NULL) return $pee_orig; } return $pee; }
#17
@
13 years ago
- Cc mdawaffe added
Here's a much better patch than my original.
Both do_shortcode()
and shortcode_unautop()
can run into PREG_BACKTRACK_LIMIT_ERRORs for large inputs.
Attached drastically reduces the execution time and backtracing for both functions.
- Constant prefix in
get_shortcode_regex()
. - Unroll the Loop to more efficiently grab the shortcode attributes.
- Only look for a closing shortcode tag when the shortcode is not self closing.
- Comments for the regular expression.
We could also similarly "Unroll the Loop" for the content between the opening and closing shortcode tags, but I *did not test* this patch against shortcodes with inner content, so I don't know if the patch even works, let alone if unrolling the loop there would help.
I've also attached a some test scripts for the diff.
Untar the tests into WordPress' ABSPATH, then run: ./shortcodes-tests.sh 15600.2.diff
WARNING: the test script reverts your checkout!
My results for the tests:
$ ./shortcodes-tests.sh 15600.2.diff Reverted 'wp-includes/formatting.php' Reverted 'wp-includes/shortcodes.php' shortcode_unautop() Tests: original SHORT TIME: 0.0088322162628174 LONG TIME: 1.3537223339081 SHORT BACKTRACE REQUIREMENT: 232 LONG BACKTRACE REQUIREMENT: 135670 do_shortcode() Tests: original SHORT TIME: 0.027733564376831 LONG TIME: 2.1369636058807 SHORT BACKTRACE REQUIREMENT: 420 LONG BACKTRACE REQUIREMENT: 135858 patching file wp-includes/shortcodes.php patching file wp-includes/formatting.php shortcode_unautop() Tests: patched SHORT TIME: 0.0067722797393799 LONG TIME: 0.13501048088074 SHORT BACKTRACE REQUIREMENT: 15 LONG BACKTRACE REQUIREMENT: 15 do_shortcode() Tests: patched SHORT TIME: 0.021802425384521 LONG TIME: 0.14546251296997 SHORT BACKTRACE REQUIREMENT: 16 LONG BACKTRACE REQUIREMENT: 16 DIFFS:
The lack of any content after the "DIFFS:" line is good: it means there's no change in the outputs of the functions.
Notice that, with respect to the input length, the time required scales much better, and the amount of backtracing is constant.
My tests are pretty limited, though: just a couple simple cases embedded in content of varying length.
#18
@
13 years ago
Nice patch.
But it increases the number of failures in the shortcode tests here: http://unit-tests.trac.wordpress.org/browser/wp-testcase/test_shortcode.php
Before: Tests: 25, Assertions: 49, Failures: 2, Skipped: 4.
After: Tests: 25, Assertions: 38, Failures: 12, Skipped: 4.
Attaching output here too.
#20
@
13 years ago
Thanks for pointing me at the test cases.
15600.3.diff now passes the same tests as r17834.
I tested with
php wp-test.php -t WPTestShortcodeOutput1,WPTestShortcodeOutputParagraph,TestShortcode,TestShortcodeStripping
Still a significant speedup.
$ ./shortcodes-tests.sh 15600.3.diff Reverted 'wp-includes/formatting.php' Reverted 'wp-includes/shortcodes.php' shortcode_unautop() Tests: original SHORT TIME: 0.0092761516571045 LONG TIME: 1.3933248519897 SHORT BACKTRACE REQUIREMENT: 232 LONG BACKTRACE REQUIREMENT: 135670 do_shortcode() Tests: original SHORT TIME: 0.029265403747559 LONG TIME: 2.1743712425232 SHORT BACKTRACE REQUIREMENT: 420 LONG BACKTRACE REQUIREMENT: 135858 patching file wp-includes/shortcodes.php patching file wp-includes/formatting.php shortcode_unautop() Tests: patched SHORT TIME: 0.0078203678131104 LONG TIME: 0.38498711585999 SHORT BACKTRACE REQUIREMENT: 42 LONG BACKTRACE REQUIREMENT: 46 do_shortcode() Tests: patched SHORT TIME: 0.024780035018921 LONG TIME: 0.39810061454773 SHORT BACKTRACE REQUIREMENT: 47 LONG BACKTRACE REQUIREMENT: 51 DIFFS:
#21
@
13 years ago
- Milestone changed from Future Release to 3.2
This should be a 3.2 candidate. Nice work mdawaffe.
#23
@
13 years ago
- Keywords 3.3-early westi-likes added
- Milestone changed from 3.2 to Future Release
It is too late to go into 3.2 now.
Marking for 3.3
#24
@
13 years ago
If it's too late for the more complete solution, could you take the patch in comment 16 for 3.2? It is low risk. Essentially it just checks for a NULL return result and returns the original post content rather than erasing it.
This problem requires me to hand patch Wordpress to keep my posts from being randomly replaced with empty content, so it's really a nuisance.
#25
@
13 years ago
This bug has gone silent but it's still a serious problem with two patches available to be applied. Is it going to be fixed in 3.3?
#26
@
13 years ago
- Keywords 3.3-early westi-likes removed
- Milestone changed from Future Release to 3.3
#27
@
13 years ago
Does this bug deserve the needs-testing and needs-unit-test keywords since shortcodes have unit tests?
#28
follow-up:
↓ 29
@
13 years ago
mdawaffe, westi, or ryan, are you comfortable with 15600.3.diff on WP.com?
#29
in reply to:
↑ 28
;
follow-up:
↓ 34
@
13 years ago
Replying to nacin:
mdawaffe, westi, or ryan, are you comfortable with 15600.3.diff on WP.com?
It's not running on WP.com yet. I thought it was. I'm looking for my test cases (so far I've completely lost them), and I'll hook it up.
#30
follow-up:
↓ 31
@
13 years ago
So far, no problems on WP.com.
The patch changes the match grouping, and some plugins are using the function: http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=get_shortcode_regex+site%3Aplugins.svn.wordpress.org
Could someone grep through the plugins repo and see what files are affected? My guess is that few if any of the plugins will actually be affected.
#31
in reply to:
↑ 30
@
13 years ago
Replying to mdawaffe:
Could someone grep through the plugins repo and see what files are affected? My guess is that few if any of the plugins will actually be affected.
http://pastie.org/pastes/2522893/text
I quickly split it into two parts. The bottom shows plugins that seem to re-implement the function themselves or include it via backpress.
#32
@
13 years ago
OK - I lied :) There were a few issues on WP.com with back compat. Also, a bunch of the plugins in http://pastie.org/pastes/2522893/text (thanks, duck_) use the matching groups in ways I didn't expect.
So here's a new patch.
It's the same as 15600.3.diff with the following differences:
- All matching groups are the same as in the current code.
- The regex no longer requires
/x
to compile.
I believe the patch is now 100% backward compatible.
#33
@
13 years ago
I'm one of the ones calling get_shortcode_regex()
because my plugin needs to use do_shortcode()
with a different callback function (one that keeps double brackets). If only do_shortcode()
accepted a callback function parameter... ;)
#34
in reply to:
↑ 29
;
follow-up:
↓ 35
@
13 years ago
Replying to mdawaffe:
Replying to nacin:
mdawaffe, westi, or ryan, are you comfortable with 15600.3.diff on WP.com?
It's not running on WP.com yet. I thought it was. I'm looking for my test cases (so far I've completely lost them), and I'll hook it up.
You attached your tests to this ticket I believe.
#35
in reply to:
↑ 34
;
follow-up:
↓ 36
@
13 years ago
Replying to westi:
You attached your tests to this ticket I believe.
Yeah - I reran those tests. I used to have some real-world data as well for which the current code failed.
15600.5.diff has been running on WP.com since I uploaded the patch here with no (known to me) problems.
#36
in reply to:
↑ 35
@
13 years ago
Replying to mdawaffe:
15600.5.diff has been running on WP.com since I uploaded the patch here with no (known to me) problems.
We found a problem :)
15600.6.diff is a more forgiving regex - it's more generous in what can match as the shortcode attributes. It reproduces the behavior of the current code, in that regard: allow shortcode_parse_atts()
to do the attribute parsing.
It's very slightly slower than 15600.5.diff, but has better backtracing performance. It's still much faster than the current code.
$ ./shortcodes-tests.sh ~/15600.6.diff Reverted 'wp-includes/formatting.php' Reverted 'wp-includes/shortcodes.php' shortcode_unautop() Tests: original SHORT TIME: 0.0095028877258301 LONG TIME: 3.7883765697479 SHORT BACKTRACE REQUIREMENT: 232 LONG BACKTRACE REQUIREMENT: 135670 do_shortcode() Tests: original SHORT TIME: 0.029194116592407 LONG TIME: 19.19673538208 SHORT BACKTRACE REQUIREMENT: 420 LONG BACKTRACE REQUIREMENT: 135858 patching file wp-includes/shortcodes.php patching file wp-includes/formatting.php shortcode_unautop() Tests: patched SHORT TIME: 0.0085659027099609 LONG TIME: 0.39086127281189 SHORT BACKTRACE REQUIREMENT: 33 LONG BACKTRACE REQUIREMENT: 37 do_shortcode() Tests: patched SHORT TIME: 0.026137113571167 LONG TIME: 0.41313719749451 SHORT BACKTRACE REQUIREMENT: 42 LONG BACKTRACE REQUIREMENT: 46 DIFFS:
Should have copy pasted the name, i mean the
shortcode_unautop
function.