Make WordPress Core

Opened 13 years ago

Closed 12 years ago

#16892 closed defect (bug) (fixed)

make_clickable segfault

Reported by: westi's profile westi Owned by: duck_'s profile duck_
Milestone: 3.4 Priority: normal
Severity: normal Version: 3.1
Component: Formatting Keywords: has-patch needs-testing 3.4-early westi-likes
Focuses: Cc:

Description

Running the following nasty comment text through make_clickable segfaults:

<?php
require dirname(__FILE__) . '/wp-load.php';

$comment= "http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/";

make_clickable($comment);

gdb segfault details:

Program received signal SIGSEGV, Segmentation fault.
match (
    eptr=0x2aaab052996c "le/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/pos"..., ecode=0x10c6971 "^", 
    mstart=0x2aaab05282b1 "http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-t"..., markptr=0x0, offset_top=2, md=0x7fffffffad70, 
    ims=5, eptrb=0x0, flags=0, rdepth=11626) at /usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c:1221
1221	/usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c: No such file or directory.
	in /usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c

Attachments (9)

test-segfault.php (362 bytes) - added by hakre 13 years ago.
Testscript
16892.patch (1.2 KB) - added by hakre 13 years ago.
Help the Regex Compiler
16892.2.patch (1.2 KB) - added by hakre 13 years ago.
Original Patch
16892.3.patch (638 bytes) - added by hakre 13 years ago.
Workaround to prevent this quickly: http://hakre.wordpress.com/2011/03/19/this-page-is-a-test/
16892.4.patch (1.3 KB) - added by hakre 13 years ago.
pcre.recursion_limit settings workaround
16892.5.patch (1.3 KB) - added by hakre 13 years ago.
Improved Regex plus pcre.recursion_limit setting tweak
16892.6.patch (6.0 KB) - added by mdawaffe 13 years ago.
16892.7.patch (5.9 KB) - added by duck_ 12 years ago.
16892.8.patch (1.8 KB) - added by lancewillett 12 years ago.

Download all attachments as: .zip

Change History (44)

#1 @hakre
13 years ago

Please check if you have the required php module available, e.g. pcre.

#2 @hakre
13 years ago

FYI: I've tested your snippet on php 5.3.5 an this works w/o any segfault for me.

#3 @hakre
13 years ago

  • Keywords reporter-feedback added; needs-patch removed

#4 @westi
13 years ago

  • Keywords needs-patch added; reporter-feedback removed

Yes I do have PCRE enabled :)

WordPress wouldn't work very well otherwise

#5 follow-up: @hakre
13 years ago

I asked because of this message:

/usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c: No such file or directory.

Looks like your PHP version is missing the PCRE library.

As it does not segfault for me with the same PHP version of which mine is officially distributed by PHP.net I assume that you're facing a configuration issue on your server.

I tend to close this ticket as worksforme.

Pleae ensure that you've properly compiled PHP with all needed libraries, especially with the ones needed for the PCRE extension.

Does the following script segfaults on that said system?

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

#6 @hakre
13 years ago

  • Keywords reporter-feedback added; needs-patch removed

#7 in reply to: ↑ 5 ; follow-up: @westi
13 years ago

  • Keywords needs-patch added; reporter-feedback removed

Replying to hakre:

I asked because of this message:

/usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c: No such file or directory.

Looks like your PHP version is missing the PCRE library.

Nope - that is just GDB mentioning that I don't have the source code on this box.

As it does not segfault for me with the same PHP version of which mine is officially distributed by PHP.net I assume that you're facing a configuration issue on your server.

I tend to close this ticket as worksforme.

Pleae ensure that you've properly compiled PHP with all needed libraries, especially with the ones needed for the PCRE extension.

Does the following script segfaults on that said system?

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

No segfault - works fine and finds a match.

The segfault only occurs when you have enough of the urls in the comment for it to trigger - reduce the number of urls and the segfault goes away.

#8 in reply to: ↑ 7 @hakre
13 years ago

Replying to westi:

The segfault only occurs when you have enough of the urls in the comment for it to trigger - reduce the number of urls and the segfault goes away.

Then you're triggering some boundary on your system most probably. PHP should do an error instead I think.

I don't have this problem with your script, when I make_clickable($comment.$comment); (double the comment), I get a internal server error.

Looks like that this is PHP error (triggered by an error in libpcre).

Have you reported this upstream for PHP pcre / libpcre?

#9 @hakre
13 years ago

As I can not reproduce the problem, I could further drill it down to the following line:

preg_replace_callback('#(?<!=[\'"])(?<=[*\')+.,;:!&$\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#%~/?@\[\]-]|[\'*(+.,;:!=&$](?![\b\)]|(\))?([\s]|$))|(?(1)\)(?![\s<.,;:]|$)|\)))+)#is', '_make_url_clickable_cb', $ret);

Version 0, edited 13 years ago by hakre (next)

@hakre
13 years ago

Testscript

#10 @hakre
13 years ago

I did modify the testscript now. This version enables a tester to modify the number of URL repetitions to deal with.

On my testbed, I have this properly running until 255 repetitions. With 256 (as uploaded), the script dies.

@hakre
13 years ago

Help the Regex Compiler

#11 @hakre
13 years ago

I think you've found a bug here, but I smell it not being a WordPress bug because it's just using the preg_replace_callback() as documented.

So instead of segfaulting, PHP should give some error.

But it is segfaulting, I assume that this is related to the pcre library code, so this should be reported upstream.

Obviously this is triggering an edge case. It contains a link that will be larger than 10 000 characters, which would not violate HTTP but is very uncommon (see #10483).

There is a usefull limit of 255 characters for URLs, the next usefull limit is 2000 characters.

I could extend from 255 to 2 090 (!) by the changes in the patch. This allows me to pass a string of about 85 690 chars into make_clickable(). 2 091 will again let the script die.

It does not make much of a difference btw to change to {1,255}, so I opted for the longer limit.

Spaces between URLs will prevent the dying. This is really an edge-case.

@hakre
13 years ago

Original Patch

#12 @hakre
13 years ago

The first patch contained a second change that was not necessary for the improvement. Additionally it might introduce unwanted constraints, so prefer the second patch.

Please test if this solves your issue.

#13 @hakre
13 years ago

  • Keywords has-patch reporter-feedback added; needs-patch removed

@hakre
13 years ago

pcre.recursion_limit settings workaround

#14 @hakre
13 years ago

I found out that you can prevent PHP from segfaulting by reducing the PCRE's recursion limit, e.g. ini_set('pcre.recursion_limit', 1000); (default is 100 000, for me the highest non-crashing is 20 903 with the patch applied and an URL repetition count of 2 091). This will prevent PHP from segfaulting and make make_clickable() return an empty string.

PCRE's recursion limit. Please note that if you set this value to a high number you may consume all the available process stack and eventually crash PHP (due to reaching the stack size limit imposed by the Operating System).

from: PCRE Configuration Options

The provided patch just reduces the limit to 10 000 (a tenth of the default value), executes the regex. If it fails, it falls back to the original string, on success will take over the one with the replacements.

@hakre
13 years ago

Improved Regex plus pcre.recursion_limit setting tweak

#15 @hakre
13 years ago

The latest patch did extend the amount of text that is made clickable by improving the regex as previously.

It's an increase from 122 repetitions / 5 002 character long string up to 999 repetitions / 40 959 characters.

Anyway, the callback should probably not create links from URLs that exceed a certain size (e.g. 2000 characters).

#16 @ryan
13 years ago

  • Milestone changed from Awaiting Review to 3.2

#17 @ryan
13 years ago

16892.5.patch has been soaking on wp.com for a couple days. Looking good so far. We probably need to @ silence ini_set() like we do everywhere else.

#18 @filosofo
13 years ago

16892.5.patch passes all 86 of my make_clickable tests.

#19 @ryan
13 years ago

  • Milestone changed from 3.2 to 3.1.1

#20 @ryan
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed

[17570] for trunk and [17571] for 3.1.

#21 follow-up: @mdawaffe
13 years ago

  • Keywords needs-testing added
  • Milestone changed from 3.1.1 to Awaiting Review
  • Resolution fixed deleted
  • Status changed from closed to reopened

With the current code, an input of sufficient length can still cause segfaults. Increasing the length required before an input will cause a segfault doesn't solve the problem.

Attached:

  1. Fixes the ~2000 character limit for auto linking URLs.
  2. Improves the efficiency of the regex.
  3. Simplifies the regex.
  4. Passes all tests in php wp-test.php -t TestMakeClickable
  5. Is 10-20% faster on "typical" inputs with no or sparse links.
  6. Is 10-20% faster on "atypical" inputs with very dense links.
  7. Is ~80% faster on malicious inputs that the current code can handle.
  8. Does not segfault on malicious inputs on which the current code segfaults.

The ~2000 character limit is imposed in two parts. First, if the input is larger than 10000 characters (~1500 english words), the input is broken up into chunks by splitting it at whitespace characters. Chunks that can't be split (i.e. chunks with no whitespace) are skipped so that the regex doesn't have to process them. Second, the URL's total length is limited by the regex.

I'm sure there are some edge cases that this patch treats differently than the current code. I'm not sure if those edge case come up "naturally". If they do, I'm not sure if they have well defined expected behaviors.

Needs testing, especially with non-ASCII and multibyte inputs.

@mdawaffe
13 years ago

#22 @westi
12 years ago

  • Keywords 3.4-early westi-likes added; reporter-feedback removed

Adding to list for 3.4 consideration.

#23 @westi
12 years ago

  • Owner set to duck_
  • Status changed from reopened to reviewing

@duck_ could you take a look at this and see if it is suitable for 3.4 inclusion.

#24 @duck_
12 years ago

  • Milestone changed from Awaiting Review to 3.4

Looks good.

#25 in reply to: ↑ 21 ; follow-up: @duck_
12 years ago

Replying to mdawaffe:

I'm sure there are some edge cases that this patch treats differently than the current code. I'm not sure if those edge case come up "naturally". If they do, I'm not sure if they have well defined expected behaviors.

A URL following punctuation without a space (likely due to a typo) is one situation that the current function allows for, but the patch does not.

Comma, no space, URL,http://example.com/.

We might want to add some more punctuation to the character class in the first subgroup to accommodate for this.

#26 in reply to: ↑ 25 @westi
12 years ago

Replying to duck_:

Replying to mdawaffe:

I'm sure there are some edge cases that this patch treats differently than the current code. I'm not sure if those edge case come up "naturally". If they do, I'm not sure if they have well defined expected behaviors.

A URL following punctuation without a space (likely due to a typo) is one situation that the current function allows for, but the patch does not.

Comma, no space, URL,http://example.com/.

We might want to add some more punctuation to the character class in the first subgroup to accommodate for this.

Sounds like a good idea.

I think the lack of space in this test case is probably a quite common occurrence in comments typed in haste.

@duck_
12 years ago

#27 @duck_
12 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In [19899]:

Improve efficiency of make_clickable(). Props mdawaffe. Fixes #16892.

Not only does this improve general performance, but also helps to prevent
segfaults caused by malicious input to the regular expression. The regular
expression is also simplified to help readability and maintenance.

#28 @duck_
12 years ago

In [19900]:

Add @since and @access tag to _split_str_by_whitespace(). Props ocean90. See #16892.

#29 follow-up: @TobiasBg
12 years ago

  if ( ')' == $matches[3] && strpos( $url, '(' ) ) {

in _make_url_clickable_cb() looks wrong, as a "(" as the first character (i.e. position 0) would be returned as falsy.
If this is by design, strpos( $url, '(' ) > 0 would be clearer, or if we want to recognize "(" as the first character, this needs to be false === strpos( $url, '(' ).

#30 in reply to: ↑ 29 @duck_
12 years ago

Replying to TobiasBg:

as a "(" as the first character (i.e. position 0) would be returned as falsy. If this is by design, strpos( $url, '(' ) > 0 would be clearer, or if we want to recognize "(" as the first character, this needs to be false === strpos( $url, '(' ).

Since "(" cannot be the first character of $url I don't think this matters. However, I'm not against false !== strpos( $url, '(' ) to be explicit.

#31 follow-up: @lancewillett
12 years ago

Any objection to including a closing bracket > in the the "# 1: Leading whitespace, or punctuation" check at the beginning of the lookup?

This change breaks themes like P2 that filter post content late, after wpautop runs, so it needs to match strings like <p>http://wordpress.org/</p>.

Patch adds it in, and also aligns the regex comments.

#32 in reply to: ↑ 31 ; follow-up: @duck_
12 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Replying to lancewillett:

Any objection to including a closing bracket > in the the "# 1: Leading whitespace, or punctuation" check at the beginning of the lookup?

Before [19899] it was in the positive lookbehind, so sounds sensible to re-add it.

#33 in reply to: ↑ 32 @westi
12 years ago

Replying to duck_:

Replying to lancewillett:

Any objection to including a closing bracket > in the the "# 1: Leading whitespace, or punctuation" check at the beginning of the lookup?

Before [19899] it was in the positive lookbehind, so sounds sensible to re-add it.

Agreed.

We should add some tests for bare links inside html too.

#34 @westi
12 years ago

Starter for one [UT546].

#35 @duck_
12 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Links within HTML tags happen since [20443]. Any other problems should go on new tickets.

Note: See TracTickets for help on using tickets.