Opened 14 years ago
Closed 13 years ago
#16892 closed defect (bug) (fixed)
make_clickable segfault
Reported by: | westi | Owned by: | duck_ |
---|---|---|---|
Milestone: | 3.4 | Priority: | normal |
Severity: | normal | Version: | 3.1 |
Component: | Formatting | Keywords: | has-patch needs-testing 3.4-early westi-likes |
Focuses: | Cc: |
Description
Running the following nasty comment text through make_clickable segfaults:
<?php require dirname(__FILE__) . '/wp-load.php'; $comment= "http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/"; make_clickable($comment);
gdb segfault details:
Program received signal SIGSEGV, Segmentation fault. match ( eptr=0x2aaab052996c "le/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/pos"..., ecode=0x10c6971 "^", mstart=0x2aaab05282b1 "http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-title/http://example.com/2011/03/18/post-t"..., markptr=0x0, offset_top=2, md=0x7fffffffad70, ims=5, eptrb=0x0, flags=0, rdepth=11626) at /usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c:1221 1221 /usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c: No such file or directory. in /usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c
Attachments (9)
Change History (44)
#4
@
14 years ago
- Keywords needs-patch added; reporter-feedback removed
Yes I do have PCRE enabled :)
WordPress wouldn't work very well otherwise
#5
follow-up:
↓ 7
@
14 years ago
I asked because of this message:
/usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c: No such file or directory.
Looks like your PHP version is missing the PCRE library.
As it does not segfault for me with the same PHP version of which mine is officially distributed by PHP.net I assume that you're facing a configuration issue on your server.
I tend to close this ticket as worksforme.
Pleae ensure that you've properly compiled PHP with all needed libraries, especially with the ones needed for the PCRE extension.
Does the following script segfaults on that said system?
<?php // The "i" after the pattern delimiter indicates a case-insensitive search if (preg_match("/php/i", "PHP is the web scripting language of choice.")) { echo "A match was found."; } else { echo "A match was not found."; } ?>
#7
in reply to:
↑ 5
;
follow-up:
↓ 8
@
14 years ago
- Keywords needs-patch added; reporter-feedback removed
Replying to hakre:
I asked because of this message:
/usr/local/src/php-5.3.5/ext/pcre/pcrelib/pcre_exec.c: No such file or directory.Looks like your PHP version is missing the PCRE library.
Nope - that is just GDB mentioning that I don't have the source code on this box.
As it does not segfault for me with the same PHP version of which mine is officially distributed by PHP.net I assume that you're facing a configuration issue on your server.
I tend to close this ticket as worksforme.
Pleae ensure that you've properly compiled PHP with all needed libraries, especially with the ones needed for the PCRE extension.
Does the following script segfaults on that said system?
<?php // The "i" after the pattern delimiter indicates a case-insensitive search if (preg_match("/php/i", "PHP is the web scripting language of choice.")) { echo "A match was found."; } else { echo "A match was not found."; } ?>
No segfault - works fine and finds a match.
The segfault only occurs when you have enough of the urls in the comment for it to trigger - reduce the number of urls and the segfault goes away.
#8
in reply to:
↑ 7
@
14 years ago
Replying to westi:
The segfault only occurs when you have enough of the urls in the comment for it to trigger - reduce the number of urls and the segfault goes away.
Then you're triggering some boundary on your system most probably. PHP should do an error instead I think.
I don't have this problem with your script, when I make_clickable($comment.$comment);
(double the comment), I get a internal server error.
Looks like that this is PHP error (triggered by an error in libpcre).
Have you reported this upstream for PHP pcre / libpcre?
#9
@
14 years ago
As I can now reproduce the problem, I could further drill it down to the following line:
preg_replace_callback('#(?<!=[\'"])(?<=[*\')+.,;:!&$\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#%~/?@\[\]-]|[\'*(+.,;:!=&$](?![\b\)]|(\))?([\s]|$))|(?(1)\)(?![\s<.,;:]|$)|\)))+)#is', '_make_url_clickable_cb', $ret);
#10
@
14 years ago
I did modify the testscript now. This version enables a tester to modify the number of URL repetitions to deal with.
On my testbed, I have this properly running until 255 repetitions. With 256 (as uploaded), the script dies.
#11
@
14 years ago
I think you've found a bug here, but I smell it not being a WordPress bug because it's just using the preg_replace_callback() as documented.
So instead of segfaulting, PHP should give some error.
But it is segfaulting, I assume that this is related to the pcre library code, so this should be reported upstream.
Obviously this is triggering an edge case. It contains a link that will be larger than 10 000 characters, which would not violate HTTP but is very uncommon (see #10483).
There is a usefull limit of 255 characters for URLs, the next usefull limit is 2000 characters.
I could extend from 255 to 2 090 (!) by the changes in the patch. This allows me to pass a string of about 85 690 chars into make_clickable()
. 2 091 will again let the script die.
It does not make much of a difference btw to change to {1,255}
, so I opted for the longer limit.
Spaces between URLs will prevent the dying. This is really an edge-case.
#12
@
14 years ago
The first patch contained a second change that was not necessary for the improvement. Additionally it might introduce unwanted constraints, so prefer the second patch.
Please test if this solves your issue.
@
14 years ago
Workaround to prevent this quickly: http://hakre.wordpress.com/2011/03/19/this-page-is-a-test/
#14
@
14 years ago
I found out that you can prevent PHP from segfaulting by reducing the PCRE's recursion limit, e.g. ini_set('pcre.recursion_limit', 1000);
(default is 100 000, for me the highest non-crashing is 20 903 with the patch applied and an URL repetition count of 2 091). This will prevent PHP from segfaulting and make make_clickable()
return an empty string.
PCRE's recursion limit. Please note that if you set this value to a high number you may consume all the available process stack and eventually crash PHP (due to reaching the stack size limit imposed by the Operating System).
from: PCRE Configuration Options
The provided patch just reduces the limit to 10 000 (a tenth of the default value), executes the regex. If it fails, it falls back to the original string, on success will take over the one with the replacements.
#15
@
14 years ago
The latest patch did extend the amount of text that is made clickable by improving the regex as previously.
It's an increase from 122 repetitions / 5 002 character long string up to 999 repetitions / 40 959 characters.
Anyway, the callback should probably not create links from URLs that exceed a certain size (e.g. 2000 characters).
#17
@
14 years ago
16892.5.patch has been soaking on wp.com for a couple days. Looking good so far. We probably need to @ silence ini_set() like we do everywhere else.
#21
follow-up:
↓ 25
@
14 years ago
- Keywords needs-testing added
- Milestone changed from 3.1.1 to Awaiting Review
- Resolution fixed deleted
- Status changed from closed to reopened
With the current code, an input of sufficient length can still cause segfaults. Increasing the length required before an input will cause a segfault doesn't solve the problem.
Attached:
- Fixes the ~2000 character limit for auto linking URLs.
- Improves the efficiency of the regex.
- Simplifies the regex.
- Passes all tests in
php wp-test.php -t TestMakeClickable
- Is 10-20% faster on "typical" inputs with no or sparse links.
- Is 10-20% faster on "atypical" inputs with very dense links.
- Is ~80% faster on malicious inputs that the current code can handle.
- Does not segfault on malicious inputs on which the current code segfaults.
The ~2000 character limit is imposed in two parts. First, if the input is larger than 10000 characters (~1500 english words), the input is broken up into chunks by splitting it at whitespace characters. Chunks that can't be split (i.e. chunks with no whitespace) are skipped so that the regex doesn't have to process them. Second, the URL's total length is limited by the regex.
I'm sure there are some edge cases that this patch treats differently than the current code. I'm not sure if those edge case come up "naturally". If they do, I'm not sure if they have well defined expected behaviors.
Needs testing, especially with non-ASCII and multibyte inputs.
#22
@
13 years ago
- Keywords 3.4-early westi-likes added; reporter-feedback removed
Adding to list for 3.4 consideration.
#23
@
13 years ago
- Owner set to duck_
- Status changed from reopened to reviewing
@duck_ could you take a look at this and see if it is suitable for 3.4 inclusion.
#25
in reply to:
↑ 21
;
follow-up:
↓ 26
@
13 years ago
Replying to mdawaffe:
I'm sure there are some edge cases that this patch treats differently than the current code. I'm not sure if those edge case come up "naturally". If they do, I'm not sure if they have well defined expected behaviors.
A URL following punctuation without a space (likely due to a typo) is one situation that the current function allows for, but the patch does not.
Comma, no space, URL,http://example.com/.
We might want to add some more punctuation to the character class in the first subgroup to accommodate for this.
#26
in reply to:
↑ 25
@
13 years ago
Replying to duck_:
Replying to mdawaffe:
I'm sure there are some edge cases that this patch treats differently than the current code. I'm not sure if those edge case come up "naturally". If they do, I'm not sure if they have well defined expected behaviors.
A URL following punctuation without a space (likely due to a typo) is one situation that the current function allows for, but the patch does not.
Comma, no space, URL,http://example.com/.We might want to add some more punctuation to the character class in the first subgroup to accommodate for this.
Sounds like a good idea.
I think the lack of space in this test case is probably a quite common occurrence in comments typed in haste.
#29
follow-up:
↓ 30
@
13 years ago
if ( ')' == $matches[3] && strpos( $url, '(' ) ) {
in _make_url_clickable_cb()
looks wrong, as a "(" as the first character (i.e. position 0) would be returned as falsy.
If this is by design, strpos( $url, '(' ) > 0
would be clearer, or if we want to recognize "(" as the first character, this needs to be false === strpos( $url, '(' )
.
#30
in reply to:
↑ 29
@
13 years ago
Replying to TobiasBg:
as a "(" as the first character (i.e. position 0) would be returned as falsy. If this is by design,
strpos( $url, '(' ) > 0
would be clearer, or if we want to recognize "(" as the first character, this needs to befalse === strpos( $url, '(' )
.
Since "(" cannot be the first character of $url
I don't think this matters. However, I'm not against false !== strpos( $url, '(' )
to be explicit.
#31
follow-up:
↓ 32
@
13 years ago
Any objection to including a closing bracket >
in the the "# 1: Leading whitespace, or punctuation" check at the beginning of the lookup?
This change breaks themes like P2 that filter post content late, after wpautop runs, so it needs to match strings like <p>http://wordpress.org/</p>
.
Patch adds it in, and also aligns the regex comments.
#32
in reply to:
↑ 31
;
follow-up:
↓ 33
@
13 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Replying to lancewillett:
Any objection to including a closing bracket
>
in the the "# 1: Leading whitespace, or punctuation" check at the beginning of the lookup?
Before [19899] it was in the positive lookbehind, so sounds sensible to re-add it.
#33
in reply to:
↑ 32
@
13 years ago
Replying to duck_:
Replying to lancewillett:
Any objection to including a closing bracket
>
in the the "# 1: Leading whitespace, or punctuation" check at the beginning of the lookup?
Before [19899] it was in the positive lookbehind, so sounds sensible to re-add it.
Agreed.
We should add some tests for bare links inside html too.
Please check if you have the required php module available, e.g. pcre.