Opened 5 years ago
Last modified 2 weeks ago
#47514 new enhancement
Change priority of make_clickable callback to boost performance
Reported by: | olliverh87 | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | minor | Version: | 5.2.1 |
Component: | Comments | Keywords: | |
Focuses: | performance | Cc: | |
Description
Hi!
Preemptive: please excuse me if this ticket is inappropriate. Since WordPress' HackerOne states explicit disinterest in DoS-related issues, I hesitated at first to create this ticket here. However, the effectiveness of the DoS attack and the significant performance boost I gained after I understood the logic behind the attack and changed a line of code made me think this might be worth sharing here.
Last week one of our clients who runs a very high-profile WordPress site fell victim to a sophisticated DoS attack in which attackers spammed the site with comments containing thousands of strings such as ftp.x. The processing of requests that displayed these comments took a huge portion of the server's available RAM.
Long story short, we identified the following callbacks as the troublemakers:
make_clickable
wp_kses_post
wptexturize
convert_chars
convert_smilies
force_balance_tags
wpautop
They all have in common that they operate on the entire comment string and attempt to parse out all HTML tags contained within it via regex functions, which takes up a lot of RAM. As a solution for our client, we put some additional spam filtering for new comments in place.
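For illustration, here is a rough sketch of how the memory impact can be observed on a test install (this is just a test script of mine, not part of core; it assumes wp-load.php is reachable from the script's directory, so adjust the path as needed):

<?php
// Rough memory measurement: run the comment_text filter chain once over a
// pathological comment and report the peak memory increase.
require __DIR__ . '/wp-load.php'; // adjust to the location of your test install

$comment = str_repeat( 'ftp.x ', 10916 ); // just under the 65500-character comment limit

$before = memory_get_peak_usage( true );
apply_filters( 'comment_text', $comment );
$after = memory_get_peak_usage( true );

printf( "Peak memory increase: %.2f MB\n", ( $after - $before ) / 1048576 );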
However, out of personal interest in how this attack worked I tried to understand the logic behind the attack.
The reason the attackers had created comments containing thousands of strings such as ftp.x is that the make_clickable callback converts these strings into full <a> tags via PHP's built-in preg_replace_callback.
By default, comments can only contain 65500 characters. The string ftp.x (including the trailing whitespace) contains only 6 characters, so a single comment can contain it 10916 times. make_clickable then turns those 10916 strings into 10916 tags, leading to a comment containing over 500,000 characters. Since make_clickable runs first, all of the other six callbacks that operate on the comment string use regex to parse 10916 * 2 tags (one opening and one closing each), leading to a huge memory overhead.
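To see the blow-up in isolation, here is a rough sketch (again my own test snippet, assuming a loaded WordPress test install) that compares the string length before and after make_clickable:

<?php
// Compare the comment length before and after make_clickable() turns every
// bare "ftp.x" token into a full <a> tag.
require __DIR__ . '/wp-load.php'; // adjust to the location of your test install

$comment = str_repeat( 'ftp.x ', 10916 );

printf( "Input length:  %d characters\n", strlen( $comment ) );
printf( "Output length: %d characters\n", strlen( make_clickable( $comment ) ) );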
I found that changing the priority of the make_clickable filter so that it runs after force_balance_tags significantly reduced our server's RAM usage when rendering comments containing links. Since the <a> tags are properly generated by WordPress itself, there is no reason to run them through functions such as force_balance_tags.
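For anyone who wants to try the same change without patching core, a rough equivalent from a theme's functions.php or a small plugin would look like this (the priorities mirror the patch below):

<?php
// Re-register make_clickable on comment_text at a later priority so it runs
// after force_balance_tags (priority 25) instead of at its default priority 9.
add_action( 'init', function () {
    remove_filter( 'comment_text', 'make_clickable', 9 );
    add_filter( 'comment_text', 'make_clickable', 26 );
} );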
To reproduce the problem, open a test setup of WordPress and navigate to a post that has comments enabled. Start writing a new comment, open the browser's developer console, and type
comment.value = "ftp.x ".repeat(10916)
Submit the comment and notice how long the page takes to load. Imagine hundreds of such comments being requested by hundreds of bots at the same time, and you get what happened to our client's site.
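To simulate more than one such comment, a rough sketch that creates them programmatically could look like this (assuming a WordPress test install; $post_id is a placeholder for an existing post with comments enabled):

<?php
// Create a handful of pathological comments on a test post instead of pasting
// the payload into the browser console by hand.
require __DIR__ . '/wp-load.php'; // adjust to the location of your test install

$post_id = 1; // placeholder: ID of an existing post with comments enabled
$payload = str_repeat( 'ftp.x ', 10916 );

for ( $i = 0; $i < 10; $i++ ) {
    wp_insert_comment( array(
        'comment_post_ID'      => $post_id,
        'comment_author'       => 'load-test',
        'comment_author_email' => 'test@example.com',
        'comment_content'      => $payload,
        'comment_approved'     => 1, // auto-approve so the front end renders them
    ) );
}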
The following shows the patch I applied (also attached as a diff file):
--- a.php	2019-06-09 22:47:58.744746903 +0200
+++ b.php	2019-06-09 22:49:00.080742874 +0200
@@ -190,7 +190,7 @@
 add_filter( 'comment_text', 'wptexturize' );
 add_filter( 'comment_text', 'convert_chars' );
-add_filter( 'comment_text', 'make_clickable', 9 );
+add_filter( 'comment_text', 'make_clickable', 26 );
 add_filter( 'comment_text', 'force_balance_tags', 25 );
 add_filter( 'comment_text', 'convert_smilies', 20 );
 add_filter( 'comment_text', 'wpautop', 30 );
Best regards,
Olliver
Attachments (1)
Change History (5)
#3 @ 5 years ago
Hi @SergeyBiryukov, thank you so much for your feedback! I had not thought of this scenario, my apologies.
It seems that force_balance_tags does a great job at producing mostly valid markup, so to address the issue with the test case you showed, the easiest solution seems to be a trade-off: have make_clickable run after the other filters (such as convert_smilies) but before force_balance_tags.
I added the following snippet to the functions.php file of my active theme on my test setup to do some performance measurements:
<?php
add_action('init', function() {
    $comment_string = "x " . str_repeat("ftp.x ", 10916);

    // 1st test, run with default configuration
    $default_time = microtime(true);
    apply_filters('comment_text', $comment_string);
    $default_time = microtime(true) - $default_time;

    // 2nd test, run with changed priorities (make_clickable before force_balance_tags)
    remove_filter('comment_text', 'make_clickable', 9);
    add_filter('comment_text', 'make_clickable', 24);
    $changed_time = microtime(true);
    apply_filters('comment_text', $comment_string);
    $changed_time = microtime(true) - $changed_time;

    // echo results
    echo "Default configuration runtime: " . $default_time . " secs" . "<br />";
    echo "Changed configuration runtime: " . $changed_time . " secs" . "<br /> <br />";
    exit;
});
Running it 100 times showed that this modification improves performance by ~23% on average, while still producing valid markup for the test case you described. What do you think?
The diff would then become:
--- a.php	2019-06-09 22:47:58.744746903 +0200
+++ b.php	2019-06-09 22:49:00.080742874 +0200
@@ -190,7 +190,7 @@
 add_filter( 'comment_text', 'wptexturize' );
 add_filter( 'comment_text', 'convert_chars' );
-add_filter( 'comment_text', 'make_clickable', 9 );
+add_filter( 'comment_text', 'make_clickable', 24 );
 add_filter( 'comment_text', 'force_balance_tags', 25 );
 add_filter( 'comment_text', 'convert_smilies', 20 );
 add_filter( 'comment_text', 'wpautop', 30 );
#4 @ 2 weeks ago
@SergeyBiryukov interestingly, both outputs you listed from force_balance_tags() change the markup into semantically equivalent documents: with and without the patch, the generated DOM is identical. They are both wrong, but they are wrong in the same way, and both produce invalid markup.
Normalized output without the patch.
<p><a href="#" rel="nofollow">test</a></p><a href="#" rel="nofollow"> </a><p><a href="#" rel="nofollow"> </a><a href="http://ftp.x" rel="nofollow">http://ftp.x</a></p>
Normalized output with the patch.
<p><a href="#" rel="nofollow">test</a></p><a href="#" rel="nofollow"> </a><p><a href="#" rel="nofollow"> </a><a href="http://ftp.x" rel="nofollow">http://ftp.x</a></p>
The correct normalization would be as follows.
<a href="#" rel="nofollow">test ftp.x</a>
We can see from the invalid behaviors that a link is generated with the wrong URL. It's "invalid" to open a new A when one is already open, but the result is that the parser automatically closes any previously-open A elements.
It seems like the real culprit is make_clickable(), which isn't aware of when it's already inside a link. wpautop() doesn't help (it's what creates the invalid HTML in your examples and what cuts the existing A element in two), but maybe it would help if we simply prevented make_clickable() from creating a link _from_ existing link content. It already tries to do this, but in the same naive way it tries to parse HTML.
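To see that behavior in isolation, here is a rough sketch of a test snippet (assuming a loaded WordPress test install; the exact output depends on the WordPress version):

<?php
// Feed make_clickable() a comment that already contains an unclosed anchor
// (the test case above) and compare it with the full comment_text filter chain.
require __DIR__ . '/wp-load.php'; // adjust to the location of your test install

$comment = '<a href="#">test ftp.x';

echo make_clickable( $comment ) . "\n";
echo apply_filters( 'comment_text', $comment ) . "\n";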
I was exploring code impacted by a refactor of force_balance_tags() to use the HTML API, but in this case we need to address several of these functions. I believe the performance issues will be addressed, though, because the HTML API was built for cases like these.
Hi @olliverh87, welcome to WordPress Trac! Thank you for the detailed problem description.
What if there's an unclosed link in the comment, e.g. if the commenter forgot to close the <a> tag?
Without the patch, the first link is properly closed before the second one is opened (even though there are extra </p><p> tags in between):
With the patch, the links are nested, causing invalid markup: