Make WordPress Core

Opened 4 years ago

Last modified 4 days ago

#51019 new defect (bug)

convert_smilies() fails on large tags

Reported by: podpirate's profile podpirate Owned by:
Milestone: 6.7 Priority: normal
Severity: normal Version: 5.5
Component: Formatting Keywords: has-patch php80 has-unit-tests has-testing-info needs-testing
Focuses: Cc:

Description

I have an image with huge data-url src (1.6M) in the post content.
The post content is not displayed, but (with WP_DEBUG on) there is a php error instead.

PHP Error output:

 Warning: count(): Parameter must be an array or an object that implements Countable in /Users/joern/www/vhosts/wordpress.local/httpdocs/stable/wp-includes/formatting.php on line 3357

Som digging revealed, that preg_split() in function convert_smilies() fails with a PREG_RECURSION_LIMIT_ERROR.

IMHO there should a check for preg_last_error() whether preg_split() was successfull, and if not the function should just return its input.

Any thoughts on this?
I'd be happy to craft a patch.

Change History (26)

#1 @SergeyBiryukov
4 years ago

  • Component changed from General to Formatting
  • Keywords needs-patch added

Hi there, welcome back to WordPress Trac! Thanks for the ticket.

IMHO there should a check for preg_last_error() whether preg_split() was successfull, and if not the function should just return its input.

Makes sense to me :)

This ticket was mentioned in PR #792 on WordPress/wordpress-develop by nathkrill.


4 years ago
#2

  • Keywords has-patch added; needs-patch removed

…o avoid issues in ticket #51019

Use preg_last_error() inside convert_smilies() to check for any preg_split errors before running count().
If there are errors, sends a PHP warning with the error message, then returns the original text.

Running count() on the array after there are preg_split errors can cause post content to not display (see ticket).

Trac ticket: https://core.trac.wordpress.org/ticket/51019

github-actions[bot] commented on PR #792:


4 years ago
#3

Hi @nathkrill! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in this blog post.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

This ticket was mentioned in PR #1124 on WordPress/wordpress-develop by nathkrill.


3 years ago
#4

Use preg_last_error() inside convert_smilies() to check for any preg_split errors before running count().
If there are errors, sends a PHP warning with the error message, then returns the original text.

Running count() on the array after there are preg_split errors can cause post content to not display (see ticket).

This is a reformat of a previous pull request to get the PHP tests to pass.

Trac ticket: https://core.trac.wordpress.org/ticket/51019

borzaszto commented on PR #1124:


2 years ago
#5

Same, +1 Up 🙌

lzudor-omnitech commented on PR #1124:


2 years ago
#6

Is there any ETA on this..?

#7 @rupw
20 months ago

FWIW I'd probably just check $textarr === false rather than preg_last_error(), but looks good to me.

Another report on WPSE: https://wordpress.stackexchange.com/q/409084

#8 @jorbin
6 months ago

#59927 was marked as a duplicate.

#9 @jorbin
6 months ago

  • Keywords needs-unit-tests php80 added
  • Milestone changed from Awaiting Review to 6.5

I would like for there to be an automated test for this, but otherwise, it is something that should be fixed for 6.5.

#10 @swissspidy
4 months ago

  • Keywords needs-refresh added

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


4 months ago

#12 @audrasjb
4 months ago

  • Keywords changes-requested added; needs-refresh removed

As per today's bugscrub:
Some changes were requested in the current PR, and it still needs unit tests. Let's keep it in the milestone for a week.

@audrasjb commented on PR #1124:


4 months ago
#13

@nathkrill are you able to refresh your PR with the requested changes? Thank you!

This ticket was mentioned in Slack in #core by rajinsharwar. View the logs.


4 months ago

#15 @rajinsharwar
4 months ago

Let's keep this in the milestone for some more days, and if there is still not much activity, we might need to bump this again.

#16 @swissspidy
4 months ago

  • Milestone changed from 6.5 to 6.6

This ticket was mentioned in Slack in #core by nhrrob. View the logs.


3 weeks ago

#18 @nhrrob
3 weeks ago

We have reviewed in today's bug scrub.
It has changes requested. And unit test is required.

Let's keep it in the milestone one more week.

This ticket was mentioned in PR #6786 on WordPress/wordpress-develop by @rajinsharwar.


3 weeks ago
#19

  • Keywords has-unit-tests added; needs-unit-tests removed

Bailing out in convert_smiles() is preg_split() is false.

Trac ticket: https://core.trac.wordpress.org/ticket/51019

#20 @rajinsharwar
3 weeks ago

  • Keywords has-testing-info needs-testing added; changes-requested removed

I added a new PR along with the unit tests. So, I am just bailing out and returning the input text directly, if the preg_replace fails. Added a unit test as well.

I removed the part to show the error message, because the preg_last_error_msg() was introduced in PHP 8, and users with PHP 7 or before will encounter a Fatal error.

Also, we can also use some switch statements to use the preg_last_error() error code, to display error messages based on those. But, I think that would introduce so many strings, and much more complexity than required, for this edge case.

Feel free to share your thoughts if you think we should output the error message as well. 🙂

Testing instructions:

  1. Enable WP_DEBUG
  2. Add the below code in functions.php
function test(){
	$text =  '<p><img alt="" src="data:image/png;base64,' . str_repeat( "iVBORw0KGgoAAAAN", 99999 ) . '="></p>' ;
	convert_smilies( $text );
}

add_action( 'init', 'test' );

You will notice some Fatal errors.

  1. Apply the patch, and repeat the process. You shouldn't notice any Fatal error.

This ticket was mentioned in Slack in #core-test by rajinsharwar. View the logs.


5 days ago

This ticket was mentioned in Slack in #core by nhrrob. View the logs.


5 days ago

#23 @audrasjb
5 days ago

  • Milestone changed from 6.6 to 6.7

As per today's bug scrub, this ticket still needs testing. Moving to 6.7.

#24 @dmsnell
4 days ago

This same bug was recently discussed within Automattic and @bjorsch identified a problem with the regex being written in a way that invites unrestrained backtracking.

Converting '/(<.*>)/U' into /(<[^>]*>/U would likely resolve this particular problem, as it would get rid of all the backtracking when attempting to poorly segment the HTML.

A more comprehensive fix that I've proposed making is to replace this code altogether with the HTML API, which will eliminate a number of bugs, including this one.

This ticket was mentioned in PR #6905 on WordPress/wordpress-develop by @dmsnell.


4 days ago
#25

Trac ticket: Core-51019.

Among other issues, convert_smilies() will fail when parsing documents with very long attributes, such as those containing images as data URIs. This occurs frequently when importing content from cloud document editors.

In this patch, the function is refactored to rely on the HTML API instead of its custom PCRE pattern to split and tokenize an HTML document. This avoids the backtracking problem that came with the PCRE function and it improves the reliability of how the function matches HTML text nodes.

Fixes #51019.

#26 @dmsnell
4 days ago

The patch I just posted should solve this through use of the HTML API instead of regexes to parse the HTML. It should have the added benefit of being able to reliable parse any given HTML. The only remaining quirk is that the way we track CODE and PRE elements is prone to error if they aren't properly balanced, but this is probably fine, and the consequences are merely that smilies won't get converted in such a failure case - it won't crash the site.

Might be an interesting test for the 6.7 cycle since we now have a long time to test it.

One change present in this updated version is that text nodes containing smilies _and other HTML character references_ will be modified such that other character references are decoded as well. This is because the HTML API only exposes decoded strings so that Core doesn't have to try and do all string replacement logic multiple times (as evidenced in this existing code where it checks for the UTF-8 encoded non-breaking space and for &nbsp; - this approach is guaranteed to corrupt data)

Note: See TracTickets for help on using tickets.