Opened 11 years ago
Last modified 5 years ago
#25851 new defect (bug)
post_content lost when inserting Posts with large base64-encoded images
Reported by: | ctayloroomphinc | Owned by: | |
---|---|---|---|
Milestone: | Priority: | low | |
Severity: | critical | Version: | 3.8 |
Component: | Formatting | Keywords: | kses needs-patch |
Focuses: | Cc: |
Description
post_content will be silently blanked-out during a wp_insert_post() call that contains large base64-encoded images within Post HTML content.
I've found the source of this problem to be the wp_pre_kses_less_than() function. In the scenario described above, the call to preg_replace_callback() fails, causing a NULL to be returned from wp_pre_kses_less_than(), instead of a string. In this case preg_replace_callback() returns NULL because of a PREG_BACKTRACK_LIMIT_ERROR.
The error can be worked around by setting a higher pcre.backtrack_limit in php.ini, but I still consider this to be a bug because the wp_pre_kses_less_than() should at least emit a warning to the debug.log informing someone that there was an internal PHP error (silent in this case, regardless of error_reporting settings).
Beyond this, the function should check the return value of preg_replace_callback(), and if it is NULL, perhaps consider returning the original string as it was passed by the caller, so the caller does not end up with unexpectedly empty post_content.
Another option would be to look in to tweaking the regex so that it does need exceed the default resources as configured by pcre.backtrack_limit, which is 100000 by default.
More info: http://www.php.net/manual/en/pcre.configuration.php#ini.pcre.backtrack-limit
I have attached a file that shows how to re-produce this bug. Run the file via WP-CLI like so:
wp eval-file wp_pre_kses_less_than_bug_repro.php
I'm setting the Severity of this bug to critical, since it results in unexpected data loss.
Thanks for the report, Corey. This is a tough one. It would be great to optimize kses, but given its nature (security), it isn't easy to do. Many have tried, only a few have come out alive.
One workaround would be to run wp_insert_post() as a user privileged enough to not have kses run. Or simply kses_remove_filters() followed by a kses_init(). If the data is trusted, of course.
I will say, that's a *ton* of data (and it's looking through to find exploitative protocols, among other things), but it's possible that for certain kinds of data, we can simply bypass it due to a whitelist. For example, at some point, all we have to deal with are 64 characters, none of them dangerous. It could be one approach to take. I could also totally go for increasing the backtrack limit if we can't come up with a better way to do it (we've done it elsewhere) — out of curiosity, how much did you need to increase it to get it to work?