Make WordPress Core

Opened 5 months ago

Closed 2 months ago

Last modified 6 weeks ago

#58202 closed defect (bug) (fixed)

php-fpm7.4: time limit exceeded `Operation canceled' @ fatal/cache.c/GetImagePixelCache/1868.

Reported by: dsar's profile dsar Owned by: antpb's profile antpb
Milestone: 6.3 Priority: normal
Severity: normal Version: 6.2
Component: Media Keywords:
Focuses: Cc:

Description

Since WordPress 6.2, I've been getting error - php-fpm7.4: time limit exceeded `Operation canceled' @ fatal/cache.c/GetImagePixelCache/1868. regularly in the php-fpm error log, which also results in 503 error in Apache log.

I believe this error is due to the new function set_imagick_time_limit() which uses Imagick::setResourceLimit() to set the time limit for Imagick processing to 80% of PHP's max_execution_time. Trac ticket: #52569

It'd seem Imagick::setResourceLimit() sets the limit for not only the duration of the single request, but for as long as the parent Apache process is active. And if so, I'm also guessing the limit is applied and counted against all the processing done during all requests the parent Apache process handles.

I tested this by creating 2 scripts:
imagickGet.php
<?php
$test = Imagick::getResourceLimit(Imagick::RESOURCETYPE_TIME);
var_dump($test);
var_dump(getmypid());

imagickSet.php
<?php
Imagick::setResourceLimit(Imagick::RESOURCETYPE_TIME, 17);
var_dump(getmypid());

Whenever I ran imagickSet.php once, any time I ran imagickGet.php I'd get 17 as the limit (provided the same process handled the request, as reported by getmypid()).
Tested on Apache 2.4.29, and PHP7.4.29.

Is anyone else experiencing this issue?

Change History (26)

This ticket was mentioned in Slack in #core-media by antpb. View the logs.


5 months ago

#2 @antpb
5 months ago

  • Milestone changed from Awaiting Review to 6.2.1
  • Owner set to antpb
  • Status changed from new to assigned

Adding this to 6.2.1 to see if we can scope this logic better.

#3 @azaozz
5 months ago

Starting to think that [55404] (#52569) should perhaps be reverted. It was a good idea to add it but there have been several bugs/edge cases since it was added and no evidence it actually works and/or improves image post-processing in-the-wild.

Seems bumping to the expected PHP timeout "the usual way" would cause less problems and is guaranteed to work (when allowed on the server).

Last edited 5 months ago by azaozz (previous) (diff)

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


5 months ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


5 months ago

#6 @audrasjb
5 months ago

  • Milestone changed from 6.2.1 to 6.2.2

Hello, welcome to WordPress Core Trac and thank you for opening this ticket,

As per today's bug scrub and given no other bug report was found on support forums or Trac, let's move this ticket to milestone 6.2.2 pending further bug report or reproduction steps.

#8 @hberberoglu
4 months ago

Hello, we also see that "php-fpm7.4: time limit exceeded `Operation canceled' @ fatal/cache.c/GetImagePixelCache/1868." error on our logs. The web server returns 502 to user.

#9 @cdimas
4 months ago

Hello, we also experience the same defect as @hberberoglu and @dsar after 6.2.
We use php-fpm 7.4 and nginx.

#10 @desrosj
4 months ago

  • Milestone changed from 6.2.2 to 6.2.3

Moving this to 6.2.3.

#11 @mtxz
4 months ago

Hello, I'm experiencing the same.

PHP 7.4.24

The error seems to happen "randomly" for 10-20% of requests.

PHP:

[22-May-2023 14:42:24] WARNING: [pool www] child 4501 said into stderr: "php-fpm: time limit exceeded `Operation canceled' @ fatal/cache.c/GetImagePixelCache/1868."
[22-May-2023 14:42:24] WARNING: [pool www] child 4501 exited with code 1 after 179.624455 seconds from start

Nginx (visitors get a 502 BAD GATEGAY nginx error page):

recv() failed (104: Connection reset by peer) while reading the response header from upstream

I tried many config tweaks on nginx, PHP & fpm (php-fpm.conf, www.conf - changing timeouts, buffer sizes, opcache settings...) from my research on the "recv" nginx error. Nothing seems to have any effect on this error. I see logs for both POTS and GET requests, logged-in or not, no matter the script (root index.php or wp-admin).
So it also appends for visitor requests, that, from my pov, are not triggering any image processing stuff...

Last edited 4 months ago by mtxz (previous) (diff)

#13 @densityapps
4 months ago

Just wanted to add a comment here to hopefully provide further clarity on the issue.

The reason that the update from [55404] caused this breakage is that the Imagick timeout setting begins counting from its first instantiation within the PHP process. As we know, many server configurations are setup to spawn a few PHP processes and reuse them over many requests, only spawning new processes when needed. So, by forcing the Imagick timeout value to align with the PHP timeout value, after a couple of requests, you're effectively setting the Imagick timeout to a value that has already elapsed, which then times out, causing that PHP process to exit. The server will replace that PHP process with a new one, starting the whole process over again.

I hope this information is helpful, and that the issues can be patched very soon. In the meantime the only way to correct this issue on an affected site is to comment out the Imagick::setResourceLimit call in class-wp-image-editor-imagick.php.

Side Note: The default Imagick timeout value is something like 18 quintilion seconds. If the goal is to avoid a PHP timeout while Imagick is processing, a better approach might be to instead override the PHP timeout whenever an Imagick process is started, to ensure that the Imagick process will always complete before a PHP timeout can occur. Though, I can see some scenarios where this might cause other issues. I don't know if the original goal of the change is something that really can/should be addressed from the WordPress side of things. It seems like it could only really be properly addressed from an Imagick update.

Last edited 3 months ago by SergeyBiryukov (previous) (diff)

#14 @ekazda
3 months ago

Same thing is occurring here. Flywheel hosting platform (Nginx). PHP version 8.1. WordPress version 6.2.2

WARNING: child 796 said into stderr: "php-fpm8.1: time limit exceeded `Operation canceled' @ fatal/cache.c/GetImagePixelCache/1868."
WARNING: child 796 exited with code 1 after 10358.241346 seconds from start

Flywheel support is now recommending to its users that they rollback to 6.1.3 until this issue is properly resolved. I think this might elevate the severity of the problem.

Last edited 3 months ago by ekazda (previous) (diff)

#15 @matt_fw
3 months ago

We were affected by this issue. We solved it by creating a separate PHP-FPM pool for async-upload.php, directing traffic to it, and setting its pm.max_requests = 1 and pm.max_servers = 4. The important part is pm.max_requests = 1 to reset ImageMagick's cache timer on each request.

However, after reading through the ImageMagick resource handling code ( https://github.com/ImageMagick/ImageMagick/blob/a531d28e31309676ce8168c3b6dbbb5374b78790/MagickCore/resource.c#L1496 ) I think it's enough to call Imagick::setResourceLimit( Imagick::RESOURCETYPE_TIME, $limit ); on each request, even if the already set value is the same. In WP 6.2.2 the call is executed only when max_execution_time < imagick_timeout, but that happens only on the first call to Imagick::set_imagick_time_limit() within each PHP-FPM process.

Calling Imagick::setResourceLimit( Imagick::RESOURCETYPE_TIME, $limit ); resets ImageMagick cache_epoch = 0 (see https://github.com/ImageMagick/ImageMagick/blob/a531d28e31309676ce8168c3b6dbbb5374b78790/MagickCore/cache.c#L4847) and effectively makes ImageMagick's cache act as if it was request-based.

#16 follow-up: @soulseekah
3 months ago

@matt_fw is right.

Consider this simple example that can be reproduced using the built-in PHP server. This would act as an FPM child process, handling requests without terminating.

We need two files:

crop.php, this is a simulation of running crop via the WordPress Image Editor API, which calls setResourceLimit and resets the timer.

<?php
Imagick::setResourceLimit(IMagick::RESOURCETYPE_TIME, 3);

$imagick = new \Imagick('test.jpg');
$imagick->resizeImage(500, 400, imagick::FILTER_UNDEFINED, 1, false);

blur.php, this is a simulation of a WordPress plugin that calls IMagick APIs directly without resetting the resource limit (does not call setResourceLimit).

<?php
$imagick = new \Imagick('test.jpg');
$imagick->blurImage(5, 3);

Hit crop first. Wait for four seconds. Then hit blur. You'll trigger the error, because blur didn't reset the timer. Hit crop to reset the timer, hit blur immediately after several times until you see the message again.

This means that using crop/resize in the Image Editor API even once will continue to trigger errors for all other operations that don't reset the limit. The default limit is 9.2233720368548E+18 seconds. Calling crop/resize will put to 80% of max_execution_time and keep it there for all subsequent requests until the process is restarted.

The patch has another edge case, wherein cropping/resizing multiple images over a lengthy period of time via the Editor API will actually keep on resetting the timer, which completely defeats the purpose of aborting early.

One solution would be to call set_imagick_time_limit() on all WordPress requests but only once per request, and regardless of the context as to constantly reset the internal ImageMagick timer.

Another solution would be to revert the patch as @azaozz suggests, as the implementation is flawed (resets per operation) and breaks things for all subsequent requests.

#17 @densityapps
3 months ago

I strongly agree. This change should be reverted. As I mentioned previously, I don't think this is something that should be attempted to be "fixed" via WordPress. This is overriding server configuration, which should be left to the server admins, to solve a problem that isn't created by WP. Per my previous reply, and as noted by @matt_fw and @soulseekah, the timeout value remains in place for the life of the PHP process. It certainly wouldn't be appropriate to force every host to relegate themselves to only single use PHP processes in order for WP to function correctly. That's basically a forced downgrade of the usefulness of PHP-FPM. Any correction to address issues caused by PHP timeouts while Imagick is still processing, should come from an Imagick update.

#18 in reply to: ↑ 16 ; follow-ups: @azaozz
3 months ago

  • Keywords 2nd-opinion needs-patch added

Replying to soulseekah:

One solution would be to call set_imagick_time_limit() on all WordPress requests but only once per request, and regardless of the context as to constantly reset the internal ImageMagick timer.

This sounds reasonable but as @densityapps explains, it may not fix this everywhere. Also seems the default processing time limit is "infinity", see: https://github.com/ImageMagick/ImageMagick/blob/a8de149e1aca79836319b31c8881537115e478ff/MagickCore/resource.c#L1283C70-L1283C70.
Not sure WP should be changing that :)

Another solution would be to revert the patch

Yes, agree with @densityapps that seems best. Adding the "second opinion" keyword in case somebody else wants to chime in. If not will look at reverting [55404] in few days.

#19 in reply to: ↑ 18 @SergeyBiryukov
3 months ago

Replying to azaozz:

Another solution would be to revert the patch

Yes, agree with @densityapps that seems best. Adding the "second opinion" keyword in case somebody else wants to chime in. If not will look at reverting [55404] in few days.

Yes, seems reasonable to revert if it causes more issues than it solves :)

#20 in reply to: ↑ 18 @mikeschroder
3 months ago

Yes, agree with @densityapps that seems best. Adding the "second opinion" keyword in case somebody else wants to chime in. If not will look at reverting [55404] in few days.

Given the new information (since the original commit / fix), I agree this seems like the best way forward.

#21 follow-up: @mikeschroder
2 months ago

  • Keywords 2nd-opinion removed

Just to check in, since we're getting close to 6.3 RC.

@azaozz are you still thinking to revert this / available to revert this before RC1?

If not, I can put something together.

It seems like because it's a public method, the route would be to remove setting the time limit by default, but leave the ability to do so?

#22 in reply to: ↑ 21 @azaozz
2 months ago

Replying to mikeschroder:

Yes, will revert in few min.

It seems like because it's a public method, the route would be to remove setting the time limit by default, but leave the ability to do so?

Yea, sounds reasonable. Although thinking it may also be good to deprecate set_imagick_time_limit() and even maybe "disable" it (just trigger deprecation notice and return). The the default processing time limit in ImageMagick is "infinity", ideally PHP's timeout should be something similar when running imagick, not the other-way round? :)
Seeing the bugs above don't think WP should be changing that or even have a method to change it.

I'll just commit the deprecation for now, but lets remove the functionality if all agree.

Last edited 2 months ago by azaozz (previous) (diff)

#23 @azaozz
2 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 56250:

Media: Stop using and deprecate set_imagick_time_limit(). Seems it causes more problems than it solves.

Follow-up to #52569 and [55404].

Props: dsar, antpb, audrasjb, SergeyBiryukov, hberberoglu, mtxz, densityapps, ekazda, matt_fw, soulseekah, mikeschroder, azaozz.

Fixes: #58202.

#24 @SergeyBiryukov
7 weeks ago

  • Keywords needs-patch removed
  • Milestone changed from 6.2.3 to 6.3

Looks like this was fixed in trunk but not backported to the 6.2 branch.

Moving to 6.3, as there are no plans for 6.2.3 at this time.

#25 follow-up: @bakirov
6 weeks ago

  • Version changed from 6.2 to 6.2.2

Dear Devs!

We updated to 6.3, but still have this issue. Is there any workaround we can use till the fixes from upstream is merged back to the stable version?

Thank you and warm regards!

#26 in reply to: ↑ 25 @SergeyBiryukov
6 weeks ago

  • Version changed from 6.2.2 to 6.2

Replying to bakirov:

We updated to 6.3, but still have this issue. Is there any workaround we can use till the fixes from upstream is merged back to the stable version?

The fix in [56250] is included in 6.3, so if that doesn't resolve the issue, I would suggest creating a new ticket for further investigation, as this one is closed on a completed milestone.

The Version field indicates the earliest applicable version. Since the ::set_imagick_time_limit() method was introduced in [55404] / #52569, 6.2 is correct here.

Note: See TracTickets for help on using tickets.