WordPress.org

Make WordPress Core

Opened 5 months ago

Closed 2 months ago

Last modified 2 months ago

#47223 closed defect (bug) (fixed)

Site Health Check: "last missed cron" test too aggressive

Reported by: DavidAnderson Owned by: peterwilsoncc
Milestone: 5.3 Priority: normal
Severity: normal Version: 5.2
Component: Site Health Keywords: site-health has-patch has-unit-tests commit
Focuses: Cc:
PR Number:

Description

The WP_Site_Health::has_missed_cron() test will fail if there is a single cron event with a date more than 0 seconds in the past (the check done is if ( ( $cron->time - time() ) < 0 ) {). This is much too aggressive relative to the way that WP Cron works, relying upon incoming HTTP connections. A cron job running not at the precise time it was expected is normal *and documented* behaviour of WP Cron and not something to worry the user about (https://developer.wordpress.org/plugins/cron/ - "Though it may not run at a specific time, WP-Cron will get your tasks done in a timely manner").

A more sensible test to indicate that your cron queue is under-serviced would be something like something 5 minutes overdue; that would indicate that the number of visits the cron system is getting isn't keeping up with the number of jobs. (You'll still see false positives in various situations; just, a lot less of them).

Attachments (8)

47223.diff (2.2 KB) - added by rockfire 4 months ago.
47223.2.diff (2.9 KB) - added by peterwilsoncc 4 months ago.
47223.3.diff (2.9 KB) - added by rockfire 3 months ago.
47223.4.diff (3.4 KB) - added by afragen 2 months ago.
47223.5.diff (3.4 KB) - added by afragen 2 months ago.
fixed spacing, &$ICV&FILBJH:LK
47223.6.diff (4.1 KB) - added by afragen 2 months ago.
use variable in has_late_cron() instead of re-calculating
47223.7.diff (4.1 KB) - added by afragen 2 months ago.
fixed with feedback from @clorith
47223.8.diff (7.4 KB) - added by peterwilsoncc 2 months ago.

Download all attachments as: .zip

Change History (41)

#1 @johnbillion
5 months ago

  • Keywords needs-patch site-health added
  • Milestone changed from Awaiting Review to Future Release

Good spot.

#2 follow-up: @galbaras
5 months ago

I run wp-cron.php from an O/S cron job once every 15 minutes, so jobs can be as late as that. All my sites are now showing failures, although jobs are getting processed regularly and the sites work well.

There should be a check to see if cron is disabled or an alternative is used, because these are valid options.

This is still true in WP 5.2.1.

#3 in reply to: ↑ 2 ; follow-up: @knutsp
5 months ago

Replying to galbaras:

There should be a check to see if cron is disabled or an alternative is used, because these are valid options.

Sure, but point is to check that the jobs are executed in time, or with a reasonable delay, like 60 minutes or so (IMHO).

Why should the method (alternative way to kick it), or disabled, matter?

#4 in reply to: ↑ 3 @galbaras
5 months ago

Replying to knutsp:

Why should the method (alternative way to kick it), or disabled, matter?

I only run wp-cron.php once every 15 minutes. Anything shorter than that may be flagged as overdue, but really isn't, and the health checker has no way of knowing, because that's outside of WP.

Honestly, this can be just a matter of wording. Instead of saying "failed", just say "late", with some possible explanations depending on whether the way cron runs.

#5 @peterwilsoncc
5 months ago

How about this if DISABLE_WP_CRON === true:

  • Assume site is using crontab or similar to process events
  • wp-cron tasks are considered late if they haven't run for 15 minutes, 15 * MINUTE_IN SECONDS
  • Only consider wp-cron tasks failed after one hour, ie HOUR_IN_SECONDS

Otherwise:

  • The wp-cron loopback or redirect ought to have run when the user logged in to the dashboard
  • wp-cron tasks are considered late if they haven't run for five minutes
  • Any later and they are considered failed

This will require some additional conditions be added to the site health component.

#6 @galbaras
5 months ago

Another option is to provide a setting for "lateness" when DISABLE_WP_CRON === true, but I totally agree with @peterwilsoncc . When it comes to background tasks, there's no point to rush. After all, site health checks will only be run occasionally anyway.

#7 @johnbillion
5 months ago

Related: my WP Crontrol plugin performs an actual cron spawn request, minus the non-blocking flag, and surfaces its result if it's anything erroneous. Takes into account the cron constants too.

#8 @Clorith
5 months ago

Good thoughts, I'm curious what might be the best approach here, I'd not experienced a failed cron report my self, so I hadn't thought of it as being too strict, but I definitely see the potential as you've all mentioned.

The check is clearly going to need a few more conditions, which isn't a problem at all, we want this to be as accurate as possible, or else there's little use to it :)

Are there other scenarios the cron check needs to be aware of than the ones outlined already?

@rockfire
4 months ago

#9 @rockfire
4 months ago

  • Keywords has-patch needs-testing added; needs-patch removed

My patch is a 'first version'. It implements the timings mentioned by @peterwilsoncc

This ticket was mentioned in Slack in #core-php by richard_korthuis. View the logs.


4 months ago

#11 @desrosj
4 months ago

  • Component changed from Administration to Site Health

Moving Site Health tickets into their lovely new home, the Site Health component.

#12 @peterwilsoncc
4 months ago

@rockfire Thanks for uploading the patch.

I've uploaded 47223.2.diff which includes a few minor coding standards fixes and modifies the docblock slightly.

As the function is public and changes the return type, it is a break of backward compatibility which is something WordPress aims to maintain. I'm going to consider a way forward without breaking it.

The return value is still truthy but a plugin or theme author using a strict type check on the function's return value will get either false positives or negatives.

#13 @Clorith
4 months ago

Wouldn't a complimentary new function of has_late_cron() that needs to be truthy before we check for has_missed_cron() be an idea here? Yes it adds an extra layer, but it feels cleaner than to introducing a string here as well, we already may return a boolean or a WP_Error in that function call as is.

#14 @jetxpert
4 months ago

  • Keywords needs-patch added; has-patch removed

@rockfire ,

Thank you for your help. Does your PHP code fix the issue that many of us are experiencing when we disable WP_CRON and, instead, use a Real Cron Job?

Issue: We're getting "A scheduled event has failed" (i.e., cron) messages when using a Real Cron Job. Click here: http://prntscr.com/oam9nj

The cron errors (from affected plugin, etc) vary depending on when and how often we run WP's Site Health check. Not a biggie for us, but hoping WP will issue a patch to fix this soon.

Again, thank you.

@rockfire
3 months ago

#15 @rockfire
3 months ago

  • Keywords has-patch added; needs-patch removed

@peterwilsoncc I have fixed the backwards compatibility break, by introducing a new function has_late_cron as @Clorith suggested.

@jetxpert Yes this will fix it, when you disable WP_CRON a 'late' message will only appear if the cron is more than 15 minutes late (with WP_CRON enabled, it displays the 'late' message when the cron is even 1 second late). And a 'failed' message will be displayed if the cron is more than 1 hour late (with WP_CRON enabled that is after 5 minutes).

#16 @DavidAnderson
3 months ago

Why should the determination of whether a cron job is "late" depend upon the method being used to invoke cron jobs? This is over-complicating things. A job is either sufficiently late for it to be worth advising the user about it, or it isn't. If I turn up late for work, that's all the boss cares about - "but I decided to walk, slowly" doesn't mean that late is no longer late.

As a plugin developer who uses cron in several plugins, I see that lots of people disable the "loopback" cron mechanism and use something else, and set up that something else to call wp-cron.php far too infrequently relative to the number of cron jobs on their site. We shouldn't add allowances for that. If a job is sufficiently late to advise of at all, the user should be advised. And correspondingly, if it's decided that 15 minutes late is OK for some users, then it's OK for all. Or alternatively, if it really is intended that different cron methods are meant to work differently, this should be documented.

#17 @peterwilsoncc
3 months ago

@rockfire Thanks for the refreshed patch, it's greatly appreciated.

@DavidAnderson The presence of jobs in the wp-cron queue is being used as a proxy for determining whether cron is being run on the server.

For sites using the default set up a loopback request to /wp-cron.php will be fired as the dashboard is opened. If the loopback request is working, any due jobs will be processed.

For sites using crontab due jobs won't run until the next cron event is fired. The presence of due jobs does not indicate the site is not operating as the site administrator intends. The administrator may have switched to crontab because they have long running jobs that timeout over HTTP so must be run on the CLI.

You are right, a recommended frequency should be documented alongside the constant to disable wp-cron loopback requests. I will contact the docs team.

#18 @galbaras
3 months ago

@peterwilsoncc Why not make it a constant, given that WP CRON is disabled via wp-config.php?

Simple idea

define( 'WP_CRON_INTERVAL', 15 ); // Minutes
or
define( 'CLI_CRON_INTERVAL', 15 ); // Minutes

Radical idea

Repurpose DISABLE_WP_CRON to define( 'DISABLE_WP_CRON', 15 );.

The logic can be like this:

  • If DISABLE_WP_CRON is some positive integer (maybe within a reasonable range), the default WP_CRON will be disabled and Site Health will use it as the interval.
  • If DISABLE_WP_CRON === false, the normal WP_CRON will be used and Site Health will use its default interval for that.
  • Otherwise, WP_CRON will be disabled and Site Health will use a (different?) default value for the interval.

Doing this may actually help plugins and custom functions tailor various intervals to the specific sites. For example, admins will not be offered 1-minute and 5-minute intervals when they don't make sense for the site.

#19 @peterwilsoncc
3 months ago

@galbaras As a rule, WordPress is avoiding both defining and expanding the use of wp-config constants. I'll be expanding the documentation of the constant to recommend a frequency when I next get the chance.

To keep this ticket focused on the Site Health component, if you wish to discuss this further please open a ticket against the cron component with additional details on the use case.

#20 @galbaras
3 months ago

That's cool. I'll take a new constant, too :) Feel free to name something else, even, as long as the respective check makes sense in the context of sites like mine.

#21 follow-up: @peterwilsoncc
3 months ago

Component maintainers,

Are you happy with the approach in the most recent patch? It'd be dandy to get this in to 5.3 if possible.

My rough thoughts/questions are:

  • Are $timeout_late_cron and $timeout_missed_cron required as properties or can they be defined in the functions
  • Unit tests would be dandy, keep in mind DISABLE_WP_CRON === true in the test suite, affecting the timing rules
  • I'm not going to add/change constants, are filters required for the timing rules?

Thanks team!

cc @Clorith @miss_jwo @spacedmonkey @afragen

This ticket was mentioned in Slack in #core-site-health by dd32. View the logs.


2 months ago

@afragen
2 months ago

#23 @afragen
2 months ago

@peterwilsoncc yes I'm happy with the approach.

My rough thoughts/questions are:

  • Are $timeout_late_cron and $timeout_missed_cron required as properties or can they be defined in the functions

It seems that $timeout_late_cron could be defined in the function has_late_cron() but $timeout_missed_cron is used in has_late_cron() and has_missed_cron().

  • Unit tests would be dandy, keep in mind DISABLE_WP_CRON === true in the test suite, affecting the timing rules

Yes they would be. Alas, I've never actually written one  😔

  • I'm not going to add/change constants, are filters required for the timing rules?

I don't really think constants or filters are necessary for this module specific to the Site Health page. If the admin is sufficiently capable it's simple enough to just disable the test and prevent it from showing by using the add_filter( 'site_status_tests' ... ) and filter out scheduled_events from the array.

Also,

  • Must we test for true === DISABLE_WP_CRON? I thought the value was a boolean if defined. If so the patch should be updated in the constructor.
  • Isn't the WPCS elseif and not else if?
  • Couldn't the conditionals in get_test_scheduled_events() be changed from
if ( is_wp_error( $this->has_missed_cron() ) ) {
    ...
} else {
    if( $this->has_missed_cron() ) {
    ...
} else if ( $this->has_late_cron() {
    ...
}

to

if ( is_wp_error( $this->has_missed_cron() ) ) {
    ...
} elseif( $this->has_missed_cron() ) {
    ...
} elseif ( $this->has_late_cron() {
    ...
}

I've added a diff with my thoughts. Please ensure my logic with the restructured conditional.

Last edited 2 months ago by afragen (previous) (diff)

@afragen
2 months ago

fixed spacing, &$ICV&FILBJH:LK

#24 @afragen
2 months ago

Oh heck, I stopping now. I can't even tell if there's a difference between my last 2 patches and I'd really love to be able to delete one. I just don't know which one. Good night everyone.

Can someone just check my indentation.

Last edited 2 months ago by afragen (previous) (diff)

#25 in reply to: ↑ 21 ; follow-up: @SergeyBiryukov
2 months ago

  • Milestone changed from Future Release to 5.3

Replying to peterwilsoncc:

Are you happy with the approach in the most recent patch? It'd be dandy to get this in to 5.3 if possible.

Haven't fully reviewed the patch yet, just noted that comment:13 suggests checking for has_late_cron() first and then for has_missed_cron(), while the latest patch does it vice versa. Is that intentional?

#26 in reply to: ↑ 25 @rockfire
2 months ago

Replying to SergeyBiryukov:

Haven't fully reviewed the patch yet, just noted that comment:13 suggests checking for has_late_cron() first and then for has_missed_cron(), while the latest patch does it vice versa. Is that intentional?

Yes that is intentional. It first checks the worst-case-scenario (a missed cron) and if that is the case there is no need to check for the late cron. Only if there is no missed cron we check for a late cron.

#27 follow-up: @Clorith
2 months ago

I agree with comment:26 that it's better to look for worst cases, before checking for intermediary ones, so let's look for missed ones before seeing if any are late.

The approach in 47223.5.diff looks reasonable to me, I would make one change though, for the has_late_cron() check, store the ( $cron->time - time() ) segment to a single variable, it saves having to re-calculate it twice per if statement (and avoids time-shifts between the two in some cases), and also improves readability.

@afragen
2 months ago

use variable in has_late_cron() instead of re-calculating

#28 @afragen
2 months ago

I wasn't sure what to name the variable but this can be easily changed in the commit.

@afragen
2 months ago

fixed with feedback from @clorith

#29 in reply to: ↑ 27 @peterwilsoncc
2 months ago

  • Keywords needs-unit-tests added

Replying to Clorith:

I agree with comment:26 that it's better to look for worst cases, before checking for intermediary ones, so let's look for missed ones before seeing if any are late.

This makes sense.

These are the unit tests I'd like to see before this is committed:

  • Everything is dandy: Add a scheduled task to run in +3 minutes, ensure both the late and missed return false
  • Late cron jobs: Add a scheduled task to run in -16 minutes, ensure late returns true, missed returns false
  • Missed cron jobs: Add a scheduled task to run in -61 minutes, ensure both late and missed return true
  • No cron jobs: Ensure both late and missed return WP_Error

#30 @peterwilsoncc
2 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

47223.8.diff is 47223.7.diff but now with unit tests.

#31 @Clorith
2 months ago

  • Keywords commit added; needs-testing removed

47223.8.diff looks solid to me, the added unit tests are nice and thorough as well, thank you for those.

#32 @peterwilsoncc
2 months ago

  • Owner set to peterwilsoncc
  • Resolution set to fixed
  • Status changed from new to closed

In 45801:

Site Health Check: Increase time allowance for cron checks.

Introduces WP_Site_Health::has_late_cron() for late wp-cron jobs and extends the time allowance before a job is considered missed.

In a standard configuration using loopback requests, a job is considered late once past due and missed over five minutes past due.

Late and missed time frames are extended if DISABLE_WP_CRON is defined as true to allow for crontab tasks running less frequently. A job is considered late once it's 15 minutes past due and missed over one hour past due.

A file for site health unit tests has been introduced with tests for cron in critical, late and missed states.

Props rockfire, afragen, peterwilsoncc.
Fixes #47223.

#33 @SergeyBiryukov
2 months ago

In 45803:

Docs: Add missing @since tag for WP_Site_Health::has_late_cron().

See #47223.

Note: See TracTickets for help on using tickets.