WordPress.org

Make WordPress Core

Opened 15 months ago

Closed 12 months ago

Last modified 3 months ago

#47223 closed defect (bug) (fixed)

Site Health Check: "last missed cron" test too aggressive

Reported by: DavidAnderson Owned by: peterwilsoncc
Milestone: 5.3 Priority: normal
Severity: normal Version: 5.2
Component: Site Health Keywords: site-health has-patch has-unit-tests commit
Focuses: Cc:

Description

The WP_Site_Health::has_missed_cron() test will fail if there is a single cron event with a date more than 0 seconds in the past (the check done is if ( ( $cron->time - time() ) < 0 ) {). This is much too aggressive relative to the way that WP Cron works, relying upon incoming HTTP connections. A cron job running not at the precise time it was expected is normal *and documented* behaviour of WP Cron and not something to worry the user about (https://developer.wordpress.org/plugins/cron/ - "Though it may not run at a specific time, WP-Cron will get your tasks done in a timely manner").

A more sensible test to indicate that your cron queue is under-serviced would be something like something 5 minutes overdue; that would indicate that the number of visits the cron system is getting isn't keeping up with the number of jobs. (You'll still see false positives in various situations; just, a lot less of them).

Attachments (8)

47223.diff (2.2 KB) - added by rockfire 14 months ago.
47223.2.diff (2.9 KB) - added by peterwilsoncc 14 months ago.
47223.3.diff (2.9 KB) - added by rockfire 13 months ago.
47223.4.diff (3.4 KB) - added by afragen 12 months ago.
47223.5.diff (3.4 KB) - added by afragen 12 months ago.
fixed spacing, &$ICV&FILBJH:LK
47223.6.diff (4.1 KB) - added by afragen 12 months ago.
use variable in has_late_cron() instead of re-calculating
47223.7.diff (4.1 KB) - added by afragen 12 months ago.
fixed with feedback from @clorith
47223.8.diff (7.4 KB) - added by peterwilsoncc 12 months ago.

Download all attachments as: .zip

Change History (49)

#1 @johnbillion
15 months ago

  • Keywords needs-patch site-health added
  • Milestone changed from Awaiting Review to Future Release

Good spot.

#2 follow-up: @galbaras
15 months ago

I run wp-cron.php from an O/S cron job once every 15 minutes, so jobs can be as late as that. All my sites are now showing failures, although jobs are getting processed regularly and the sites work well.

There should be a check to see if cron is disabled or an alternative is used, because these are valid options.

This is still true in WP 5.2.1.

#3 in reply to: ↑ 2 ; follow-up: @knutsp
15 months ago

Replying to galbaras:

There should be a check to see if cron is disabled or an alternative is used, because these are valid options.

Sure, but point is to check that the jobs are executed in time, or with a reasonable delay, like 60 minutes or so (IMHO).

Why should the method (alternative way to kick it), or disabled, matter?

#4 in reply to: ↑ 3 @galbaras
15 months ago

Replying to knutsp:

Why should the method (alternative way to kick it), or disabled, matter?

I only run wp-cron.php once every 15 minutes. Anything shorter than that may be flagged as overdue, but really isn't, and the health checker has no way of knowing, because that's outside of WP.

Honestly, this can be just a matter of wording. Instead of saying "failed", just say "late", with some possible explanations depending on whether the way cron runs.

#5 @peterwilsoncc
14 months ago

How about this if DISABLE_WP_CRON === true:

  • Assume site is using crontab or similar to process events
  • wp-cron tasks are considered late if they haven't run for 15 minutes, 15 * MINUTE_IN SECONDS
  • Only consider wp-cron tasks failed after one hour, ie HOUR_IN_SECONDS

Otherwise:

  • The wp-cron loopback or redirect ought to have run when the user logged in to the dashboard
  • wp-cron tasks are considered late if they haven't run for five minutes
  • Any later and they are considered failed

This will require some additional conditions be added to the site health component.

#6 @galbaras
14 months ago

Another option is to provide a setting for "lateness" when DISABLE_WP_CRON === true, but I totally agree with @peterwilsoncc . When it comes to background tasks, there's no point to rush. After all, site health checks will only be run occasionally anyway.

#7 @johnbillion
14 months ago

Related: my WP Crontrol plugin performs an actual cron spawn request, minus the non-blocking flag, and surfaces its result if it's anything erroneous. Takes into account the cron constants too.

#8 @Clorith
14 months ago

Good thoughts, I'm curious what might be the best approach here, I'd not experienced a failed cron report my self, so I hadn't thought of it as being too strict, but I definitely see the potential as you've all mentioned.

The check is clearly going to need a few more conditions, which isn't a problem at all, we want this to be as accurate as possible, or else there's little use to it :)

Are there other scenarios the cron check needs to be aware of than the ones outlined already?

@rockfire
14 months ago

#9 @rockfire
14 months ago

  • Keywords has-patch needs-testing added; needs-patch removed

My patch is a 'first version'. It implements the timings mentioned by @peterwilsoncc

This ticket was mentioned in Slack in #core-php by richard_korthuis. View the logs.


14 months ago

#11 @desrosj
14 months ago

  • Component changed from Administration to Site Health

Moving Site Health tickets into their lovely new home, the Site Health component.

#12 @peterwilsoncc
14 months ago

@rockfire Thanks for uploading the patch.

I've uploaded 47223.2.diff which includes a few minor coding standards fixes and modifies the docblock slightly.

As the function is public and changes the return type, it is a break of backward compatibility which is something WordPress aims to maintain. I'm going to consider a way forward without breaking it.

The return value is still truthy but a plugin or theme author using a strict type check on the function's return value will get either false positives or negatives.

#13 @Clorith
14 months ago

Wouldn't a complimentary new function of has_late_cron() that needs to be truthy before we check for has_missed_cron() be an idea here? Yes it adds an extra layer, but it feels cleaner than to introducing a string here as well, we already may return a boolean or a WP_Error in that function call as is.

#14 @jetxpert
13 months ago

  • Keywords needs-patch added; has-patch removed

@rockfire ,

Thank you for your help. Does your PHP code fix the issue that many of us are experiencing when we disable WP_CRON and, instead, use a Real Cron Job?

Issue: We're getting "A scheduled event has failed" (i.e., cron) messages when using a Real Cron Job. Click here: http://prntscr.com/oam9nj

The cron errors (from affected plugin, etc) vary depending on when and how often we run WP's Site Health check. Not a biggie for us, but hoping WP will issue a patch to fix this soon.

Again, thank you.

@rockfire
13 months ago

#15 @rockfire
13 months ago

  • Keywords has-patch added; needs-patch removed

@peterwilsoncc I have fixed the backwards compatibility break, by introducing a new function has_late_cron as @Clorith suggested.

@jetxpert Yes this will fix it, when you disable WP_CRON a 'late' message will only appear if the cron is more than 15 minutes late (with WP_CRON enabled, it displays the 'late' message when the cron is even 1 second late). And a 'failed' message will be displayed if the cron is more than 1 hour late (with WP_CRON enabled that is after 5 minutes).

#16 @DavidAnderson
13 months ago

Why should the determination of whether a cron job is "late" depend upon the method being used to invoke cron jobs? This is over-complicating things. A job is either sufficiently late for it to be worth advising the user about it, or it isn't. If I turn up late for work, that's all the boss cares about - "but I decided to walk, slowly" doesn't mean that late is no longer late.

As a plugin developer who uses cron in several plugins, I see that lots of people disable the "loopback" cron mechanism and use something else, and set up that something else to call wp-cron.php far too infrequently relative to the number of cron jobs on their site. We shouldn't add allowances for that. If a job is sufficiently late to advise of at all, the user should be advised. And correspondingly, if it's decided that 15 minutes late is OK for some users, then it's OK for all. Or alternatively, if it really is intended that different cron methods are meant to work differently, this should be documented.

#17 @peterwilsoncc
13 months ago

@rockfire Thanks for the refreshed patch, it's greatly appreciated.

@DavidAnderson The presence of jobs in the wp-cron queue is being used as a proxy for determining whether cron is being run on the server.

For sites using the default set up a loopback request to /wp-cron.php will be fired as the dashboard is opened. If the loopback request is working, any due jobs will be processed.

For sites using crontab due jobs won't run until the next cron event is fired. The presence of due jobs does not indicate the site is not operating as the site administrator intends. The administrator may have switched to crontab because they have long running jobs that timeout over HTTP so must be run on the CLI.

You are right, a recommended frequency should be documented alongside the constant to disable wp-cron loopback requests. I will contact the docs team.

#18 @galbaras
13 months ago

@peterwilsoncc Why not make it a constant, given that WP CRON is disabled via wp-config.php?

Simple idea

define( 'WP_CRON_INTERVAL', 15 ); // Minutes
or
define( 'CLI_CRON_INTERVAL', 15 ); // Minutes

Radical idea

Repurpose DISABLE_WP_CRON to define( 'DISABLE_WP_CRON', 15 );.

The logic can be like this:

  • If DISABLE_WP_CRON is some positive integer (maybe within a reasonable range), the default WP_CRON will be disabled and Site Health will use it as the interval.
  • If DISABLE_WP_CRON === false, the normal WP_CRON will be used and Site Health will use its default interval for that.
  • Otherwise, WP_CRON will be disabled and Site Health will use a (different?) default value for the interval.

Doing this may actually help plugins and custom functions tailor various intervals to the specific sites. For example, admins will not be offered 1-minute and 5-minute intervals when they don't make sense for the site.

#19 @peterwilsoncc
13 months ago

@galbaras As a rule, WordPress is avoiding both defining and expanding the use of wp-config constants. I'll be expanding the documentation of the constant to recommend a frequency when I next get the chance.

To keep this ticket focused on the Site Health component, if you wish to discuss this further please open a ticket against the cron component with additional details on the use case.

#20 @galbaras
13 months ago

That's cool. I'll take a new constant, too :) Feel free to name something else, even, as long as the respective check makes sense in the context of sites like mine.

#21 follow-up: @peterwilsoncc
12 months ago

Component maintainers,

Are you happy with the approach in the most recent patch? It'd be dandy to get this in to 5.3 if possible.

My rough thoughts/questions are:

  • Are $timeout_late_cron and $timeout_missed_cron required as properties or can they be defined in the functions
  • Unit tests would be dandy, keep in mind DISABLE_WP_CRON === true in the test suite, affecting the timing rules
  • I'm not going to add/change constants, are filters required for the timing rules?

Thanks team!

cc @Clorith @miss_jwo @spacedmonkey @afragen

This ticket was mentioned in Slack in #core-site-health by dd32. View the logs.


12 months ago

@afragen
12 months ago

#23 @afragen
12 months ago

@peterwilsoncc yes I'm happy with the approach.

My rough thoughts/questions are:

  • Are $timeout_late_cron and $timeout_missed_cron required as properties or can they be defined in the functions

It seems that $timeout_late_cron could be defined in the function has_late_cron() but $timeout_missed_cron is used in has_late_cron() and has_missed_cron().

  • Unit tests would be dandy, keep in mind DISABLE_WP_CRON === true in the test suite, affecting the timing rules

Yes they would be. Alas, I've never actually written one  😔

  • I'm not going to add/change constants, are filters required for the timing rules?

I don't really think constants or filters are necessary for this module specific to the Site Health page. If the admin is sufficiently capable it's simple enough to just disable the test and prevent it from showing by using the add_filter( 'site_status_tests' ... ) and filter out scheduled_events from the array.

Also,

  • Must we test for true === DISABLE_WP_CRON? I thought the value was a boolean if defined. If so the patch should be updated in the constructor.
  • Isn't the WPCS elseif and not else if?
  • Couldn't the conditionals in get_test_scheduled_events() be changed from
if ( is_wp_error( $this->has_missed_cron() ) ) {
    ...
} else {
    if( $this->has_missed_cron() ) {
    ...
} else if ( $this->has_late_cron() {
    ...
}

to

if ( is_wp_error( $this->has_missed_cron() ) ) {
    ...
} elseif( $this->has_missed_cron() ) {
    ...
} elseif ( $this->has_late_cron() {
    ...
}

I've added a diff with my thoughts. Please ensure my logic with the restructured conditional.

Last edited 12 months ago by afragen (previous) (diff)

@afragen
12 months ago

fixed spacing, &$ICV&FILBJH:LK

#24 @afragen
12 months ago

Oh heck, I stopping now. I can't even tell if there's a difference between my last 2 patches and I'd really love to be able to delete one. I just don't know which one. Good night everyone.

Can someone just check my indentation.

Last edited 12 months ago by afragen (previous) (diff)

#25 in reply to: ↑ 21 ; follow-up: @SergeyBiryukov
12 months ago

  • Milestone changed from Future Release to 5.3

Replying to peterwilsoncc:

Are you happy with the approach in the most recent patch? It'd be dandy to get this in to 5.3 if possible.

Haven't fully reviewed the patch yet, just noted that comment:13 suggests checking for has_late_cron() first and then for has_missed_cron(), while the latest patch does it vice versa. Is that intentional?

#26 in reply to: ↑ 25 @rockfire
12 months ago

Replying to SergeyBiryukov:

Haven't fully reviewed the patch yet, just noted that comment:13 suggests checking for has_late_cron() first and then for has_missed_cron(), while the latest patch does it vice versa. Is that intentional?

Yes that is intentional. It first checks the worst-case-scenario (a missed cron) and if that is the case there is no need to check for the late cron. Only if there is no missed cron we check for a late cron.

#27 follow-up: @Clorith
12 months ago

I agree with comment:26 that it's better to look for worst cases, before checking for intermediary ones, so let's look for missed ones before seeing if any are late.

The approach in 47223.5.diff looks reasonable to me, I would make one change though, for the has_late_cron() check, store the ( $cron->time - time() ) segment to a single variable, it saves having to re-calculate it twice per if statement (and avoids time-shifts between the two in some cases), and also improves readability.

@afragen
12 months ago

use variable in has_late_cron() instead of re-calculating

#28 @afragen
12 months ago

I wasn't sure what to name the variable but this can be easily changed in the commit.

@afragen
12 months ago

fixed with feedback from @clorith

#29 in reply to: ↑ 27 @peterwilsoncc
12 months ago

  • Keywords needs-unit-tests added

Replying to Clorith:

I agree with comment:26 that it's better to look for worst cases, before checking for intermediary ones, so let's look for missed ones before seeing if any are late.

This makes sense.

These are the unit tests I'd like to see before this is committed:

  • Everything is dandy: Add a scheduled task to run in +3 minutes, ensure both the late and missed return false
  • Late cron jobs: Add a scheduled task to run in -16 minutes, ensure late returns true, missed returns false
  • Missed cron jobs: Add a scheduled task to run in -61 minutes, ensure both late and missed return true
  • No cron jobs: Ensure both late and missed return WP_Error

#30 @peterwilsoncc
12 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

47223.8.diff is 47223.7.diff but now with unit tests.

#31 @Clorith
12 months ago

  • Keywords commit added; needs-testing removed

47223.8.diff looks solid to me, the added unit tests are nice and thorough as well, thank you for those.

#32 @peterwilsoncc
12 months ago

  • Owner set to peterwilsoncc
  • Resolution set to fixed
  • Status changed from new to closed

In 45801:

Site Health Check: Increase time allowance for cron checks.

Introduces WP_Site_Health::has_late_cron() for late wp-cron jobs and extends the time allowance before a job is considered missed.

In a standard configuration using loopback requests, a job is considered late once past due and missed over five minutes past due.

Late and missed time frames are extended if DISABLE_WP_CRON is defined as true to allow for crontab tasks running less frequently. A job is considered late once it's 15 minutes past due and missed over one hour past due.

A file for site health unit tests has been introduced with tests for cron in critical, late and missed states.

Props rockfire, afragen, peterwilsoncc.
Fixes #47223.

#33 @SergeyBiryukov
12 months ago

In 45803:

Docs: Add missing @since tag for WP_Site_Health::has_late_cron().

See #47223.

#34 @jetxpert
3 months ago

Good Day!

Hope y'all are in good spirits, coding, and enjoying your daily cup of java.

Hey, can the wonder-folks at WordPress provide an update on this? Site Health is still reporting "failed to run" cron messages when a Real Cron is used - or - a plugin's cron job is out of sync with WordPress' Site Health checks.

What is the estimated date for releasing a fix for this issue? (i.e., WordPress Revision Level)

Thank you!

#35 @peterwilsoncc
3 months ago

@jetxpert

This was fixed in version 5.2. On sites using alternative cron, tasks are only considered late if they haven't run within 15 minutes of the scheduled time, and missed if they haven't run within on hour of the scheduled time.

If you're still seeing alerts, it's likely your cron task is failing or you need to run it more frequently. Every 10 minutes is recommended.

#36 @jetxpert
3 months ago

@peterwilsoncc,

Thanks for the quick, stellar reply. Concise and helpful.

Please know that many hosting companies (e.g., SiteGround) are discouraging their clients (especially those on shared plans) from setting up Real Crons less than 30 minutes. Based on your input, therefore, it appears we're going to continue to have these "failed to run" cron messages on our dashboard and other "trigger" issues as well.

With all due respect, I would like to ask the wonder-folks at WordPress consider adjusting the cron timing to 30 minutes -- or -- in a future WP revision, allow users to manually adjust the timing for cron tasks (e.g., in 10 minute increments up to one hour).

Quite frankly, 10 minutes seems too short and may cause undue server load -- with the implication of a temporary host shutdown -- until the cron is adjusted back to 30 minutes or greater.

Last, can you explain again where the 10 minutes came from? Justification? Would like to understand better the rationale.

Again, thank you and have an awesome day!

#37 @DavidAnderson
3 months ago

Use a plugin like WP Crontrol to examine the list of scheduled tasks running on your site, and how often they run. You can then run the maths to see how often you need a cron task to run to service them all on your specific site. It's likely you'll find that 10 minutes is not often enough.

Any hosting company that discourages running one more often than 30 minutes has crossed well over the line into customer-hostile - I say this as someone who's run a hosting company, not as a freeloader who thinks resources don't cost money. If someone's infrastructure can't process the equivalent of an HTTP request every few minutes on a website and you have to ask people to cut down to one HTTP request per half an hour, you're effectively saying either "our platform really stinks and is about to fall over if any websites are used" or "we have no sense of proportion", neither of which is a good sound from someone you're entrusting with the running of a website that matters in any way.

#38 @jetxpert
3 months ago

@DavidAnderson,

You must be on your 3rd cup of coffee :) Thanks for the quick reply.

Click here for a glimpse of what SiteGround (and other hosts) are telling their clients. How would you propose solving this gap?

Again, thank you. Cheers!

Last edited 3 months ago by jetxpert (previous) (diff)

#39 @galbaras
3 months ago

@jetxpert Since WordPress provides a way to run jobs when visitors load pages, why would you settle for once every 30 minutes?

You can run WP CRON from an O/S cron job as a backup, in case you don't get enough traffic, and then, every 30 minutes may be OK.

It's likely that you can use the following workaround: instead of setting up one cron job every 30 minutes, set up 3 identical ones, and run them 10 minutes apart. Basically, if your hosting company is unreasonable, chear ;P

#40 @DavidAnderson
3 months ago

Should WP work overtime to overcome crazy things that hosting companies say or do? Well, to some extent, yes, it's good to make software that "just works" no matter how hostile the environment. But ultimately, you can't square a circle. Two cron jobs per hour, and any non-trivial WordPress site's list of cron jobs that it wants to run, are mathematically incompatible. One plus one cannot be made to equal however-many-dozen.

The kindest intepretation is that Siteground have just followed a piece of popular "folk knowledge" about WP Cron and its supposed performance impact if run externally. But in fact, it has no detectable performance difference compared to when run via the normal mechanism, assuming that your website actually has visitors.

#41 @jetxpert
3 months ago

@galbaras, @DavidAnderson, @peterwilsoncc

Thank you so much. Useful info.

Bottom line is ... we have a choice, no doubt. We can use either WP Cron or Real (Alternate) Cron to optimize a website's responsiveness and/or performance.

We have changed our Real Cron to run every 10 minutes. Lets hope the pesky "fail to run" messages will no longer be triggered and our website will run better. If not, we'll either switch back to WP Cron or adjust our Real Cron based on data obtained via WP Crontrol.

Have a wonderful day.

Cheers!

Note: See TracTickets for help on using tickets.