Make WordPress Core

Opened 4 years ago

Last modified 3 years ago

#49693 accepted enhancement

Drop duplicate recurring cron events

Reported by: aidvu's profile aidvu Owned by: whyisjake's profile whyisjake
Milestone: Future Release Priority: normal
Severity: normal Version: 5.4
Component: Cron API Keywords: has-patch has-unit-tests needs-dev-note needs-refresh
Focuses: Cc:

Description

It's easy to break a site by mistakenly scheduling millions of events because of a typo or a bad check.

Here's one example: https://wordpress.org/support/topic/error-in-addcustomcronschedule-degrades-site-performance/

In the past, I've also seen plugins use get_option and all kinds of different checks, instead of wp_next_scheduled.

Fixing the site afterwards can be a bit of a PITA, specially for high traffic ones. DB replication is often an issue when the cron option grows in size due to the bug.

It's probably a good idea to prevent scheduling recurring events that have the same hook, schedule and args.

Additional thoughts:

  • This might make the wp_next_scheduled check obsolete when queuing an event?
  • Changing the arguments of wp_schedule_event isn't something I wanted to do, but considering the current code for reschedule/unschedule in wp-cron.php, it seemed the best route (case of missing schedule, and trying to get interval from the event that we're rescheduling).
  • We're currently doing this with a schedule_event filter. Another route could be to add it to default-filters, but I think it should be in the Cron API itself.

Attachments (6)

49693.patch (6.0 KB) - added by aidvu 4 years ago.
patch and tests
49693-2.patch (6.8 KB) - added by aidvu 4 years ago.
Patch + tests for wp_reschedule_event.
49693-3.patch (6.8 KB) - added by aidvu 4 years ago.
Comment updates.
49693-alternative.patch (5.8 KB) - added by aidvu 4 years ago.
Alternative approach with wp_doing_cron() check.
49693-4.patch (7.2 KB) - added by aidvu 4 years ago.
Approach with time based limiter for recurring events.
49693-5.patch (7.6 KB) - added by aidvu 4 years ago.
Better approach, that also correctly limits to 24 hours per day (for intervals less than a day), and to INTERVAL / HOUR_IN_SECONDS for longer intervals.

Download all attachments as: .zip

Change History (38)

@aidvu
4 years ago

patch and tests

#1 @aidvu
4 years ago

Another route would be to add the reschedule flag to the $args when coming from wp_reschedule_event, and check that + unset it when saving the event. I don't like it. :)

Last edited 4 years ago by aidvu (previous) (diff)

#2 in reply to: ↑ description @aidvu
4 years ago

  • We're currently doing this with a schedule_event filter. Another route could be to add it to default-filters, but I think it should be in the Cron API itself.

Scratch this. It breaks wp_reschedule_event. Most plugins will requeue their own recurring events, but this isn't possible now, because of how wp-cron.php processes events.

@aidvu
4 years ago

Patch + tests for wp_reschedule_event.

@aidvu
4 years ago

Comment updates.

#3 @aidvu
4 years ago

Alternative approach would be to skip the duplicate check when we're wp_doing_cron(). This would make the wp_reschedule_event() function unusable if we're not DOING_CRON.

@aidvu
4 years ago

Alternative approach with wp_doing_cron() check.

#4 @azaozz
4 years ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 5.5

At first look 49693-alternative.patch seems better as it doesn't need additional param. But breaking wp_reschedule_event() when not doing wp_cron doesn't seem like a good idea. Thinking the approach in 49693-3.patch is the right one.

The only thing left here is perhaps to try to figure out if any plugins will be negatively affected by this change. As far as I see there are a handful of plugin that call wp_reschedule_event(): https://wpdirectory.net/search/01E4CJVDEWK1JRTHRH7CVJGSZF.

#5 @peterwilsoncc
4 years ago

I have a couple of concerns here

Rather than adding new arguments I'd like to switch to an $args array. There's a couple of interactions in reported bugs that could be problematic with additional arguments. Refer https://core.trac.wordpress.org/ticket/34913 and https://core.trac.wordpress.org/ticket/35491#comment:13

I'm not sure preventing multiple identically scheduled events is ideal. I can think of legitimate reasons for running the same hook & args combination twice.

The first that comes to mind is scheduling a task to run at specific times of day; for example weather forecasts in my home town are released at scheduled but irregular times throughout the day. A large task might be set to run twice overnight to allow it to be split (but this is a less likely circumstance).

As far as I see there are a handful of plugin that call wp_reschedule_event()

It's mainly intended as an internal function but the first few plugins hit 6,000,000 active installs between them so we should account for them.

I agree if not called correctly it is easy to blow up the cron array fairly quickly. WordPress does try to defend against developer mistakes but I'm not sure this is a circumstance in which it can be easily done without hindering the intent of more common use cases.

#6 follow-up: @aidvu
4 years ago

Thank you for the feedback @azaozz and @peterwilsoncc! 🙇

Rather than adding new arguments I'd like to switch to an $args array.

Is this the same thing I suggested here: https://core.trac.wordpress.org/ticket/49693#comment:1 ?

I'm not sure preventing multiple identically scheduled events is ideal. I can think of legitimate reasons for running the same hook & args combination twice.

Isn't this the reason why we have custom intervals: https://developer.wordpress.org/plugins/cron/understanding-wp-cron-scheduling/

It might be easier to schedule duplicates with different start times, but I think the right way for such schedules would be a custom interval.

The only thing left here is perhaps to try to figure out if any plugins will be negatively affected by this change. As far as I see there are a handful of plugin that call wp_reschedule_event()

Hmm, wp_reschedule_event() is only affected in case of the alternative.patch. Since we're not going with that one, behavior stays unchanged (except what was proposed in the patch).

Rather than adding new arguments I'd like to switch to an $args array.

Just to be sure, you want to pass the rescheduling param as part of the $args to the functions?

e.g.

  • When calling wp_reschedule_event(), we set a special rescheduling param in $args. I'd go with wp_cron_$hook_rescheduling, to avoid collisions.
  • In wp_schedule_event() we look at this param, and unset it if set (so the $args match the original jobs args)
  • We use the variable to decide whether to skip the duplicate check or not.

#7 in reply to: ↑ 6 @peterwilsoncc
4 years ago

Replying to aidvu:

Isn't this the reason why we have custom intervals: https://developer.wordpress.org/plugins/cron/understanding-wp-cron-scheduling/

It might be easier to schedule duplicates with different start times, but I think the right way for such schedules would be a custom interval.

To be clearer, I was thinking of the same event that runs periodically but at asymmetrical intervals. To use the weather example, a news site downloading latest forecast for my home town might use two daily events with the hook 49693_bomau_weather_update [ [ 'product' => 'IDV10450' ] ] running at 5:15am and 4:25pm which is when the local reports are updated.

To protect against blowing up the table for single events, WP prevents identical events being scheduled within ten minutes of each other. I'm happy enough with doing something like that but the maths would be a bit messy. It could also be difficult for plugins using custom cron storage with the hooks added in WP 5.1

As some rough (ie, not completely tested) code:

<?php

$now = time();
$next = time() + 300; // 5 minutes from now.
$later = time() + ( 60 * 60 * 24 ) + 360 ;// 1 day, 6 minutes.
$even_later = time() + 60 * 60 * 24 * 1.5; // 36 hours.

$interval = 60 * 60 * 24; // One day.

$now_offset = $now % $interval;
$next_offset = $next % $interval;
$later_offset = $later % $interval;
$even_later_offset = $even_later % $interval;

var_dump ( [
    'now_offset' => $now_offset,
    'next_offset' => $next_offset,
    'later_offset' => $later_offset,
    'even_later_offset' => $even_later_offset,
    'next_run' => abs( $now_offset - $next_offset ) < 600 ? 'disallow' : 'allow',
    'later_run' => abs( $now_offset - $later_offset ) < 600 ? 'disallow' : 'allow',
    'even_later_run' => abs( $now_offset - $even_later_offset ) < 600 ? 'disallow' : 'allow',
] );
array(7) {
  ["now_offset"]=>
  int(77695)
  ["next_offset"]=>
  int(77995)
  ["later_offset"]=>
  int(78055)
  ["even_later_offset"]=>
  int(34495)
  ["next_run"]=>
  string(8) "disallow"
  ["later_run"]=>
  string(8) "disallow"
  ["even_later_run"]=>
  string(5) "allow"
}

#8 @azaozz
4 years ago

To protect against blowing up the table for single events, WP prevents identical events being scheduled within ten minutes of each other. I'm happy enough with doing something like that but the maths would be a bit messy.

Right, as WP has throttling down for single events was thinking it needs something similar for multi-events.

Do you think it may be possible to base it on number of scheduled events? Like, perhaps look at how many are scheduled for the next 12 or 24 hours, etc.

#9 @aidvu
4 years ago

To protect against blowing up the table for single events, WP prevents identical events being scheduled within ten minutes of each other. I'm happy enough with doing something like that but the maths would be a bit messy. It could also be difficult for plugins using custom cron storage with the hooks added in WP 5.1

This is something I first thought of doing, but it felt more convoluted than an explicit rule. I also think that this wouldn't fix the problem but only hide it, which is then troublesome for debugging.

To be clearer, I was thinking of the same event that runs periodically but at asymmetrical intervals. To use the weather example, a news site downloading latest forecast for my home town might use two daily events with the hook 49693_bomau_weather_update [ [ 'product' => 'IDV10450' ] ] running at 5:15am and 4:25pm which is when the local reports are updated.

This is what I meant by custom intervals. But then again, I can't argue that it's not convenient to use the same event twice at different times (but it does feel wrong to me).


All that said, I can implement whatever we agree on, but my vote is for a more explicit no-duplicates rule.

#10 follow-up: @aidvu
4 years ago

@azaozz / @peterwilsoncc any additional info on how you want to proceed with this?

#11 in reply to: ↑ 10 @peterwilsoncc
4 years ago

  • Keywords needs-refresh added

Replying to aidvu:

@azaozz / @peterwilsoncc any additional info on how you want to proceed with this?

Sorry for the delay.

Let's go with @azaozz suggestion of n max per day.

An event is considered identical if it has the same interval, schedule name, hook and arguments.

My inclination is to be generous with the maximum allowed within the day as the goal is to avoid explosions without hindering the user/developer. Proposal: 24

In terms of patch, the timestamp allowance will be similar to the ten minute calculation for single events: ie, min and max time stamp may be greater than 24 hours apart depending on when the event is being scheduled.

#12 @azaozz
4 years ago

My inclination is to be generous with the maximum allowed within the day as the goal is to avoid explosions without hindering the user/developer.

Yes, same here. The purpose is to do a sanity check, not to limit usage (the 10 min throttling when adding single events can also be seen as "maximum of 144 duplicates per day" (24 x 60 / 10)). Maximum of 24 multi-event duplicates per day sounds good imho.

Last edited 4 years ago by azaozz (previous) (diff)

#13 @aidvu
4 years ago

I'm inclined to say 42. :D

In terms of patch, the timestamp allowance will be similar to the ten minute calculation for single events: ie, min and max time stamp may be greater than 24 hours apart depending on when the event is being scheduled.

Thinking about this, I'm not sure a fixed number is what we need... Let's say, someone creates a custom event that runs every 5 seconds (like the example on wp.org docs). Should probably be based on the interval, with a sane "maximum" for events that run fewer times.

Not sure about the same calculation as for single-events. I had this in mind.

        // Always allow 24 occurrences per interval, even if it's an event with long $interval.
        $allowed_occurrences = 24;
        if ( 24 < DAY_IN_SECONDS / $interval ) {
                // Set a per-day limit based on the interval
                $allowed_occurrences = DAY_IN_SECONDS / $interval;
        }
        $check_interval = DAY_IN_SECONDS;
        if ( $check_interval < $event->interval ) {
                $check_interval = $event->interval;
        }

        $min_timestamp = $event->timestamp - ( $check_interval / 2 );
        $max_timestamp = $event->timestamp + ( $check_interval / 2 );
        $occurrences = 0;

        foreach ( $crons as $event_timestamp => $cron ) {
                if ( $event_timestamp < $min_timestamp ) {
                        continue;
                }
                if ( $event_timestamp > $max_timestamp ) {
                        break;
                }
                if ( isset( $cron[ $event->hook ][ $key ] ) ) {
                        $occurrences++;
                        break;
                }
        }

        if ( $allowed_occurrences < $occurrences ) {
                return false;
        }

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


4 years ago

#15 @whyisjake
4 years ago

  • Owner set to whyisjake
  • Status changed from new to accepted

@aidvu, it seems like there is a consensus to add a daily max. Can you update the patch to reflect that change? I'm thinking this needs a dev note too for 5.5.

#16 @desrosj
4 years ago

  • Keywords needs-dev-note added

@aidvu Are you able to refresh the patch? A patch is needed in the next few days in order for this to have a chance of making the cut for 5.5.

#17 @aidvu
4 years ago

Was waiting for feedback on the code provided in https://core.trac.wordpress.org/ticket/49693#comment:13. Should be able to create a new patch with tests in the next 2 days.

@aidvu
4 years ago

Approach with time based limiter for recurring events.

#18 @aidvu
4 years ago

  • Keywords needs-refresh removed

Added a new patch + tests with the approach proposed here: https://core.trac.wordpress.org/ticket/49693#comment:13

cc @desrosj / @whyisjake

@aidvu
4 years ago

Better approach, that also correctly limits to 24 hours per day (for intervals less than a day), and to INTERVAL / HOUR_IN_SECONDS for longer intervals.

#19 @aidvu
4 years ago

Had a long night thought, and I think the 49693-5.patch is the correct approach for limiting per day.

The current limit for duplicate recurring events per day is 24 DAY_IN_SECONDS / HOUR_IN_SECONDS, while for longer intervals it is floor( EVENT_INTERVAL / HOUR_IN_SECONDS ). Duplicate here means: same hook, args, schedule and interval.

Tests cover less than day and more than day intervals (although because of the longer than day interval one might run for >150ms).

#20 @peterwilsoncc
4 years ago

  • Keywords needs-refresh added
  • Milestone changed from 5.5 to 5.6

I'm going to bump this from 5.5 as I think it needs a few things before it's ready to add to Core, and I'd rather not rush it for next week given Cron's importance for auto-updates, scheduled posts, etc.

A few notes on the proposed patch:

  • the current timing test is incomplete, a daily event can be added 500 times if you start 500 days out and work back to tomorrow. If this is to be done, I'd rather do it properly by using the interval to determine a remainder, I recall some issues with single events due the the imprecise logic it had until recently
  • min_timestamp and max_timestamp will need the additional calculations for events scheduled around time(), see the other cron functions for these.
  • Create a filter for $allowed_occurrences and pass the event details
  • $allowed_occurrences === true (or another such value) to allow for unlimited occurrences if a plugin filters for it.
  • In the tests, rather than $max_events = DAY_IN_SECONDS / HOUR_IN_SECONDS;, etc, put in the expected values (24, etc) -- otherwise you're testing the code by running the code, which will always pass.
  • Additional tests will be needed to ensure the same hook with different arguments or intervals are not considered duplicates

#21 @aidvu
4 years ago

the current timing test is incomplete, a daily event can be added 500 times if you start 500 days out and work back to tomorrow. If this is to be done, I'd rather do it properly by using the interval to determine a remainder, I recall some issues with single events due the the imprecise logic it had until recently

This is true. I went with the idea from this comment: https://core.trac.wordpress.org/ticket/49693#comment:12. Boils down to, make a safeguard so that situations that usually occur (adding events on every pageload because of a typo in the event name wouldn't break WP). Also, even if you did manage to add 500 duplicate daily events, the reschedule would drop them.

I also wanted to keep the logic simple. In a 24 hour interval, max duplicate recurring events is 24.

min_timestamp and max_timestamp will need the additional calculations for events scheduled around time(), see the other cron functions for these.

Can check.

Create a filter for $allowed_occurrences and pass the event details
$allowed_occurrences === true (or another such value) to allow for unlimited occurrences if a plugin filters for it.

Makes sense.

In the tests, rather than $max_events = DAY_IN_SECONDS / HOUR_IN_SECONDS;, etc, put in the expected values (24, etc) -- otherwise you're testing the code by running the code, which will always pass.

Kk.

Additional tests will be needed to ensure the same hook with different arguments or intervals are not considered duplicates

Sure, like filling in the max occurrences and then switching the params.

#22 @peterwilsoncc
4 years ago

This is true. I went with the idea from this comment: https://core.trac.wordpress.org/ticket/49693#comment:12. Boils down to, make a safeguard so that situations that usually occur (adding events on every pageload because of a typo in the event name wouldn't break WP). Also, even if you did manage to add 500 duplicate daily events, the reschedule would drop them.

I also wanted to keep the logic simple. In a 24 hour interval, max duplicate recurring events is 24.

The issue I have with this approach is that it basically introduces a known bug to the logic check. The 10 minute window for single scheduled events had a bug for years and was causing problems.

I'm still not convinced this is needed in core, but if it's going to be done, it needs to be done properly.

FWIW, to avoid the plugin issue linked above, I would rather create a wrapper function to schedule an event only if it's not scheduled.

function wp_maybe_schedule_event() {
  if ( wp_next_scheduled() ) {
    return false;
  }

  return wp_schedule_event();
}

#23 follow-up: @aidvu
4 years ago

The issue I have with this approach is that it basically introduces a known bug to the logic check. The 10 minute window for single scheduled events had a bug for years and was causing problems.

Is this commit what you're referring to?

I'm still not convinced this is needed in core, but if it's going to be done, it needs to be done properly.

It's been a while, and at this point I'm not sure what the done properly would be in this case. Maybe what you suggested here?

FWIW, to avoid the plugin issue linked above, I would rather create a wrapper function to schedule an event only if it's not scheduled.

Not against it. Would need a docs update and is probably good enough to avoid most problems although adoption will take time.

#24 @aidvu
4 years ago

@peterwilsoncc Could you check my question above, so I can go ahead and update the patches with the requested calculations?

#25 in reply to: ↑ 23 @peterwilsoncc
4 years ago

Replying to aidvu:

The issue I have with this approach is that it basically introduces a known bug to the logic check. The 10 minute window for single scheduled events had a bug for years and was causing problems.

Is this commit what you're referring to?

Yes, that's the commit I am referring to.

I'm still not convinced this is needed in core, but if it's going to be done, it needs to be done properly.

It's been a while, and at this point I'm not sure what the done properly would be in this case. Maybe what you suggested here?

By done properly, I mean not allowing for edge case where the same event can be run more than the expected number of times in a period. IE, the same recurring event on a 24 hour schedule to be set up more than the allowed number of times in the period. That means accounting for past scheduled times and including all events in the count (ie, never break out of the loop counting events).

FWIW, to avoid the plugin issue linked above, I would rather create a wrapper function to schedule an event only if it's not scheduled.

Not against it. Would need a docs update and is probably good enough to avoid most problems although adoption will take time.

That's what I am thinking, the docs are inline so can be updated as part of the code changes. A Use wp_maybe_schedule** to only ensure events are not re-registered type of comment should do.

#26 @aidvu
4 years ago

Thanks for the reply @peterwilsoncc. Fwiw, I'd probably go for both.

A wp_maybe_schedule() as the new function to properly ensure a single recurring event is registered, and a limiter for recurring events which conforms to the requirements you mentioned above.

Will try to get both done in the coming weeks.

This ticket was mentioned in Slack in #core by monikarao. View the logs.


3 years ago

#28 @johnbillion
3 years ago

  • Keywords needs-testing removed

@aidvu Thanks for your work on this so far. Were you able to find time to work on those changes? Let us know so we can adjust the milestone of this ticket if necessary.

#29 @aidvu
3 years ago

@johnbillion think I can make it for 5.6. Thanks for pinging.

#30 @hellofromTonya
3 years ago

  • Milestone changed from 5.6 to Future Release

Spoke with @aidvu in Slack today. He doesn't have capacity before Beta 1. Moving this ticket to Future Release.

If any maintainer or committer feels this can be resolved in time, or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

This ticket was mentioned in Slack in #core by azaozz. View the logs.


3 years ago

#32 @johnbillion
3 years ago

@rmccue What does Cavalcade do in this situation? Does it ignore duplicate recurring events like core or does it schedule them anyway?

@betzster What about Cron Control?

Note: See TracTickets for help on using tickets.