WordPress.org

Make WordPress Core

Opened 2 months ago

Last modified 7 weeks ago

#49693 new enhancement

Drop duplicate recurring cron events

Reported by: aidvu Owned by:
Milestone: 5.5 Priority: normal
Severity: normal Version: 5.4
Component: Cron API Keywords: has-patch has-unit-tests needs-testing needs-refresh
Focuses: Cc:

Description

It's easy to break a site by mistakenly scheduling millions of events because of a typo or a bad check.

Here's one example: https://wordpress.org/support/topic/error-in-addcustomcronschedule-degrades-site-performance/

In the past, I've also seen plugins use get_option and all kinds of different checks, instead of wp_next_scheduled.

Fixing the site afterwards can be a bit of a PITA, specially for high traffic ones. DB replication is often an issue when the cron option grows in size due to the bug.

It's probably a good idea to prevent scheduling recurring events that have the same hook, schedule and args.

Additional thoughts:

  • This might make the wp_next_scheduled check obsolete when queuing an event?
  • Changing the arguments of wp_schedule_event isn't something I wanted to do, but considering the current code for reschedule/unschedule in wp-cron.php, it seemed the best route (case of missing schedule, and trying to get interval from the event that we're rescheduling).
  • We're currently doing this with a schedule_event filter. Another route could be to add it to default-filters, but I think it should be in the Cron API itself.

Attachments (4)

49693.patch (6.0 KB) - added by aidvu 2 months ago.
patch and tests
49693-2.patch (6.8 KB) - added by aidvu 2 months ago.
Patch + tests for wp_reschedule_event.
49693-3.patch (6.8 KB) - added by aidvu 2 months ago.
Comment updates.
49693-alternative.patch (5.8 KB) - added by aidvu 2 months ago.
Alternative approach with wp_doing_cron() check.

Download all attachments as: .zip

Change History (17)

@aidvu
2 months ago

patch and tests

#1 @aidvu
2 months ago

Another route would be to add the reschedule flag to the $args when coming from wp_reschedule_event, and check that + unset it when saving the event. I don't like it. :)

Last edited 2 months ago by aidvu (previous) (diff)

#2 in reply to: ↑ description @aidvu
2 months ago

  • We're currently doing this with a schedule_event filter. Another route could be to add it to default-filters, but I think it should be in the Cron API itself.

Scratch this. It breaks wp_reschedule_event. Most plugins will requeue their own recurring events, but this isn't possible now, because of how wp-cron.php processes events.

@aidvu
2 months ago

Patch + tests for wp_reschedule_event.

@aidvu
2 months ago

Comment updates.

#3 @aidvu
2 months ago

Alternative approach would be to skip the duplicate check when we're wp_doing_cron(). This would make the wp_reschedule_event() function unusable if we're not DOING_CRON.

@aidvu
2 months ago

Alternative approach with wp_doing_cron() check.

#4 @azaozz
2 months ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 5.5

At first look 49693-alternative.patch seems better as it doesn't need additional param. But breaking wp_reschedule_event() when not doing wp_cron doesn't seem like a good idea. Thinking the approach in 49693-3.patch is the right one.

The only thing left here is perhaps to try to figure out if any plugins will be negatively affected by this change. As far as I see there are a handful of plugin that call wp_reschedule_event(): https://wpdirectory.net/search/01E4CJVDEWK1JRTHRH7CVJGSZF.

#5 @peterwilsoncc
2 months ago

I have a couple of concerns here

Rather than adding new arguments I'd like to switch to an $args array. There's a couple of interactions in reported bugs that could be problematic with additional arguments. Refer https://core.trac.wordpress.org/ticket/34913 and https://core.trac.wordpress.org/ticket/35491#comment:13

I'm not sure preventing multiple identically scheduled events is ideal. I can think of legitimate reasons for running the same hook & args combination twice.

The first that comes to mind is scheduling a task to run at specific times of day; for example weather forecasts in my home town are released at scheduled but irregular times throughout the day. A large task might be set to run twice overnight to allow it to be split (but this is a less likely circumstance).

As far as I see there are a handful of plugin that call wp_reschedule_event()

It's mainly intended as an internal function but the first few plugins hit 6,000,000 active installs between them so we should account for them.

I agree if not called correctly it is easy to blow up the cron array fairly quickly. WordPress does try to defend against developer mistakes but I'm not sure this is a circumstance in which it can be easily done without hindering the intent of more common use cases.

#6 follow-up: @aidvu
2 months ago

Thank you for the feedback @azaozz and @peterwilsoncc! 🙇

Rather than adding new arguments I'd like to switch to an $args array.

Is this the same thing I suggested here: https://core.trac.wordpress.org/ticket/49693#comment:1 ?

I'm not sure preventing multiple identically scheduled events is ideal. I can think of legitimate reasons for running the same hook & args combination twice.

Isn't this the reason why we have custom intervals: https://developer.wordpress.org/plugins/cron/understanding-wp-cron-scheduling/

It might be easier to schedule duplicates with different start times, but I think the right way for such schedules would be a custom interval.

The only thing left here is perhaps to try to figure out if any plugins will be negatively affected by this change. As far as I see there are a handful of plugin that call wp_reschedule_event()

Hmm, wp_reschedule_event() is only affected in case of the alternative.patch. Since we're not going with that one, behavior stays unchanged (except what was proposed in the patch).

Rather than adding new arguments I'd like to switch to an $args array.

Just to be sure, you want to pass the rescheduling param as part of the $args to the functions?

e.g.

  • When calling wp_reschedule_event(), we set a special rescheduling param in $args. I'd go with wp_cron_$hook_rescheduling, to avoid collisions.
  • In wp_schedule_event() we look at this param, and unset it if set (so the $args match the original jobs args)
  • We use the variable to decide whether to skip the duplicate check or not.

#7 in reply to: ↑ 6 @peterwilsoncc
2 months ago

Replying to aidvu:

Isn't this the reason why we have custom intervals: https://developer.wordpress.org/plugins/cron/understanding-wp-cron-scheduling/

It might be easier to schedule duplicates with different start times, but I think the right way for such schedules would be a custom interval.

To be clearer, I was thinking of the same event that runs periodically but at asymmetrical intervals. To use the weather example, a news site downloading latest forecast for my home town might use two daily events with the hook 49693_bomau_weather_update [ [ 'product' => 'IDV10450' ] ] running at 5:15am and 4:25pm which is when the local reports are updated.

To protect against blowing up the table for single events, WP prevents identical events being scheduled within ten minutes of each other. I'm happy enough with doing something like that but the maths would be a bit messy. It could also be difficult for plugins using custom cron storage with the hooks added in WP 5.1

As some rough (ie, not completely tested) code:

<?php

$now = time();
$next = time() + 300; // 5 minutes from now.
$later = time() + ( 60 * 60 * 24 ) + 360 ;// 1 day, 6 minutes.
$even_later = time() + 60 * 60 * 24 * 1.5; // 36 hours.

$interval = 60 * 60 * 24; // One day.

$now_offset = $now % $interval;
$next_offset = $next % $interval;
$later_offset = $later % $interval;
$even_later_offset = $even_later % $interval;

var_dump ( [
    'now_offset' => $now_offset,
    'next_offset' => $next_offset,
    'later_offset' => $later_offset,
    'even_later_offset' => $even_later_offset,
    'next_run' => abs( $now_offset - $next_offset ) < 600 ? 'disallow' : 'allow',
    'later_run' => abs( $now_offset - $later_offset ) < 600 ? 'disallow' : 'allow',
    'even_later_run' => abs( $now_offset - $even_later_offset ) < 600 ? 'disallow' : 'allow',
] );
array(7) {
  ["now_offset"]=>
  int(77695)
  ["next_offset"]=>
  int(77995)
  ["later_offset"]=>
  int(78055)
  ["even_later_offset"]=>
  int(34495)
  ["next_run"]=>
  string(8) "disallow"
  ["later_run"]=>
  string(8) "disallow"
  ["even_later_run"]=>
  string(5) "allow"
}

#8 @azaozz
2 months ago

To protect against blowing up the table for single events, WP prevents identical events being scheduled within ten minutes of each other. I'm happy enough with doing something like that but the maths would be a bit messy.

Right, as WP has throttling down for single events was thinking it needs something similar for multi-events.

Do you think it may be possible to base it on number of scheduled events? Like, perhaps look at how many are scheduled for the next 12 or 24 hours, etc.

#9 @aidvu
2 months ago

To protect against blowing up the table for single events, WP prevents identical events being scheduled within ten minutes of each other. I'm happy enough with doing something like that but the maths would be a bit messy. It could also be difficult for plugins using custom cron storage with the hooks added in WP 5.1

This is something I first thought of doing, but it felt more convoluted than an explicit rule. I also think that this wouldn't fix the problem but only hide it, which is then troublesome for debugging.

To be clearer, I was thinking of the same event that runs periodically but at asymmetrical intervals. To use the weather example, a news site downloading latest forecast for my home town might use two daily events with the hook 49693_bomau_weather_update [ [ 'product' => 'IDV10450' ] ] running at 5:15am and 4:25pm which is when the local reports are updated.

This is what I meant by custom intervals. But then again, I can't argue that it's not convenient to use the same event twice at different times (but it does feel wrong to me).


All that said, I can implement whatever we agree on, but my vote is for a more explicit no-duplicates rule.

#10 follow-up: @aidvu
8 weeks ago

@azaozz / @peterwilsoncc any additional info on how you want to proceed with this?

#11 in reply to: ↑ 10 @peterwilsoncc
7 weeks ago

  • Keywords needs-refresh added

Replying to aidvu:

@azaozz / @peterwilsoncc any additional info on how you want to proceed with this?

Sorry for the delay.

Let's go with @azaozz suggestion of n max per day.

An event is considered identical if it has the same interval, schedule name, hook and arguments.

My inclination is to be generous with the maximum allowed within the day as the goal is to avoid explosions without hindering the user/developer. Proposal: 24

In terms of patch, the timestamp allowance will be similar to the ten minute calculation for single events: ie, min and max time stamp may be greater than 24 hours apart depending on when the event is being scheduled.

#12 @azaozz
7 weeks ago

My inclination is to be generous with the maximum allowed within the day as the goal is to avoid explosions without hindering the user/developer.

Yes, same here. The purpose is to do a sanity check, not to limit usage (the 10 min throttling when adding single events can also be seen as "maximum of 144 duplicates per day" (24 x 60 / 10)). Maximum of 24 multi-event duplicates per day sounds good imho.

Last edited 7 weeks ago by azaozz (previous) (diff)

#13 @aidvu
7 weeks ago

I'm inclined to say 42. :D

In terms of patch, the timestamp allowance will be similar to the ten minute calculation for single events: ie, min and max time stamp may be greater than 24 hours apart depending on when the event is being scheduled.

Thinking about this, I'm not sure a fixed number is what we need... Let's say, someone creates a custom event that runs every 5 seconds (like the example on wp.org docs). Should probably be based on the interval, with a sane "maximum" for events that run fewer times.

Not sure about the same calculation as for single-events. I had this in mind.

        // Always allow 24 occurrences per interval, even if it's an event with long $interval.
        $allowed_occurrences = 24;
        if ( 24 < DAY_IN_SECONDS / $interval ) {
                // Set a per-day limit based on the interval
                $allowed_occurrences = DAY_IN_SECONDS / $interval;
        }
        $check_interval = DAY_IN_SECONDS;
        if ( $check_interval < $event->interval ) {
                $check_interval = $event->interval;
        }

        $min_timestamp = $event->timestamp - ( $check_interval / 2 );
        $max_timestamp = $event->timestamp + ( $check_interval / 2 );
        $occurrences = 0;

        foreach ( $crons as $event_timestamp => $cron ) {
                if ( $event_timestamp < $min_timestamp ) {
                        continue;
                }
                if ( $event_timestamp > $max_timestamp ) {
                        break;
                }
                if ( isset( $cron[ $event->hook ][ $key ] ) ) {
                        $occurrences++;
                        break;
                }
        }

        if ( $allowed_occurrences < $occurrences ) {
                return false;
        }
Note: See TracTickets for help on using tickets.