Make WordPress Core

Opened 14 years ago

Closed 6 years ago

Last modified 4 years ago

#15148 closed enhancement (wontfix)

Cron Storage Abstraction

Reported by: ryan's profile ryan Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Cron API Keywords: has-patch dev-feedback
Focuses: Cc:

Description

Abstract cron storage to allow pluggable storage schemes.

Attachments (4)

15148.diff (4.7 KB) - added by ryan 14 years ago.
15148.2.diff (12.2 KB) - added by ryan 14 years ago.
WP_Cron_Store
18084.patch (12.3 KB) - added by mintindeed 13 years ago.
Updated WP_Cron_Store
cron.php (8.9 KB) - added by mintindeed 13 years ago.
Proposed improved cron store (as a drop-in)

Download all attachments as: .zip

Change History (31)

@ryan
14 years ago

#1 @westi
14 years ago

  • Cc westi added

@ryan
14 years ago

WP_Cron_Store

#2 @ryan
14 years ago

Moved to WP_Cron_Store class that can be inherited by a custom class or overridden. It is loaded in the same fashion as wpdb. I started by using a custom class filter as is done with the xmlrpc server class and other classes, but core WP (and thus probably many plugins) was calling cron functions prior to init. I moved the core places that were doing this into a function hooked onto init. If we are willing to suffer a back compat hit, we could move to the class filter approach which has the advantage of not requiring files living directly in wp-content that aren't part of auto updates.

#3 @markjaquith
14 years ago

  • Milestone changed from Awaiting Review to Future Release

@mintindeed
13 years ago

Updated WP_Cron_Store

#4 @mintindeed
13 years ago

I've attached an updated version of Ryan's patch. I've tested it locally and fixed a few minor bugs, but haven't tested it in production yet.

I'm also attaching a cron.php drop-in that would use a separate cron job table. This drop-in won't handle concurrent wp cron processes any better than the current wp cron, but it will handle prevention of duplicate jobs and prevent race conditions that cause jobs to get lost. Concurrent instances of wp cron could be minimized by updating executing jobs to the "processing" state and checking the job state prior to executing. They could be completely eliminated by spawning 1 cron process to handle each queued task, but I didn't want to get too ambitious without feedback.

Some of the modifications include:

  • Private methods are handled entirely within the cron store class, cron API no longer uses private methods but instead passes arguments to the cron store class
  • Changed WP API to pass functions directly to the cron store class. The cron store can figure out how it wants to handle those params.
  • Ryan's previous patch removed the procedural versions of _get_cron_array, _set_cron_array and _upgrade_cron_array, this patch takes the next step and actually labels them as protected methods. Not having a publically-accessible version of these methods may break some plugins that used them in spite of being labelled with @access private. If that's the case, may need to make the methods public and create a public procedural function that also invokes doing_it_wrong and _deprecated.
  • I wonder if it's cheaper for *_Cron_Store->get_event() to first check if the event exists, if not then immediately query the event from the database rather than looping through the current cron array?
  • Can't bail out of *_Cron_Store->get_event() early if _cron_array is empty, because there may be events stored in the database. Don't want to query all pending events because of large job queues, e.g. Disqus on sites with a lot of comments.
  • If using the PMC_Cron_Store class, wp-cron.php could be improved by updating in-progress jobs to "processing" status. Then it would be possible to go back and see what jobs failed/never completed, and potentially have an automated process that would retry them and/or alert the admin that the jobs never completed. It also paves the way for having wp cron spawn 1 process to handle each pending job it tries to process, rather than trying to procedurally complete jobs. This would theoretically negate the need for the "doing cron" transient.

#5 @mintindeed
13 years ago

  • Cc gabriel.koen@… added

@mintindeed
13 years ago

Proposed improved cron store (as a drop-in)

#6 @mintindeed
13 years ago

  • Keywords has-patch added

Oops, the cron.php drop-in I originally attached depended on a function only available in upgrade.php. Fixed that.

#7 @agupta_pmc
13 years ago

  • Cc agupta_pmc added

#8 @chriscct7
10 years ago

  • Keywords dev-feedback added

#9 @archon810
10 years ago

  • Severity changed from normal to major

Hello gents,

Any movement here? Let me tell you a wonderful tale that made me stay up till 8am while the site and all servers were on fire, all due to cron, as I found out eventually. It proved once again that using a single serialized array in wp_options is a terrible, horrible, evil idea.

The sequence of events was as follows.

  1. On a busy site (15+mln pageviews a month), cron is used relatively extensively, but within reason.
  2. Heavy load leads to some db performance issues.
  3. A sneaky bug in one of our plugins results in a single cron entry being scheduled by probably 10% of the requests. Usually, the only side effect would be that this cron entry would stick around in the cron job list because it gets reinserted over and over.
  4. These db performance issues result in both slow writes and reads, which in turn means that due to a race condition, wp-cron gets read and written over and over, overwriting anything the jobs that are in progress and potentially even already complete and putting them back in. It's easy to see how that turns into a nightmare. But it gets worse. Btw, I have to mention that due to cron performance issues in the past, I've long switched off the built-in wp cron and enabled a Linux cron-initiated wp-cron.
  5. The heightened load and constant cron overwriting grows the cron job list to about 450 jobs that have no chance to get flushed, at least that was the state I found it in. At this point, just the cron array is 1.8MB, and the servers are burning way up.
  6. Did I mention how things manage to get more fun? The db server is actually a master in a master-slave configuration, and it's starting to spit out binlogs like it's nobody's business. I'm talking a gig per minute at its worst. Network is burning up too, but... then the master runs out of space after eating through 100GB of space in a matter of 2-3 hours.
  7. Oh, and additionally, the huge increases in cron array size (i.e. the call to get_options()) ends up in tons of OOM errors in the database, at which point Wordpress does wonderful things like considering the query result as returning empty. At that point, the site may switch to the "Hiii, it's my first install, here's an install page where you can create an admin user" message, various plugin settings get reset, and other fun stuff. It's really really fun. I can't recommend enough trying this once, but don't forget to take some speed first.
  8. At this point replication breaks, but the slave remains online, which was the only saving grace and together with W3 Total Cache kept the site mostly up, as long as pages remained in the cache.

It took me many hours to finally figure out what's going on, fix the cron bug, delete the ballooned cron job list, re-initiate replication, restore plugin settings from a backup, and watch the sun rise.

Yes, there was a bug that was writing a cron entry too frequently. But if the cron system in WP used a dedicated table, none of this would have happened. It would be way more robust and not easily broken by race conditions.

I sincerely hope this ticket gets some action for a release this year. It would seriously make my day/month/year.

Thanks for reading.

Last edited 10 years ago by archon810 (previous) (diff)

#11 @archon810
8 years ago

Come on, people. The fact that the whole cron is serialized in a single entry is asking for trouble. It just bit me again, and many others https://wordpress.org/support/topic/high-server-load-and-dsq_sync_forum-problem?replies=16.

It destroys sites.

Let's get a proper table for cron that doesn't have the issues a single entry does and resolve this bug for good!

#12 @archon810
8 years ago

  • Severity changed from major to critical

I'm updating severity to critical because this bug destroys sites and is very hard to diagnose and mitigate once it happens (because a) you need to know where to look, b) it can bring down replication and run db servers out of space, and c) once you try to delete the cron entry, it'll just be written again by every PHP thread currently in memory).

#13 @archon810
8 years ago

I filed a new ticket with Disqus here explaining the problem in the meantime. I've also pinged my Disqus contact. Hopefully, they'll take it seriously and fix it soon https://github.com/disqus/disqus-wordpress/issues/113.

#14 @archon810
8 years ago

Here's something crazy to figure out. According to https://core.trac.wordpress.org/browser/tags/4.5.3/src/wp-includes/cron.php#L0 and https://codex.wordpress.org/Function_Reference/wp_schedule_single_event, wp_schedule_single_event() actually does this: "Don't schedule a duplicate if there's already an identical event due within 10 minutes of it" by using this code:

<?php
$next = wp_next_scheduled($hook, $args);
if ( $next && abs( $next - $timestamp ) <= 10 * MINUTE_IN_SECONDS ) {
    return false;
}

However, this check doesn't seem to work 100% of the time for some reason. This fact explains why you don't see such reports every hour of every day, and why most of the time, I suppose, you won't see duplicates in cron.

Yet I was watching ours go up every time I refreshed Crontrol until the proposed (more bulletproof) PR #114.

I suspect there's some sort of a race condition on busy sites, or some time comparison issue (maybe related to time zones). Does anyone have any ideas?

#15 @archon810
8 years ago

Here's the effect of resolving the issue with thousands of duplicate cron schedules using a forced wp_next_scheduled() check without time constraints proposed here https://github.com/disqus/disqus-wordpress/issues/113.

See after 22:00:


Web server (16-core 64GB RAM Linode):
http://i.imgur.com/VRB0KaS.png


DB server (12-core 24GB RAM Linode):
http://i.imgur.com/SFqXklg.png

You can see much-increased traffic between db and web servers due to the size of the cron option in wp_options, reading and writing huge amounts of data back and forth; as well as increased load on the web servers trying to unpack 1MB of cron data into memory on every page load.

To make matters worse, for some reason, WP made cron autoload=yes, which means it loads on every page request. I've changed it to "no" in our setups, and I'd really like to have it changed in WP core unless I'm missing a good reason for why it's set to yes.

Last edited 8 years ago by archon810 (previous) (diff)

#16 @peterwilsoncc
8 years ago

  • Severity changed from critical to normal

Changing severity to normal, there's enough filters, etc to allow plugins to handle cron for high traffic sites.

#17 @rmccue
8 years ago

Very similar: #32656.

@archon810 We have a drop-in cron replacement called Cavalcade that you might like to try; it essentially hacks in a solution via the pre_option filters.

#18 follow-up: @archon810
8 years ago

Cavalcade looks great, exactly what WP should have in its core. Hopefully, it can be adopted, or at least its ideas.

The production beta warning has me worried though - we have several very busy sites. Any idea when the beta tag will be lifted?

Props for tackling this issue regardless!

#19 in reply to: ↑ 18 @rmccue
8 years ago

Replying to archon810:

Cavalcade looks great, exactly what WP should have in its core. Hopefully, it can be adopted, or at least its ideas.

Cavalcade isn't really an appropriate solution for core. So long as we have plenty of hooks, core doesn't really need it either (it doesn't fit the 80/20 use case split).

The production beta warning has me worried though - we have several very busy sites. Any idea when the beta tag will be lifted?

We're running it in production on a bunch of large sites (including a lot of our client sites); I just need to edit that page to note it's now production ready. :)

#20 follow-up: @DavidAnderson
8 years ago

@peterwilsoncc You said - "there's enough filters, etc to allow plugins to handle cron for high traffic sites." This isn't really the case. There's a patch awaiting review in https://core.trac.wordpress.org/ticket/32656 to add the needed filters so that it could become the case. Currently, you can only replace the internals of cron with the very nasty hack of a pre_option filter (like Cavalcade does), and then analysing the array in the 'cron' option to see how it changed since it was loaded, in order to work out the change being made.

#21 in reply to: ↑ 20 @peterwilsoncc
8 years ago

Replying to DavidAnderson:

@peterwilsoncc You said - "there's enough filters, etc to allow plugins to handle cron for high traffic sites." This isn't really the case. [...]

This was intended in the context of reducing the severity from critical to normal only rather than the entire ticket. Apologies if you found it ambiguous.

#22 @peterwilsoncc
6 years ago

  • Milestone Future Release deleted
  • Resolution set to wontfix
  • Status changed from new to closed

Closing this as #32656 has added committed in [43540] to allow for large sites to modify cron storage.

#23 @archon810
6 years ago

@rmccue Does Cavalcade benefit from the change committed above in any way? Perhaps in a future release.

#24 @rmccue
6 years ago

@archon810 Answered on the Cavalcade repo, as it's not a core question, but tl;dr yes, we'll switch to using it in the future.

Last edited 6 years ago by rmccue (previous) (diff)

#25 @archon810
5 years ago

I filed https://core.trac.wordpress.org/ticket/49520 because I strongly think WP cron should be resolved properly once and for all.

#26 @peterwilsoncc
5 years ago

#49520 was marked as a duplicate.

This ticket was mentioned in Slack in #core by takahashi_fumiki. View the logs.


4 years ago

Note: See TracTickets for help on using tickets.