Reported by: hailin | Owned by: hailin
There are several key issues with the current cron implementation.
- cron is not atomic.
Every page load calls wp_cron(), which checks the first timestamp in the cron array. If that timestamp has expired, it calls spawn_cron, which calls wp-cron.php to fire up the jobs.
This runs into massive concurrency issues on a large system with hundreds of servers, where millions of page views are generated every day.
The current method to address this issue is in wp-cron.php:
if ( get_option('doing_cron') > $local_time )
update_option('doing_cron', $local_time + 30);
However, this check does not solve the issues that result from concurrency.
On a busy site, in the particular second when the first cron timestamp expires, there may be 10 blog page loads on 10 different servers.
Since 'doing_cron' is still being updated by process #1, or the updated value has not taken effect yet (due to db or cache delays, usually several milliseconds or longer), process #2 will also pass the if ( get_option('doing_cron') > $local_time ) check and call update_option('doing_cron', $local_time + 30). So both processes will proceed to fire up the cron job.
I’ve observed that on a popular blog on a busy production site, ANY cron job was executed 5-7 times! That may be ok for the publish_future_post operation, but may not be good for other cron tasks.
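The race described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: simulate_check_then_set() and the $store array are hypothetical stand-ins, not WordPress code, and the delay is modelled in the worst case, where no write becomes visible until every process has already done its read.

```php
<?php
// Hypothetical simulation: none of these names exist in WordPress.
// $store stands in for the options table. Writes become visible only
// after every process has already performed its read.
function simulate_check_then_set(int $processes, int $localTime): int
{
    $store = ['doing_cron' => 0];   // 0 means no lock is held
    $passed = 0;
    $pendingWrites = [];

    // Phase 1: every process reads before any write has taken effect.
    for ($i = 0; $i < $processes; $i++) {
        if ($store['doing_cron'] > $localTime) {
            continue;               // lock looks held, so this process backs off
        }
        $pendingWrites[] = $localTime + 30;
        $passed++;
    }

    // Phase 2: the delayed writes finally land.
    foreach ($pendingWrites as $value) {
        $store['doing_cron'] = $value;
    }

    return $passed;                 // number of processes that go on to spawn cron
}

echo simulate_check_then_set(10, 1000), "\n"; // prints 10
```

In this worst case all 10 processes pass the check. With real, partially overlapping delays only some of them would, which is consistent with a job running several times rather than once.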
An ideal solution is to guarantee every cron is executed once and once only.
I can envision storing all cron jobs in a central table and having a daemon process them on a PARTICULAR server. Yet this approach may not be flexible enough, as it may not handle blog-specific jobs well.
A practical solution is to make the cron operation as atomic as possible, knowing that we can never make it truly atomic as there will be database and cross-data center communication delays.
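One way to narrow the window is to replace the separate read and write with a single conditional update that only succeeds if the value is unchanged since it was read. The sketch below is an assumption, not the proposed patch: OptionStore, compareAndSet() and try_acquire_cron_lock() are hypothetical names, and the in-memory store only models the semantics. Any real guarantee would have to come from the backing database or cache performing that conditional write atomically.

```php
<?php
// Hypothetical in-memory stand-in for a store with a conditional update,
// comparable to an SQL UPDATE whose WHERE clause also matches the old value.
final class OptionStore
{
    private $values = [];

    public function get(string $key): int
    {
        return $this->values[$key] ?? 0;    // 0 means no lock is held
    }

    // Writes $new only if the stored value still equals $expected.
    public function compareAndSet(string $key, int $expected, int $new): bool
    {
        if ($this->get($key) !== $expected) {
            return false;
        }
        $this->values[$key] = $new;
        return true;
    }
}

// Returns true only for the caller that wins the lock.
function try_acquire_cron_lock(OptionStore $store, int $localTime, int $ttl = 30): bool
{
    $current = $store->get('doing_cron');
    if ($current > $localTime) {
        return false;                       // another process holds a live lock
    }
    // Only one caller can swap the value it read for the new expiry.
    return $store->compareAndSet('doing_cron', $current, $localTime + $ttl);
}

$store  = new OptionStore();
$first  = try_acquire_cron_lock($store, 1000);
$second = try_acquire_cron_lock($store, 1000);
var_dump($first, $second);                  // bool(true), bool(false)
```

The second caller fails either at the expiry check or, if it read the old value before the first write landed, at the conditional update. Replication or cache lag between data centers can still reopen the window, as noted above.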
- Server timers are not always correct
Because the cron job condition is tested on every blog page load on every server, any server with a bad clock can ruin the cron jobs, causing future posts to be published early or never published at all.
We can build in some protection mechanism to guard against this.
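As one possible form of such protection, a server could compare its own clock with a reference value and decline to evaluate cron when the two disagree by too much. This is a sketch only: clock_is_sane(), should_run_cron(), the 60-second tolerance and the existence of a trusted $referenceTime (for example the database server's clock) are all assumptions, not part of the current implementation.

```php
<?php
// Hypothetical guard. $referenceTime is assumed to come from a source
// that is trusted more than the local clock.
function clock_is_sane(int $localTime, int $referenceTime, int $maxSkew = 60): bool
{
    return abs($localTime - $referenceTime) <= $maxSkew;
}

// Decides whether this server should act on the first cron timestamp.
function should_run_cron(int $nextTimestamp, int $localTime, int $referenceTime): bool
{
    if (!clock_is_sane($localTime, $referenceTime)) {
        return false;   // leave the job for a server with a good clock
    }
    return $nextTimestamp <= $localTime;
}

// A server whose clock runs an hour fast sees the job as due but skips it.
var_dump(should_run_cron(5000, 5000 + 3600, 5000 - 100)); // bool(false)
```

A server with a bad clock then does nothing instead of publishing early, and the job is picked up by the next page load on a healthy server.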
- Minor issue
Calling time() in multiple places in the cron operation chain can be tricky on a busy server, as each call can return a different value if the server is overloaded. Capturing the timestamp once at the cron entry point and passing it down the chain is logically sounder.
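A minimal sketch of that idea follows. The function names and the shape of $schedule (timestamp => hook name) are hypothetical. The only point is that time() is read once and the same value is used for both the lock expiry and the due check.

```php
<?php
// Hypothetical names. Every step receives the same $now value.
function is_due(int $timestamp, int $now): bool
{
    return $timestamp <= $now;
}

function due_hooks(array $schedule, int $now): array
{
    $due = [];
    foreach ($schedule as $timestamp => $hook) {
        if (is_due($timestamp, $now)) {
            $due[] = $hook;
        }
    }
    return $due;
}

// Entry point: time() is called at most once, here.
function run_cron_entry(array $schedule, ?int $now = null): array
{
    $now = $now ?? time();
    return [
        'lock_expiry' => $now + 30,             // same $now as the due check
        'due'         => due_hooks($schedule, $now),
    ];
}

print_r(run_cron_entry([900 => 'publish_future_post', 2000 => 'other_job'], 1000));
```

Accepting $now as an optional argument also makes the chain easy to exercise with a fixed timestamp.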
- Lack of a central standard time source
Server clock drift caused by power outages and similar events poses a fundamental challenge. Software cannot prevent hardware failure and can only do so much to adapt to those failure cases.
Change History (16)
- Owner changed from anonymous to hailin
- Status changed from new to assigned