Make WordPress Core

Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#11505 closed defect (bug) (fixed)

cron stopped working with standard configuration

Reported by: neoxx's profile neoxx Owned by: westi's profile westi
Milestone: 2.9.1 Priority: high
Severity: major Version: 2.9
Component: Cron API Keywords: has-patch tested
Focuses: Cc:

Description

Hi Guys,

After the update to wp 2.9 (from 2.8.6 where everything worked just fine) my cron-jobs stopped working. Looking inside the cron-array with crontrol http://wordpress.org/extend/plugins/wp-crontrol/ it looks like they are not executed at all. So I end up with all cron jobs waiting for execution (which seems to consume quite a bit of time on every page call). In crontrol this behavior is shown as all cron jobs have the execution status now.

I found the work-around in Otto's reply in this post http://wordpress.org/support/topic/296236 which seems to work but produces dirty rewrites in the browser's address bar...

As my server configuration didn't change at all I assume that something else changed (e.g. could the statement Increase timeout on cron-based requests when checking for upgrades mentioned in http://codex.wordpress.org/Version_2.9 be the reason?)

greetz,
berny

Attachments (3)

ticket-11505-stop-gap.patch (572 bytes) - added by miqrogroove 14 years ago.
The "Rich and Paul" stop-gap patch for anyone in panic mode.
ticket-11505-full.patch (690 bytes) - added by miqrogroove 14 years ago.
Fixes a regression from #11468
11505.diff (1.0 KB) - added by ryan 14 years ago.

Download all attachments as: .zip

Change History (55)

#1 @hakre
14 years ago

The change is not connected to any ticket nor commit so I guess, it's only a guess. I can not say if this is connected, sorry.

#2 @hakre
14 years ago

  • Keywords reporter-feedback added

Can you run a test an send a post request (e.g. with wget) to you site's wp-cron.php file and see if the WP-cronjobs are processed? I can not see any changes in the sourcecode so far that would make a difference (checked /wp-includes/cron.php).

From what I've seen, [11826] is related to the changes you quoted and it's said that it fixed #10349 .

#4 @neoxx
14 years ago

Hi,

@hakre:

Thanks for your quick reply! - I tried to connect with wget and curl to send a post request directly from my server as well as from the machine I'm currently working on and get a 200 (with an empty reply) all the time.

referring to the following code in wp-cron.php this looks to my as desired behavior?

if ( !empty($_POST) || defined('DOING_AJAX') || defined('DOING_CRON') )
	die();

Anyway, the cron-jobs get executed if I call wp-cron.php without any request, thus as a quick-and-dirty workaround I added a cron-job on my server which calls

/usr/local/bin/curl http://www.neotrinity.at/wp-cron.php 

every hour to avoid ?doing_wp_cron urls.

Does this give you already more insights or do you need more tests?

@miqrogroove:

Thanks for the link! - Feels good not to be alone. ;) - Anyway, I've already tried to change the code according to http://wpengineer.com/ping-problem/ before submitting this ticket but it didn't help either.

Berny

#6 @miqrogroove
14 years ago

Unlike wp_get_http(), the update.php and http.php systems never call set_time_limit(). I'd wager this new wp_remote_get timeout value is in conflict with max_execution_time, causing PHP to crap itself every time cron runs in certain environments. With a big fancy cron system and a big fancy HTTP API you'd think that would have been taken care of before wp_version_check got involved, but then that would be an assumption, wouldn't it? ;)

#8 @miqrogroove
14 years ago

New theory:

[10521] uses wp_remote_post() but requires empty($_POST).

#9 @miqrogroove
14 years ago

  • Keywords needs-patch added; cron reporter-feedback removed

#10 @Denis-de-Bernardy
14 years ago

r12424 might be the culprit.

#11 @miqrogroove
14 years ago

bah, looks like $_POST is empty afterall

#12 @miqrogroove
14 years ago

Wow there is some really bad documentation at http://codex.wordpress.org/Function_Reference/add_query_arg

No wonder spawn_crawn() is emitting invalid headers. I'm sure that's not causing the problem, but I find it interesting.

wp_redirect( add_query_arg('doing_wp_cron', '', stripslashes($_SERVER['REQUEST_URI'])) ); 

"oops"

#13 @miqrogroove
14 years ago

http://wordpress.org/support/topic/293727

Is there any chance the spawn aborts before ignore_user_abort() fires on slow servers?

Still just brainstorming. I left a message in the forum asking for volunteers to help us debug this on a broken system.

#14 @miqrogroove
14 years ago

I'm getting ready to debug the site reported by ptmccain. As I was looking over update.php for about the 20th time, I noticed $_POST is NOT empty in wp_update_plugins(). I'll keep an eye on that as we proceed.

#15 @miqrogroove
14 years ago

oh those aren't even cron calls? pfff. Well I'll rule nothing out... :P

#16 @miqrogroove
14 years ago

Without modifying any program code, debug stubs so far show that wp_remote_post() is firing inside of spawn_cron(), but nothing is happening in wp-cron.php so far. The return value from wp_remote_post is as follows:

array (
  'headers' => 
  array (
  ),
  'body' => '',
  'response' => 
  array (
    'code' => false,
    'message' => false,
  ),
  'cookies' => 
  array (
  ),
)

#17 @miqrogroove
14 years ago

get_class() is returning 'WP_Http_Curl'

#18 @miqrogroove
14 years ago

Changing the timeout value from 0.01 to 1 resolved the immediate problems. I'm going to see whether this is related to the safe_mode issues reported in the forum.

#19 @miqrogroove
14 years ago

ptmccain has agreed to let me fumble around with http.php on his server. I'm following up to Denis's link, [12424] at line 1295 would be directly involved in any WP_Http_Curl activity.

@miqrogroove
14 years ago

The "Rich and Paul" stop-gap patch for anyone in panic mode.

#20 @miqrogroove
14 years ago

I confirmed curl is not throwing any errors. Trying to reproduce the problem in a dedicated testing environment now.

#21 @miqrogroove
14 years ago

Testing reveals curl silently craps itself (every time) if either of CURLOPT_CONNECTTIMEOUT_MS or CURLOPT_TIMEOUT_MS are less than 1000.

Reproduced using PHP 5.2.11 with libcurl/7.19.5 OpenSSL/0.9.8l zlib/1.2.3 libidn/0.6.5.

There is no documentation to support this.

Full patch coming right up.

@miqrogroove
14 years ago

Fixes a regression from #11468

#22 @miqrogroove
14 years ago

  • Keywords has-patch tested added; needs-patch removed
  • Severity changed from normal to major

New patch tested ad nauseam, as described above.

#23 @miqrogroove
14 years ago

Line numbers match trunk and branch. Please consider both.

#24 @neoxx
14 years ago

@miqrogroove: You're a genius! - Thanks to your work and your patches it seems like my cron-jobs are executed again.

Regarding my tests the patch on cron.php (changing the timeout from 0.01 to 1) didn't do the trick but together with the second patch it looks better. In other words if I look into my cron-array it seems I miss some cron-jobs, but at least after a few reloads, every cron-job is executed. This is a similiar behavior I've observed from 2.8.0 to 2.8.6.

#25 @miqrogroove
14 years ago

WP-cron fires once per visit or after 60 seconds, whichever is longer. So it can take a solid 10+ minutes for everything to work itself out when it's back-logged.

#26 @neoxx
14 years ago

  • Priority changed from normal to high

ah, I didn't know that. - makes sense now.

anyway, thanks again for the bug-fix! :)

#27 @ryan
14 years ago

If _MS only works if > 1000 then it isn't worth bothering with. Since it also causes #11499, let's remove that block entirely.

@ryan
14 years ago

#28 @Denis-de-Bernardy
14 years ago

ryan's patch seems sensible if we get confirmation that curl is indeed the transport that is causing the trouble.

#29 @westi
14 years ago

  • Cc westi added

http://stackoverflow.com/questions/1856473/why-would-curl-ignore-curlopttimeoutms-but-honor-curlopttimeout

Looks like you need to ensure you have a new enough curl version as well as the _MS const.

#30 @neoxx
14 years ago

@westi: regarding the link you've posted, my curl-version seems current enough.

curl -V

curl 7.19.6 (i386-sun-solaris2) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3 libidn/1.15 libssh2/0.18

#31 @miqrogroove
14 years ago

... and there's no mention of this in the curl 7.19.7 change log.

#32 @dd32
14 years ago

From westi's comment http://core.trac.wordpress.org/ticket/11499#comment:8 it looks like its probably the PHP extension wrapper which has the bug, not Curl itself.

So it'd depend on the version of the PHP extension you're using, or if compiled in, what version of PHP.

#33 @miqrogroove
14 years ago

Wrong again.

curl's latest main.c at line 3275 ...

  timeout.tv_sec = ms/1000;
  ms = ms%1000;
  timeout.tv_usec = ms * 1000;

... assigns the wrong value to its microsecond component. This indicates the bug is specific to non-Windows platforms.

#34 follow-up: @dd32
14 years ago

Ah.. 2 seperate curl issues, Too late for my brain.

#35 @miqrogroove
14 years ago

Oh wait.. 10 % 1000 = 10, I forgot :P

#36 in reply to: ↑ 34 @westi
14 years ago

Replying to dd32:

Ah.. 2 seperate curl issues, Too late for my brain.

Yes there are two separate curl issues at play here for sure.

#37 @miqrogroove
14 years ago

gah, FINALLY!!!

http://curl.haxx.se/libcurl/c/curl_easy_setopt.html

CURLOPT_TIMEOUT_MS

Like CURLOPT_TIMEOUT but takes number of milliseconds instead. If libcurl is built to use the standard system name resolver, that portion of the transfer will still use full-second resolution for timeouts with a minimum timeout allowed of one second. (Added in 7.16.2)

#38 @neoxx
14 years ago

phpinfo();

PHP Version 5.2.11
Build Date Oct 13 2009 13:53:25

cURL support enabled
cURL Information libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3 libidn/1.15 libssh2/0.18 

#39 follow-up: @ryan
14 years ago

Are we okay with removing that block completely? I think that is best and easiest, at least for 2.9.1. #11468 shouldn't have went in when I did. I got over-enthusiastic at the possibility of fixing some long standing cron issues and instead caused more cron issues.

#40 in reply to: ↑ 39 @westi
14 years ago

Replying to ryan:

Are we okay with removing that block completely? I think that is best and easiest, at least for 2.9.1. #11468 shouldn't have went in when I did. I got over-enthusiastic at the possibility of fixing some long standing cron issues and instead caused more cron issues.

For 2.9.1 we should revert the changes made to curl by #11468 to use CURLOPT_TIMEOUT_MS for sure which fixes this issue.

So lets commit http://core.trac.wordpress.org/attachment/ticket/11505/11505.diff and close this ticket down.

#41 @neoxx
14 years ago

works for me, though without miqrogroove's first patch (ticket-11505-stop-gap.patch) cron-jobs are executed twice some time...

#42 follow-up: @miqrogroove
14 years ago

So westi and ryan, you want to roll back part of the API feature, but for only one transport. That means for systems supporting fractional timeouts above 1 sec, that support is being removed, but only for curl. That is behavioral change, whereas I just want to patch the bug and have as little impact on 2.9.1 as possible. Is this a trivial difference, or could there be some advantage to one way versus the other?

#43 in reply to: ↑ 42 @westi
14 years ago

Replying to miqrogroove:

So westi and ryan, you want to roll back part of the API feature, but for only one transport. That means for systems supporting fractional timeouts above 1 sec, that support is being removed, but only for curl. That is behavioral change, whereas I just want to patch the bug and have as little impact on 2.9.1 as possible. Is this a trivial difference, or could there be some advantage to one way versus the other?

If there is a guaranteed way for curl to always support fractional timeouts including not triggering the Warning seen in #11499 then I would happily take that as an alternative patch for this issue.

Maybe the best solution is to apply Ryans patch for 2.9.1 and work on supporting fractional timeouts for curl in 3.0 - don't forget they were only introduced at the last minute into 2.9 and in hindsight that change should have probably waited until 3.0

#44 @miqrogroove
14 years ago

neoxx, the first patch shouldn't affect anything after the second patch is installed.

#45 follow-up: @miqrogroove
14 years ago

westi, the advantage being if that part of #11468 is rolled back then it's not necessary to hack $timeout & !1024 into the script for #11499?

#46 in reply to: ↑ 45 @westi
14 years ago

Replying to miqrogroove:

westi, the advantage being if that part of #11468 is rolled back then it's not necessary to hack $timeout & !1024 into the script for #11499?

Yes. And that we go back to tested code which we know works for a large number of end-users by virtue of the fact that it was what we had in a previous release.

#47 @miqrogroove
14 years ago

I'm crazy enough to suggest this...

$timeout_ms = max(((int) ceil( 1000 * $r['timeout'] )) & ~1024, 1000);

... but it offers no real advantage because timeout values between 1.02 and 2.05 are ignored. So, yes, I agree PHP is too far broken to fix both tickets permanently with any one-liner.

+1 ryan's version for both tickets.

#48 follow-up: @neoxx
14 years ago

@miqrogroove: hmm, I didn't have a closer look at the source-code, just c&p'ed it. - for clarification: I meant using 11505.diff together with ticket-11505-stop-gap.patch.

In that case I maybe should open a second ticket for double-executed cron-jobs since 2.8.n (as far as I remember)?

#49 in reply to: ↑ 48 ; follow-up: @miqrogroove
14 years ago

Replying to neoxx:

open a second ticket for double-executed cron-jobs

If it's causing a problem in 2.9, yes. Otherwise, wait until it causes a problem :P

All 3 patches do the same thing, just with an eye toward the increasingly big picture.

#50 in reply to: ↑ 49 @Denis-de-Bernardy
14 years ago

Replying to miqrogroove:

Replying to neoxx:

open a second ticket for double-executed cron-jobs

If it's causing a problem in 2.9, yes. Otherwise, wait until it causes a problem :P

I can only see that one occurring if there are concurrent page loads, personally.

One trick could be to set a transient that expires after a minute before the cron starts; and then to check if the transient is set before running the cron at all. This would introduce some level of throttling. Even then, though, we'd potentially get dup crons on a site with really heavy traffic if two concurrent page loads try to add the transient.

#51 @automattor
14 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [12472]) Don't use CURLOPT_CONNECTTIMEOUT_MS or CURLOPT_TIMEOUT_MS due to bugginess. fixes #11505 see #11499

#52 @ryan
14 years ago

(In [12473]) Don't use CURLOPT_CONNECTTIMEOUT_MS or CURLOPT_TIMEOUT_MS due to bugginess. fixes #11505 see #11499 for 2.9

Note: See TracTickets for help on using tickets.