Make WordPress Core

Opened 11 years ago

Last modified 6 months ago

#26050 new defect (bug)

Continual Admin Page POST (HeartBeats?) Can Cause SQL Connection Issues

Reported by: optimized-marketingcom's profile optimized-marketing.com Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.6
Component: Administration Keywords: needs-patch bulk-reopened
Focuses: performance Cc:

Description

The admin pages (/wp-admin/widget.php, /wp-admin/index.php, etc..) will continually POST to admin-ajax.php ever 2.5-3 min. If multiple admin pages are open for long periods of time(ex. over night) the SQL Connections associated with the POSTs will begin to progressively have longer sleep times. I have seen MYSQL Connecitons Sleep upwards of 45 seconds from these POSTs.

This ultimately cause my Server/Account to be shutdown by my host due to the long sleep time taking up connections for no reason.

I wouldn't expect the Connections to progressively sleep longer. I was surprised to find that the pages were Sending POSTs even when they were not being used(before I knew about the heartbeat api).

Is this the expected experience/results? Should there be a sleep setting on these POSTs to stop after a given time frame? Or can all of these be rolled into 1 POST?

To Reproduce:
Open up the following in their own tabs: /wp-admin/widgets.php, /wp-admin/index.php, /wp-admin/theme.php, /wp-admin/theme-edit.php, /wp-admin/plugins.php, /wp-admin/post.php

Verify they are making post requests ever 2.5-3min, I use Tamper Data add-on for firefox, but fiddler or any other http collector will do.

In MYSQL watch the connections by using the command: show full processlist;

run the SQL command as soon as you see the POST occur in tamper data or fiddler. Notice the Sleep and time is 0 for the POST request. you might not see it in the process list at first since it is so quick.

Once POSTs are occuring ever few min and MySQL is setup to view processes/connections leave the pages open over night.

Then test again in the morning by viewing the processlist once you see a POST submitted. It usually easiest to see when post are submitted around the same time. You will notice the sleep time has increased and you will also notice that the response times for the POSTs have increased.

I reproduced this on twenty-thirteen theme with all plugins deactivated and using firefox.

Attachments (1)

ProcessList Dev01 Sleep Time.png (23.2 KB) - added by optimized-marketing.com 11 years ago.
ProcessList for POSTs to admin-ajax.php from Admin Pages

Download all attachments as: .zip

Change History (20)

#1 follow-up: @azaozz
11 years ago

Not sure I completely understand the problem.

If multiple admin pages are open for long periods of time(ex. over night) the SQL Connections associated with the POSTs will begin to progressively have longer sleep times.

What exactly causes the SQL Connections to have "sleep times"? Would they have "sleep time" when you have visitors to your site? I know some hosting companies have limits for "SQL connections per hour" or similar, probably this is the problem.

Heartbeat behaves very similarly to autosave. The differences are that it runs on all admin pages vs. only Add/Edit Post, and connects less frequently when the user is not active. Have you had problems with your host when leaving the Edit Post page loaded overnight?

Is this the expected experience/results? Should there be a sleep setting on these POSTs to stop after a given time frame?

Unfortunately it is impossible to guess what the user's intentions are. When a web page is left loaded in the browser, the most common reason is that the user would come back to it and continue to use it.

Most computers would "fall sleep" after a period of inactivity usually 10-30min, effectively disconnecting from Internet. Been testing adding the same functionality to heartbeat, however that would affect the other functios. For example post locks will be released too.

Related: #25073.

Last edited 11 years ago by azaozz (previous) (diff)

@optimized-marketing.com
11 years ago

ProcessList for POSTs to admin-ajax.php from Admin Pages

#2 in reply to: ↑ 1 @optimized-marketing.com
11 years ago

If multiple admin pages are open for long periods of time(ex. over night) the SQL Connections associated with the POSTs will begin to progressively have longer sleep times.

What exactly causes the SQL Connections to have "sleep times"? Would they have "sleep time" when you have visitors to your site? I know some hosting companies have limits for "SQL connections per hour" or similar, probably this is the problem.

I don't know what is causing the sleep times, this is just what I am seeing during POSTs to admin-ajax.php. I have attached an image with an example of what I am seeing.

When visitors load up a page there is a sleep command but the time is 0 sec. I can see how the connection limits per hour could be an issue but for my hosting the only SQL limits I know of are 25 concurrent connections and connections over 30 sec. I will double check with my Hosting Service. By the way this is the 2nd hosting service I have been able to reproduce this on(hostgator and godaddy).

Heartbeat behaves very similarly to autosave. The differences are that it runs on all admin pages vs. only Add/Edit Post, and connects less frequently when the user is not active. Have you had problems with your host when leaving the Edit Post page loaded overnight?

I have not had any issues leaving edit post pages open all day and night. I actually do this quite frequently, that is why I was surprised when this occurred for the admin pages. I have also not been able to reproduced this issue with only 1 or 2 admin pages open over night. It could be that the more connections happening repeatedly could cause this issue to manifest its self earlier. so 1 or 2 open admin pages with the heartbeat may take 2 days to reproduce were 6 open only takes a few hours.

Is this the expected experience/results? Should there be a sleep setting on these POSTs to stop after a given time frame?

Unfortunately it is impossible to guess what the user's intentions are. When a web page is left loaded in the browser, the most common reason is that the user would come back to it and continue to use it.

I am referring to the expectations of how WP would function and act in this type of situation. Should it be causing issue with SQL connetions just hanging around for 30+ seconds without any human activity?

Most computers would "fall sleep" after a period of inactivity usually 10-30min, effectively disconnecting from Internet. Been testing adding the same functionality to heartbeat, however that would affect the other functions. For example post locks will be released too.

I understand the reason for having the heartbeat and a sleep function might be a good solution. Just trying to figure out the best happy medium. Is the need for the heartbeat after 10+ hours with the pages being inactive worth the negative consequences? Is there a better way to trigger the heartbeat? or have one common heartbeat for multiple pages/whole site? Just trying to figure out if there is an in between option.

Is there any way to turn off the heartbeat and what would be the consequences? I tried adding remove_action( 'admin_init', 'wp_auth_check_load' ); in my functions.php file, but that didn't do anything.

Hopefully this clears some things up. Let me know if you need more information or have more questions.

#3 follow-up: @azaozz
11 years ago

As far as I can find out, the MySQL "sleeping" connections are waiting to terminate. Not sure what could be causing them to wait, WordPress doesn't use persistent DB connections and all default heartbeat AJAX requests don't load the server much (quite less than a normal page view).

...have one common heartbeat for multiple pages/whole site?

There is no way for JS to access a different browser window/tab. When different pages of the same site (same domain) are opened in different windows, it may be possible to use cookies and/or session storage to communicate between them. However not sure if that's a good solution, can experiment with it. In any case will try to add "sleep" functionality for 3.8.

Good way to turn heartbeat off is to not load the JS. Not advisable at least on the Add/Edit Post screen as the plan is to use heartbeat as transport for autosave. Something like this should work (untested):

add_action( 'init', 'my_deregister_heartbeat', 1 );
function my_deregister_heartbeat() {
	global $pagenow;
	
	if ( 'post.php' != $pagenow && 'post-new.php' != $pagenow )
		wp_deregister_script('heartbeat');
}

#4 @nacin
11 years ago

  • Component changed from General to Autosave

#5 in reply to: ↑ 3 ; follow-up: @lisota
11 years ago

Replying to azaozz:

We run a fairly large news site on WordPress with heavy traffic and a staff of 5-10 editors who spend their entire workday in wp-admin, specifically in edit posts screens.

The heartbeat API, specifically the wp-refresh-auto-lock, is a major CPU hog. I can easily max out our 8-core server by opening 15-20 edit post windows and having them all hitting the server with refresh post lock requests every 15 seconds. Logging out of those screens quickly reduces CPU to under 10%.

In our case, it appears to have less to do with MySQL connections and more to do with pure CPU utilization.

I've deregistered the heartbeat script, as above, with a little bit of success, but I need to disable the 15 sec requests on the edit post screen. Any thoughts on how to accomplish this? We are willing to sacrifice post locking to get our wp-admin performance back.

I tried deregistering heartbeat everywhere, including post.php and post-new.php. However, that seems to affect another AJAX function, the ajax tag search no longer works.

#6 in reply to: ↑ 5 ; follow-up: @azaozz
11 years ago

Replying to lisota:

The heartbeat API, specifically the wp-refresh-auto-lock, is a major CPU hog. I can easily max out our 8-core server by opening 15-20 edit post windows and having them all hitting the server with refresh post lock requests every 15 seconds. Logging out of those screens quickly reduces CPU to under 10%.

Before Heartbeat, (WordPress 3.5) the Edit Post page was connecting every minute for autosave. In 3.8 Heartbeat connects every 15 sec. for the "focused" page and every 2 min. for the "non-focused" pages. So if you have 5 people that have 4 Edit Post pages opened at the same time in 3.5 you'll always see 20 connections per minute.

In 3.8 there will be 20 connections for the focused pages (5 pages x 4 connections) plus 8 (average) for the non-focused pages (15 pages x 0.5 connections). In addition there may be up to 5 connections per minute if the posts in the focused pages are being edited as autosave would connect when there are changes.

On average there will be about 50% - 60% (20 vs. 30-33) more connections in 3.8 compared to 3.5. Also in 3.9 autosave uses heartbeat for transport reducing the number of connections to 28. This number is higher but is not even two-fold.

I tried deregistering heartbeat everywhere, including post.php and post-new.php. However, that seems to affect another AJAX function, the ajax tag search no longer works.

Deregistering heartbeat.js on the Edit Post screens is not a good idea. It would also stop post.js from loading (as heartbeat.js is a dependency for post.js).

You can increase the interval to 60 sec. in couple of ways. From JS:

jQuery(window).load( function() {
  if ( window.wp && wp.heartbeat ) {
    wp.heartbeat.interval( 60 );
  } 
});

This has to run after jQuery(document).ready(), hence the jQuery(window).load().

From PHP:

add_filter( 'heartbeat_send', 'my_heartbeat_settings' );
function my_heartbeat_settings( $response ) {
	if ( $_POST['interval'] != 60 ) {
		$response['heartbeat_interval'] = 60;
	}
	return $response;
}

This will reset the interval to 60 sec. on every "beat" regardless of the current page and the initial settings.

Of course changing the interval will impact the Post locking functionality. It can also be set to 30 sec.

With the above example and the interval set to 60 sec. there will be 12-13 connections per minute in 3.9 plus up to 5 for autosave in 3.8, i.e. less than before Heartbeat was introduced.

Last edited 11 years ago by azaozz (previous) (diff)

#7 in reply to: ↑ 6 @lisota
11 years ago

Replying to azaozz:

This is very helpful. I was struggling to increase the heartbeat interval. There is a fair amount of incorrect guidance on the web about this setting.

We are willing to sacrifice post locking by eliminating the wp-refresh-post-lock altogether. The CPU performance hit in our high traffic, multi-editor environment is huge and this has been affecting site performance for awhile now. It took us awhile to narrow down the exact cause, but it is definitely the large number of post lock refreshes via admin-ajax.

I tried to deregister heartbeat.js on all pages, but as you noted, found other problems with post.js dependencies. (Namely that the tag box autocomplete doesn't function.)

Can you provide any guidance on eliminating the wp-refresh-post-lock? I am thinking something like disabling the heartbeat event handlers like this:

jQuery(window).load( function() {
  jQuery(document).off('.refresh-lock');
  } 
});

I have managed to allow suspension of the heartbeat when post edit windows are left open for a long time via the following. Looks like this was added in 3.8.

remove_filter( 'heartbeat_settings', 'wp_heartbeat_set_suspension' );

I hope that as the Heartbeat API makes it into more features (i.e. autosave), that there can be some testing on high-traffic sites that intensively use the post editor. I'm sure none of this is even noticeable on smaller, infrequently-edited blogs. But on high traffic sites, it's been a CPU crusher.

#8 follow-up: @azaozz
11 years ago

...it is definitely the large number of post lock refreshes via admin-ajax.

One reason may be that refreshing the post lock writes to the DB. Unfortunately I don't see a way to make this less frequently. The locks "last" for 150 sec. If the browser window is "blurred", the next refresh will be in 120-135 sec. plus connection time.

The only way to decrease the frequency of writing to the DB would be to (substantially) increase the time locks last. However that brings other problems: the lock should be released when the user navigates away from the Edit Post page. The only way this could be done is by another AJAX request, but since that request fires on window.unload, it is sometimes unsuccessful.

jQuery(document).off('.refresh-lock'); would work well. Another way would be from PHP by removing the filters add_filter( 'heartbeat_received', 'wp_check_locked_posts', 10, 3 ); and add_filter( 'heartbeat_received', 'wp_refresh_post_lock', 10, 3 );.

Everything in Heartbeat is based on filters, both in JS and PHP, so controlling it should be easy. The same is true for remove_filter( 'heartbeat_settings', 'wp_heartbeat_set_suspension' );.

I hope that as the Heartbeat API makes it into more features (i.e. autosave), that there can be some testing on high-traffic sites that intensively use the post editor.

This has been working on WordPress.com and other huge sites without noticeable increase of server load. Seems the most impact is on medium and large sites.

Disabling the updates of post locks will certainly reduce server load as writing to the DB always takes more resources. Been thinking about ways to make that more efficient, perhaps storing the locks in persistent memory cache (when available) and "flushing" them to the DB every 30 min.

#9 follow-up: @nacin
11 years ago

This has received a decent amount of testing in both simulated situations and real environments. I would like to see some more data on what the reported problem is. "CPU hog" isn't really enough to go on. Is it MySQL connections? Queries? PHP processes? Memcache writes? What's causing this CPU load? Is it just load averages spiking, or is anything else unusual? This isn't an expensive process, and a request a second or whatever the average might be with a bunch of concurrent editors shouldn't by itself be a big issue. Front-end traffic also shouldn't be a major factor.

#10 in reply to: ↑ 8 ; follow-up: @heydonovan
11 years ago

Replying to azaozz:

That's exactly what we were thinking. Instead of writing to the database, check for persistent caching, and write the
_edit_lock modifications straight to memcached.

Replying to nacin:

We do know that when a ton of authors are editing a bunch of different posts, the load average on the server almost quadruples. I've enabled POST logging, and about 95% of the requests within that time period are from "wp-refresh-post-lock". Looking at some of our logs, I can see it was requested more than 5000+ times, so that would account for a lot of posts being edited at once. I'll see if I can dig up some more data.

#11 in reply to: ↑ 9 @lisota
11 years ago

Replying to nacin:

The reply from heydonovan is from our ISP (WPEngine). We'll will recreate the load in a semi-controlled manner by starting with no post editing and then suddenly editing lots of post simultaneously so that we can demonstrate the effect that the post locking heartbeat is having on our CPU. Will post results here when we've completed that.

#12 @azaozz
11 years ago

We will recreate the load in a semi-controlled manner by starting with no post editing and then suddenly editing lots of post simultaneously...

It would be good if you could repeat this test with and without post locking enabled so there is a straight comparison.

It is still strange that increasing these AJAX requests by about 60% has such effect. Is there anything else running only when DOING_AJAX is defined?

#13 in reply to: ↑ 10 @nacin
11 years ago

Replying to lisota:

We'll will recreate the load in a semi-controlled manner by starting with no post editing and then suddenly editing lots of post simultaneously so that we can demonstrate the effect that the post locking heartbeat is having on our CPU.

Please don't waste too much time simulating this. I believe you that you're having problems, but a CPU graph alone won't help us. We need to know what specifically is causing the problem.

Replying to heydonovan:

Replying to azaozz:

That's exactly what we were thinking. Instead of writing to the database, check for persistent caching, and write the _edit_lock modifications straight to memcached.

Replying to nacin:

We do know that when a ton of authors are editing a bunch of different posts, the load average on the server almost quadruples. I've enabled POST logging, and about 95% of the requests within that time period are from "wp-refresh-post-lock". Looking at some of our logs, I can see it was requested more than 5000+ times, so that would account for a lot of posts being edited at once. I'll see if I can dig up some more data.

Each call has up to three SELECT queries and one UPDATE query for the request, as a result of get_post() and update_postmeta(). The SELECT queries (and the WHERE clause of the UPDATE) all hit indexes that are both well-established and small. The UPDATE query does not write to an indexed field and thus won't force an update of the index. These counts do not include the loading of WordPress, which depending on the environment, setup, plugins, etc., could range anywhere from dirt cheap to absurdly expensive.

At the moment, I am more inclined to think this problem is peculiar to the site or environment, and not to the load. I have no qualms with being proven wrong, but I haven't heard a complaint from other hosting companies and this is the first inkling I've heard from WPEngine. 5000 times a day is still only one request every 17 seconds and just 5000 UPDATE queries, which even a crappy shared hosting box would absorb without blinking. I ran these queries on a site with a few tens of thousands of postmeta records and the indexes laughed at me. How fast are these queries running for you? How many rows are in the wp_postmeta table? I don't have access to my massive test database until I'm home later this week, but I don't expect it to be a problem; these are some of the most basic queries WordPress runs. Does the table CHECK out okay? Is the table optimized?

The issue I see here with going to memcached (which, from an API perspective, is totally doable) is that would imply the DB write queries are the problem. The limited evidence thus far only points to generic load issues. We need to know what's actually causing that load. Is it the FPM (?) process that's taking a long time, or is MySQL the bottleneck? Is it network lag in hitting the DB server? (Assuming the DB is on a separate server.) Is it somehow fetching and invalidating data from memcached that's actually somehow slow?

If you remove the post locking action, are other heartbeat requests affected? Are other admin-ajax requests affected? Are other POST requests affected? I've heard stories of people having post saves take forever, and then a developer goes in and realizes that about fifteen seconds worth of queries and calculations are being performed on every save by a plugin, entirely accidentally. Is it even *these* requests that are the slow ones, or is the issue cache invalidation and then some kind of race condition for frontend requests (which is one way traffic could play a role in this)?

A periodic POST request that results in a few select queries and a single, simple write query should not in and of itself cause problems. Many different avenues to pursue here.

#14 @JacobN
11 years ago

I stumbled onto this ticket after an article I wrote about limiting the WordPress autosave feature via wp-config.php with define('AUTOSAVE_INTERVAL', 86400); to cut down on excessive admin-ajax.php executions, seemed to no longer be as effective.

That's when I discovered the Heartbeat API introduced in WordPress 3.6 and I've been spending some time reading and testing it since then. If I'm misunderstanding something, please correct me.

Coming from a web host / system admin's perspective, the nature of the WordPress Heartbeat API seems very aggressive by default.

Most hosts when looking at a customer's CPU usage will simply go through your access logs and tell you, well this script is getting a lot of hits it must be the culprit. A simple Google for suspended admin-ajax.php or something of that nature shows this is pretty common.

If I simply login to my admin dashboard, then minimize the window I'm still generating roughly 25 PHP script executions per half hour. On a fresh WordPress 3.8.1 install on a suPHP server, each of those admin-ajax.php PHP script executions used on average about 0.2 seconds of CPU time doing absolutely nothing.

That might not seem like a lot of activity, but imagine the scenario of a single shared hosting account that has let's say 10 separate WordPress installations running from it. If each of those WordPress installs was setup for a different client, you could easily have 10 unique admin dashboards logged into at once.

So in a half hour, we could see:

25 PHP executions @ (0.2 CPU seconds) = 5 CPU seconds
5 CPU seconds * 10 admin dashboards = 50 CPU seconds

Still not terrible by any means with just under 1 CPU minute per half-hour of logged in admin activity doing nothing.

But if all those stayed logged in all day (we see this more than I'd like) that'd be close to 40 CPU minutes of PHP executions, effectively doing nothing other than POST'ing to admin-ajax.php

This of course would be a rare case, but the same applies to the handling of wp-cron.php which is wasteful by nature at doing nothing, unless you define('DISABLE_WP_CRON', 'true'); in your wp-config.php and then use a real cronjob to schedule the wp-cron.php script to run on a reduced set interval, instead of the default of calling the script every single time a bot hits the site, just to ask, OK wp-cron.php do I have anything to do yet?

If possible, I'd like to also see an easy to turn off ability for the Heartbeat API. Not very many people would have to use it, but for customers that are already right at the brink of going over shared CPU limits and needing a VPS or dedicated platform to handle more, using up CPU time to check if anything needs to be done on a set interval of a minute or two isn't ideal. I would think this would be especially true if you're hosting in the cloud and paying by the CPU cycles.

It's not uncommon to see shared servers from multiple hosts that would have hundreds of WordPress installations spread across the box. In instances like these, I'd worry about all the additional PHP overhead on the server's total health, as well as all of the Apache connections being used up to handle all the POST requests, or as has been talked about here the SQL overhead from all that.

I don't think the WordPress Heartbeat API is a CPU hog per se, but it definitely has some qualities that could turn it into one given the right environment, just like wp-cron.php. So it would be handy to not have to modify your functions.php script and use a function to turn it off on all or certain pages.

#15 @iseulde
10 years ago

  • Component changed from Autosave to Administration
  • Focuses performance added
  • Version changed from 3.7.1 to 3.6

This ticket was mentioned in Slack in #core by michael.ecklund. View the logs.


9 years ago

#17 @lukecavanagh
8 years ago

  • Keywords needs-patch added

This seems like something that should be resolved for WP 4.8.

#20 @SergeyBiryukov
5 years ago

  • Keywords bulk-reopened added

Related: #47292

#21 @pbearne
6 months ago

is this still an issue we need to fixed/adjust?

Note: See TracTickets for help on using tickets.