Make WordPress Core

Opened 3 years ago

Closed 8 weeks ago

#21098 closed defect (bug) (worksforme)

Out of memory errors in XML-RPC

Reported by: koke
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.4
Component: XML-RPC
Keywords:
Focuses:
Cc:

Description

Since WPiOS 3.0 and WordPress 3.4 we're getting a lot of "Out of Memory" reports on XML-RPC.

See http://ios.forums.wordpress.org/topic/couldnt-sync-posts-after-update-to-300

XML-RPC clients have no idea how much memory is available on the server, and even if they did, it would be hard to translate that into "how many items can I request".

Some ideas:

  • Can we try raising the memory limit in xmlrpc.php, or at least for system.multicall?
  • Set limits on the number of items returned if we can predict they're going to need too much memory. Currently, asking for the latest 100 comments can trigger an out-of-memory error on some hosts.
  • Fail more gracefully: return an XML-RPC error instead of an HTTP 500 Internal Server Error.
  • Any hints on debugging where the memory is going and whether there are any leaks?

Change History (21)

comment:1 @koke 3 years ago

Related: #20665

comment:2 @maxcutler 3 years ago

koke, can you post a sample of the XML-RPC request you are generating/sending from the app? I'd be willing to help profile this and figure out why it's eating through so much memory.

comment:3 @koke 3 years ago

To reproduce/simulate:

<methodCall>
	<methodName>wp.getComments</methodName>
	<params>
		<param>
			<value>
				<i4>1</i4>
			</value>
		</param>
		<param>
			<value>
				<string>username</string>
			</value>
		</param>
		<param>
			<value>
				<string>password</string>
			</value>
		</param>
		<param>
			<value>
				<struct>
					<member>
						<name>number</name>
						<value>
							<i4>100</i4>
						</value>
					</member>
				</struct>
			</value>
		</param>
	</params>
</methodCall>

Unless you have a lot of (big) comments, you'll have to tweak the memory limits:

  • In wp-includes/default-constants.php set WP_MEMORY_LIMIT to 8M
  • Set a low memory_limit in your php.ini
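
For anyone who wants to script the reproduction rather than hand-craft the XML, the same `wp.getComments` payload can be generated with Python's standard `xmlrpc.client` module. This is just an illustrative sketch; the URL and credentials are placeholders. (Note that `xmlrpc.client` emits integers as `<int>` rather than `<i4>`; both are valid per the XML-RPC spec.)

```python
import xmlrpc.client

# Build the same wp.getComments payload shown above:
# blog_id=1, placeholder credentials, and a filter struct asking for
# 100 comments -- enough to trip a low memory_limit on the server.
payload = xmlrpc.client.dumps(
    (1, "username", "password", {"number": 100}),
    methodname="wp.getComments",
)
print(payload)

# To actually send it (URL is a placeholder):
# proxy = xmlrpc.client.ServerProxy("https://example.com/xmlrpc.php")
# comments = proxy.wp.getComments(1, "username", "password", {"number": 100})
```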

Another call where we hit this is a system.multicall of wp.getOptions, wp.getPostFormats, wp.getCategories, and metaWeblog.getRecentPosts.

comment:4 @markoheijnen 3 years ago

We really should tweak the memory limit. I'm curious to see if we can reduce the memory it's currently using; I'm going to play with that this weekend.

About the multicall: I wouldn't request categories in a multicall. You have no way of knowing how much data that really is, since it just returns all of them.

I don't think you can return anything other than a 500 once you've hit the memory limit.

comment:5 follow-up: @maxcutler 3 years ago

I was unsuccessful in my first attempt to do some memory profiling on the XML-RPC code. Will try again in a week or two when I have more time.

I'm hesitant to implement anything that artificially raises the memory limit. On shared hosting that may just cause more problems, and those are the environments that are likely seeing this problem currently anyway.

If we want to return a better error, we'd have to use `register_shutdown_function` with `error_get_last` to catch the fatal error and do something smarter. That seems like a major hack, though.

If we can find some memory usage optimizations, then great. But I think the real answer here is that this is the client's problem to deal with. If any XML-RPC methods do not support paging, then that's a feature we should add. Also, maybe expose memory_limit in wp.getOptions to let clients make a smart decision?

And clients need to be smart about the quantity of data they request. While multicall might result in fewer HTTP requests on a mobile device, it's a big load on a server and it's going to require non-trivial memory on the client to buffer the response and do XML de-serialization. Use threading or other mobile OS provisions to have multiple XML-RPC requests in-flight simultaneously and let the network and OS optimize that.

comment:6 in reply to: ↑ 5 ; follow-up: @koke 3 years ago

Replying to maxcutler:

I'm hesitant to implement anything that artificially raises the memory limit. On shared hosting that may just cause more problems, and those are the environments that are likely seeing this problem currently anyway.

WordPress already does this in several places, mostly media and zip handling (just grep memory_limit)

If we can find some memory usage optimizations, then great. But I think the real answer here is that this is the client's problem to deal with. If any XML-RPC methods do not support paging, then that's a feature we should add. Also, maybe expose memory_limit in wp.getOptions to let clients make a smart decision?

And what should a client do with memory_limit? How do I translate 32M to X comments or categories? I'm willing to reduce the number of items requested, but how do we pick that number? We didn't raise the number of comments requested and we hadn't heard of memory problems before 3.4

And clients need to be smart about the quantity of data they request. While multicall might result in fewer HTTP requests on a mobile device, it's a big load on a server and it's going to require non-trivial memory on the client to buffer the response and do XML de-serialization. Use threading or other mobile OS provisions to have multiple XML-RPC requests in-flight simultaneously and let the network and OS optimize that.

I find it interesting that a mobile phone can handle a data size that a server can't. IMO, it's an XML-RPC problem. The way we deal with big XML-RPC requests on the client is by using file streaming: https://github.com/eczarny/xmlrpc/issues/11

This is not a trivial change, but I think WordPress should need the same amount of memory to return 10 or 1000 comments.

comment:7 in reply to: ↑ 6 ; follow-up: @redsweater 3 years ago

Replying to koke:

While I agree that WP shouldn't use inordinately more memory just because a larger number of assets was requested, to some extent it is inevitable. Ideally WordPress would use the same fixed overhead to return 10 or 1000 comments, but the 1000 comments will on average take 100 times as much memory just for their storage in PHP memory. The problem is worse with posts, where the amount of data per post is usually much larger on average than for a comment. When you ask for 300 posts, it often adds up to an awfully big single XML response that PHP has to load into memory before it sends it down the pipe. (I assume; PHP experts, disagree?)

So once any obviously foolish memory usage issues in WP are ruled out, I think paging techniques are the way to tackle this. It looks like paging was added to wp.getRecentPosts, so you can, instead of asking for 100 recent posts, ask for e.g. 10 at a time with an increasing offset. It means 10 requests instead of one, which means more latency and probably an ultimately slower transaction, but at least it's extremely likely not to exceed the memory limitations of the server.
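
The paging approach described above boils down to a simple client-side loop. Here's a hedged sketch in Python; `fetch_page` is a stand-in for whatever XML-RPC call the client actually makes (e.g. a method that accepts `{"number": ..., "offset": ...}`):

```python
def fetch_all(fetch_page, page_size=10):
    """Fetch items in fixed-size pages instead of one huge request.

    fetch_page(number, offset) is a stand-in for an XML-RPC call that
    returns a list of at most `number` items starting at `offset`.
    """
    items, offset = [], 0
    while True:
        page = fetch_page(page_size, offset)
        items.extend(page)
        if len(page) < page_size:  # a short page means we've reached the end
            break
        offset += page_size
    return items
```

The trade-off is exactly as stated: ten requests of ten posts cost more round trips than one request of a hundred, but each individual response stays small enough for a memory-constrained host.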

And what should a client do with memory_limit? How do I translate 32M to X comments or categories? I'm willing to reduce the number of items requested, but how do we pick that number? We didn't raise the number of comments requested and we hadn't heard of memory problems before 3.4

Just a little feedback as the developer of a blogging client: there have been memory issues with WordPress installations for years. It's true that they tend to become noticeable around major releases, especially when new features or the default theme push PHP memory usage up just enough to be "near the edge" for whatever the user's hosting configuration is. Users typically have to solve the problem either by asking their server administrators to raise the PHP memory allocation for their account, or by editing it themselves in an .htaccess file if they're able to.

I find interesting that a mobile phone can handle a data size that a server can't. IMO, it's a XML-RPC problem. The way we deal with big XML-RPC requests is by using file streaming: https://github.com/eczarny/xmlrpc/issues/11

Hmm - that's pretty cool. I guess that might be the "PHP experts" answer to my assertion above that PHP has to load everything into memory. I would very much welcome any changes in WordPress that would make it less vulnerable to the out of memory issues that are liable to occur with large requests.

Daniel

comment:8 in reply to: ↑ 7 ; follow-up: @koke 3 years ago

Replying to redsweater:

When you ask for 300 posts, it often adds up to an awfully big single XML response that PHP has to load into memory before it sends it down the pipe. (I assume; PHP experts, disagree?)

It doesn't have to be that way. As I understand it, WordPress does this (simplified):

  • Call wp_get_recent_posts, store the array in memory
  • Create a new array of posts (in memory) with the right output attributes for XML-RPC
  • Build the XML-RPC response as a string with those posts (again in memory)
  • Output the string

I don't know much about PHP's memory management or when objects get released, but this is using at least 3x the posts' size.

A more memory efficient solution would be:

  • Output the XML-RPC prefix
  • Call wp_get_recent_posts, store the array in memory (we probably can't get rid of this without redesigning too much)
  • For each post, output the proper xml representation
  • Output the XML-RPC suffix

There are problems with this too, like not being able to set Content-Length in advance, but temporary files could be used to build the XML response.
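
The streaming outline above can be illustrated with a generator that emits the response piecewise. This is a deliberately simplified sketch, not WordPress's IXR code: `serialize_comment` is a hypothetical helper that handles only string fields, and the array-of-structs envelope is hard-coded.

```python
from xml.sax.saxutils import escape

XMLRPC_PREFIX = ('<?xml version="1.0"?>'
                 "<methodResponse><params><param><value><array><data>")
XMLRPC_SUFFIX = "</data></array></value></param></params></methodResponse>"

def serialize_comment(comment):
    """Minimal XML-RPC <struct> for one comment dict (string values only)."""
    members = "".join(
        "<member><name>%s</name><value><string>%s</string></value></member>"
        % (escape(k), escape(v))
        for k, v in comment.items()
    )
    return "<value><struct>%s</struct></value>" % members

def stream_response(comments):
    """Yield the response piece by piece: prefix, one item at a time, suffix.

    Peak memory is bounded by a single item's XML rather than the whole
    response string, which is the point of koke's outline.
    """
    yield XMLRPC_PREFIX
    for comment in comments:
        yield serialize_comment(comment)
    yield XMLRPC_SUFFIX
```

Each chunk can be written straight to the output buffer (or to a temporary file, as suggested above) as soon as it is produced.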

So once any obviously foolish memory usage issues in WP are ruled out, I think paging techniques are the way to tackle this. It looks like paging was added to wp.getRecentPosts, so you can, instead of asking for 100 recent posts, ask for e.g. 10 at a time with an increasing offset. It means 10 requests instead of one, which means more latency and probably an ultimately slower transaction, but at least it's extremely likely not to exceed the memory limitations of the server.

My main concern with paging is figuring out how much to page. I'm not going to ask for every single post in a single request, but how do I know how many is too many?

Also, these memory problems only happen to a subset of our users, so I don't feel it's right to do more requests (more bandwidth/latency) for the majority of users. I know WordPress.com can handle that amount of data. I'm sure most of the other hosts can too, but I have no idea how to tell them apart.
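
One client-side heuristic for the "how many is too many" question, sketched below as an assumption rather than anything the thread agreed on: start with an optimistic page size and halve it whenever the server fails (e.g. returns a 500 from memory exhaustion). Capable hosts get one big request; constrained hosts degrade to smaller pages automatically.

```python
def fetch_with_backoff(fetch_page, start_size=100, min_size=5):
    """Adaptively shrink the page size until the host can cope.

    fetch_page(number, offset) is a stand-in for an XML-RPC call; it is
    assumed to raise RuntimeError when the server fails (e.g. HTTP 500).
    """
    size, offset, items = start_size, 0, []
    while True:
        try:
            page = fetch_page(size, offset)
        except RuntimeError:
            if size <= min_size:
                raise  # the host can't serve even the smallest page
            size //= 2  # retry the same offset with a smaller request
            continue
        items.extend(page)
        if len(page) < size:  # short page: we've reached the end
            return items
        offset += size
```

This keeps the common case at a single request while still coping with the hosts that can't handle it, at the cost of a few wasted round trips on the first sync.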

comment:9 in reply to: ↑ 8 ; follow-up: @maxcutler 3 years ago

Replying to koke:

It doesn't have to be that way. As I understand it, WordPress does this (simplified):

  • Call wp_get_recent_posts, store the array in memory
  • Create a new array of posts (in memory) with the right output attributes for XML-RPC
  • Build the XML-RPC response as a string with those posts (again in memory)
  • Output the string

That's correct.

I don't know much about PHP's memory management or when objects get released, but this is using at least 3x the posts' size.

A more memory efficient solution would be:

  • Output the XML-RPC prefix
  • Call wp_get_recent_posts, store the array in memory (we probably can't get rid of this without redesigning too much)
  • For each post, output the proper xml representation
  • Output the XML-RPC suffix

There are problems with this too, like not being able to set Content-Length in advance, but temporary files could be used to build the XML response.

This would be nice to try, except it would require substantially refactoring the IXR Server implementation. Since the library appears to be abandoned at this point, maybe it's not a big deal to diverge from upstream.

The other downside to a streaming output implementation is that the Content-Length header is required by the XML-RPC spec. I don't know how reliant various client implementations are on that header, but it's a potential source of breakage.

If a server has extremely low memory_limit set, what are the odds that writing to temporary files will be successful? Worth considering and testing though.
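
The temporary-file idea is worth making concrete. The sketch below (an illustration, not WordPress code) streams response chunks into a spooled temp file, which stays in memory up to a threshold and spills to disk beyond it; the file's size then gives a correct Content-Length before anything is sent:

```python
import tempfile

def buffer_response(chunks, spool_limit=1 << 20):
    """Write response chunks to a spooled temp file so Content-Length can
    be computed without ever holding the full response as one string.

    The buffer lives in memory up to spool_limit bytes, then transparently
    moves to disk. Returns (content_length, file object rewound to start).
    """
    buf = tempfile.SpooledTemporaryFile(max_size=spool_limit)
    for chunk in chunks:
        buf.write(chunk.encode("utf-8"))
    length = buf.tell()
    buf.seek(0)
    return length, buf
```

This answers the Content-Length concern, though as noted above, a host with a very low memory_limit may also have restrictive disk or tmp settings, so it would need testing on exactly the environments that hit this bug.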

Also, these memory problems only happen to a subset of our users, so I don't feel it's right to do more requests (more bandwidth/latency) for the majority of users. I know WordPress.com can handle that amount of data. I'm sure most of the other hosts can too, but I have no idea how to tell them apart.

As redsweater said, as a client author you have to transfer blame to the hosting provider. If you set request parameters that work on 90-95% of hosts, then you've done due diligence. The remaining 5-10% need to take it up with their hosting provider to get it fixed, and you can't waste extraordinary effort and code to cater to the outliers.

comment:10 in reply to: ↑ 9 @koke 3 years ago

Replying to maxcutler:

The other downside to a streaming output implementation is that the Content-Length header is required by the XML-RPC spec. I don't know how reliant various client implementations are on that header, but it's a potential source of breakage.

Yeah, that's why we went with file-based streaming on the client, since some servers (Apache with mod_security, I think) would throw errors if Content-Length was missing.

As redsweater said, as a client author you have to transfer blame to the hosting provider. If you set request parameters that work on 90-95% of hosts, then you've done due diligence. The remaining 5-10% need to take it up with their hosting provider to get it fixed, and you can't waste extraordinary effort and code to cater to the outliers.

I agree with that. I think there's no harm in looking into memory optimizations anyway, but as for hosting:

comment:11 @markoheijnen 3 years ago

How does that idea work with the current code and with custom methods added by plugins?

Also, I've been doing some research into how to reduce the memory footprint. I came across this link: http://php.net/manual/en/features.gc.performance-considerations.php

I do know Bluehost is running PHP 5.2, and now I'm curious whether the problem would go away if a site migrated to 5.3/5.4. I'm not a PHP expert, but I'd love to try that out before doing all kinds of things.

I still find it curious that this could become an issue because of 3.4. We didn't really change the code a lot.

comment:12 @maxcutler 3 years ago

Anecdotally, I'm seeing about a 6% peak memory usage increase from 3.3.x to trunk when calling demo.sayHello (which doesn't do any queries). Approximately 29.2 MB to 30.9 MB on one of my test machines.

I'd wager that this small increase might be enough to tip borderline hosting environments over the edge. I would guess that the memory increase is simply due to the amount of new code in 3.4 (both XML-RPC and elsewhere) that gets loaded into memory.

I'll spend some time with heavier methods like wp.getComments and see if there are any easy ways to cut down on incremental memory usage.

comment:13 @maxcutler 3 years ago

It turns out that most of the memory usage comes from the include'd code and not the actual data being acted upon (unless you request an extremely large number of items).

Example: On my machine, after wp-load.php is include'd, the memory usage is about 23MB. xmlrpc.php then includes wp-admin/includes/admin.php, class-IXR.php, and class-wp-xmlrpc-server.php which boosts memory usage up to ~30MB. Most typical XML-RPC methods only add 0.5-1.5 MB on top of that.

If you look, wp-admin/includes/admin.php actually loads a bunch of other files. This is needed because the server uses has_meta, wp_generate_attachment_metadata, wp_create_category and similar methods. You can cut memory usage by half (7MB→3.5MB) by only including wp-admin/includes/{post,taxonomy,image}.php. We could save even more if the needed functions were moved elsewhere and we could avoid the other code in those files (e.g., post.php is pretty large).

By comparison, removing all of the legacy methods (everything not in the wp.* namespace) only saves about 0.7MB.

So if we could either move functions around so less code has to be include'd or else change which functions we use in the XML-RPC server, then we could potentially save substantial amounts of memory. And there are probably memory savings to be had in that 23MB baseline from the load process, but that's for another ticket.

comment:14 follow-up: @markoheijnen 3 years ago

I'm curious to see what you want to strip out, since the functionality still needs to be there, except functionality for viewing things.

comment:15 in reply to: ↑ 14 @maxcutler 3 years ago

Replying to markoheijnen:

I'm curious to see what you want to strip out, since the functionality still needs to be there, except functionality for viewing things.

To remove wp-admin/includes/post.php:

  • has_meta moved to wp-includes/meta.php. Would prefer to just use get_post_custom/get_metadata instead, but they don't return the ID.
  • get_default_post_to_edit moved elsewhere.

wp-admin/includes/{image,taxonomy}.php can stay. They're both pretty small and the code non-trivial to replace. Including just these two files instead of wp-admin/includes/admin.php would save several MB of memory.

Note: I've only verified this using the unit-test suite. Since some legacy methods still don't have test coverage, there may be methods that they use that I'm not catching. All the more reason to write at least basic tests for every method.

Last edited 3 years ago by maxcutler (previous) (diff)

comment:16 @nacin 3 years ago

Note: I've only verified this using the unit-test suite. Since some legacy methods still don't have test coverage, there may be methods that they use that I'm not catching. All the more reason to write at least basic tests for every method.

Because we've always been including admin/includes/admin.php, this isn't really enough either way, as a custom method could be using a function from the admin. Then, explosions.

comment:17 @nacin 3 years ago

We're at the point in WordPress where on certain setups, we're very much pushing our default 32MB limit. And, as maxcutler says, this is mostly due to included code, and there isn't an incredible amount we can do here aside from lazy-loading more of it (which can break things) or autoloading code (which is not great in PHP 5.2). So, it is possibly time to consider bumping that number. Because of this, more and more of our code requests a much higher memory limit, which is currently 256MB. This is probably something to consider for XML-RPC requests.

comment:18 @redsweater 3 years ago

I didn't realize this was based on voluntary memory limits in WordPress itself. That definitely paints an interesting picture. These failures are pretty widespread and getting worse, and I had just assumed it was all down to stingy memory defaults from the ISPs. If the 32MB "optimistic" limit in WordPress is what's being exceeded frequently, I would definitely be pleased to see it raised. I just had another report of this issue, this time from a DreamHost customer; that's an exceedingly popular host whose base configuration will apply to a huge number of WordPress installations.

comment:19 @daniloercoli 3 years ago

  • Cc ercoli@… added

comment:20 @nacin 3 years ago

WordPress now defaults to 40 MB.

In xmlrpc.php, we could do `@ini_set( 'memory_limit', apply_filters( 'admin_memory_limit', WP_MAX_MEMORY_LIMIT ) );`

Note that in the admin, we normally only do that for administrators (unless the user is doing something like editing an image). I am not too concerned about the potential for a denial-of-service via memory exhaustion, but I suppose it is possible. Think a multi-call situation for an under-privileged user. We could wait until login() to issue such a memory_limit call.

comment:21 @chriscct7 8 weeks ago

  • Keywords mobile dev-feedback removed
  • Milestone Awaiting Review deleted
  • Resolution set to worksforme
  • Severity changed from major to normal
  • Status changed from new to closed

Doesn't look like there's anything left here to fix. Since the memory limit was bumped, no issues have been reported on this ticket in 3 years. Closing as worksforme.
