Make WordPress Core

Opened 14 years ago

Last modified 5 years ago

#12257 assigned defect (bug)

wpdb Scales Badly Due to Unnecessary Copies of All Query Results

Reported by: miqrogroove's profile miqrogroove Owned by:
Milestone: Priority: normal
Severity: critical Version:
Component: Database Keywords: dev-feedback has-patch
Focuses: Cc:

Description

While working on #11726, I encountered a reproducible crash in wpdb::query()

The following code causes memory exhaustion on large result sets:

while ( $row = @mysql_fetch_object($this->result) ) {
	$this->last_result[$num_rows] = $row;
	$num_rows++;
}

The memory exhaustion message is error-controlled, causing a white screen of death even in debug mode.

I searched wp-db.php for references to $this->last_result, and I found no justification for these object reference copies. $this->last_result should be maintained as a MySQL resource and properly optimized using the MySQL client layer instead of this PHP nonsense.

Tagging for dev-feedback to discuss which Milestone is appropriate.

Attachments (9)

wp-db.php.diff (5.5 KB) - added by joelhardi 14 years ago.
patch to wp-db trunk r15628 that uses mysql client funcs to manipulate query result instead of copying into array (see my comment)
wp-db.php.15633.diff (5.5 KB) - added by joelhardi 14 years ago.
Updated patch with more conservative changeset to get_results(). Also, now diffs with r15633.
wp-db.php.get_result().diff (5.6 KB) - added by andreasnrb 13 years ago.
Prevs, diff applied, Removed return null from get results()
12257.mysqli.diff (7.2 KB) - added by sc0ttkclark 9 years ago.
Patch for MySQLi and refresh of code
12257.mysqli-fetch.diff (7.3 KB) - added by sc0ttkclark 9 years ago.
I added some further tweaks to abstract out the fetching and row looping via a new fetch() method.
12257.mysqli-fetch2.diff (7.3 KB) - added by sc0ttkclark 9 years ago.
Fixed issue with mysql_num_rows call which should have been mysqli_num_rows in mysqli mode
12257.mysqli-fetch3.diff (7.3 KB) - added by sc0ttkclark 9 years ago.
Passing $y correctly into fetch() from get_var()
12257.mysqli-fetch-tested.diff (7.2 KB) - added by sc0ttkclark 9 years ago.
Updated with fixes; Now passes wp core tests
12257.mysqli-fetch-tested.2.diff (7.3 KB) - added by sc0ttkclark 9 years ago.
Fix patch paths

Download all attachments as: .zip

Change History (46)

#1 @miqrogroove
14 years ago

There's a ticket about the white screen of death already going at #11151.

#2 @prettyboymp
14 years ago

  • Cc mpretty@… added

I don't know that this should/can be changed. All it is doing is copying the mySQL result into the PHP space. The only work around I could think of would be if any time a multi-row query result was needed, the result would have to follow an iterator interface, which then requires reworking of everything.

Queries shouldn't be returning this large of result sets.

#3 @voyagerfan5761
14 years ago

  • Cc WordPress@… added

#4 @nacin
14 years ago

  • Milestone changed from 3.0 to Future Release

$this->last_result should be maintained as a MySQL resource and properly optimized using the MySQL client layer

Without digging too deeply here, I'd tend to agree. It'd be nice to see more performance squeezed out of wpdb, and prevent one more wsod/memory exhaustion issue.

This is something good to be brought up during scope meetings and then we can mark it early for a release.

@joelhardi
14 years ago

patch to wp-db trunk r15628 that uses mysql client funcs to manipulate query result instead of copying into array (see my comment)

#5 @joelhardi
14 years ago

  • Cc joel@… added
  • Keywords has-patch added

I just added a patch to wpdb (trunk r15628) that does away with the $last_result object property and instead uses a new property $mysql_last_result to store the MySQL resource of the current query. I believe this is what miqrogroove was after. Since I changed the type being stored (from array to MySQL resource), I chose not to repurpose the existing var.

I've never liked wpdb for SELECTs, since IMHO it just recreates the standard PHP mysql client library interfaces (i.e. not much abstraction) with less flexibility and efficiency. So, I'm happy to contribute a patch that helps it do less copying.

And, I contemplated doing an Iterator or at least separate table model class for result sets, but once I got into the code, the fact that the existing wpdb data access methods are all basically thinly abstracted versions of mysql_fetch_row, mysql_fetch_assoc and mysql_fetch_object made it very easy to simply rewrite these methods to use the old mysql client layer.

I haven't done any serious testing or benchmarking, but running through ab using different parameters showed a 3% improvement on front page generation on a very small data set (memory_get_peak_usage of 11M). I would expect a larger improvement on bigger recordsets. Plus, fixing the WSOD on memory-bound queries.

Obviously wpdb is a critical part of core, but IMHO this is a pretty conservative/safe changeset and worth testing.

@joelhardi
14 years ago

Updated patch with more conservative changeset to get_results(). Also, now diffs with r15633.

#6 @nacin
14 years ago

  • Milestone changed from Future Release to 3.1

Looks good.

#7 @hakre
13 years ago

  • Keywords dev-feedback removed

I like the idea, once while doing #11799 I did change the local store of the resultset from an array of objects to a plain array which already helped.

Using a resulset cursor is helping even more memory wise as normally the resultset is mostly requested once from wpdb so no need to store the duplicates.

#8 @joelhardi
13 years ago

Incidentally, while bored last Sunday I did an additional patch that piggybacks on this one. It creates a results class so that the get_results() method of wpdb, instead of copying the result set into an array and returning it, instead returns an object instance that implements the SeekableIterator, ArrayAccess and Countable interfaces and acts as an interface to the MySQL result resource.

So, it removes another layer of data copying. All the other parts of WordPress manipulate it like the array they think they're getting.

The final result of this experiment was that it was 5-10% slower and used about that much more memory. So, not practical for actual use, but good to know that WordPress isn't needlessly retrieving arrays of unused data.

The one thing it might actually be useful for is debugging or profiling -- i.e. I have a debug version that subclasses and echos/logs debug information when various things happen, like when these "arrays" are read from or written to. But, then again, you can use a real debugger for that.

#9 @ryan
13 years ago

I did some quick profiling of a typical front page request with 6 posts. The patch reduced peak memory usage a small amount ( 14,912,496 before vs. 14,877,440 after). Short apache bench runs show the patch to be a bit faster. I haven't noticed any bugs so far.

I'll try testing with some big result sets to see how peak memory usage behaves.

#10 @miqrogroove
13 years ago

Sounds good so far. The two specific scenarios that were triggering this, and patched elsewhere, were user table SELECT statements with 80,000 rows returned, and the upgrade script that was returning a Cartesian product with millions of rows. It would be good to test beyond limits and see proper error mode behavior.

#11 follow-up: @ryan
13 years ago

Doing SELECT * from the user table with 1700 users actually resulted in a higher peak memory usage with the patch. 19,881,000-ish with the patch vs. 19,869,000-ish without.

#12 in reply to: ↑ 11 @joelhardi
13 years ago

Doing SELECT * from the user table with 1700 users actually resulted in a higher peak memory usage with the patch. 19,881,000-ish with the patch vs. 19,869,000-ish without.

I wouldn't expect to see big differences in peak memory usage (your 2 tests are +/- 0.06-0.2%) since WordPress mostly uses full result sets copied into arrays, so any query result is going to be copied into an array at some point anyway. PHP garbage collection will flush the intermediate copies.

The patch just removes one level of (unnecessary) memory-copying so on queries where get_results() is called, the only benefit is slightly shorter execution time.

i.e. when "old" wpdb runs a query:

  1. MySQL result set created (so, 1 copy of result in PHP memory)
  2. result copied into $this->last_result (so, 2 copies in PHP memory ... this is the WSOD crash point mentioned in the ticket)
  3. result freed (1 copy)
  4. get_results() called (2 copies)

and with patch:

  1. same
  2. no extra copy made (still only 1 copy instead of 2)
  3. result not freed (still only 1 copy)
  4. get_results() called (now 2 copies, so peak mem usage is the same as above)

So, the patch only addresses out-of-memory crashes when a large result set is returned but get_results() is never called (i.e. the result set is only retrieved piecemeal, such as with get_row() or get_col()).

Here's a short test script:

require_once('wp-includes/wp-db.php');
$db = new wpdb($dbuser, $dbpassword, $dbname, $dbhost);
$c = 1000;
$q = "select * from wp_users;";

$start = microtime(TRUE);
for ($i = 0; $i < $c; $i++)
  $array = $db->get_results($q);
$end = microtime(TRUE);

echo "$c iterations of $q in ".($end - $start)." seconds\n";
echo memory_get_peak_usage()." peak memory usage\n";

With a small test set (14 users) over many runs, the result is basically what you would expect, both old and patched versions are about the same (patched version only 1.27% faster, but with some variance, 1.21% == 1 standard deviation. This improvement scales with the size of the result set). Peak mem usage 0.52% higher with the patched version, 635384 bytes versus 632064.

However, when you change the call to get_results() in the test script to query() -- so that only one memory copy of the result exists in the patched version, while the old version makes 2 copies -- execution is 160% faster and memory usage drops by 2.3%.

With a slightly larger sample set (129 users), execution is 500% faster and memory usage drops by 26%.

In general, I would say that, since WordPress mostly copies result sets into arrays, there will be the minor (1-5%) speed improvement but nothing huge. However, in cases where WordPress is more selective in returning values from result sets, there will be a larger benefit. As long as WordPress does nothing stupid with SQL (i.e. returning huge result sets it doesn't use, like the happily removed user table SELECT) and only queries for rows and cols it needs for a particular view, benefits from this patch aren't going to exceed the minor speed improvement.

#13 @ryan
13 years ago

Thanks for the extra metrics and the test script. Makes sense and looks good to me.

#14 @nacin
13 years ago

  • Keywords commit added

#15 follow-up: @ryan
13 years ago

Bother. Some plugins are using last_result.

#16 in reply to: ↑ 15 ; follow-up: @prettyboymp
13 years ago

Replying to ryan:

Bother. Some plugins are using last_result.

Can we still create the last_result property for php4 and used the magic get() method to replace it for php5, allowing it to throw a deprecated warning? It isn't the most elegant solution, but it should work.

#17 follow-up: @nacin
13 years ago

last_result is supposed to be private. Tempting to do it anyway.

I recall reading this from "Professional WordPress" -- http://p2p.wrox.com/content/articles/wordpress-database-queries-operations-and-errors, which lists a bunch of private variables. I will make sure no private variables sneak into Brad's next book. :-)

#18 in reply to: ↑ 17 @miqrogroove
13 years ago

Replying to nacin:

last_result is supposed to be private.

It wasn't even a documented member until 3.0. Nobody should be using that variable or support its use.

#19 in reply to: ↑ 16 @joelhardi
13 years ago

Can we still create the last_result property for php4 and used the magic __get() method to replace it for php5, allowing it to throw a deprecated warning? It isn't the most elegant solution, but it should work.

Or, would be easy to just have the magic __get() method return the same thing as before (result set as an array of objects aka get_results()) ... so, it would be fully backward compatible for PHP5. i.e. this would do it:

function __get( $name ) {
	if ( $name == 'last_result' && !is_null( $this->mysql_last_result ) )
		return $this->get_results();
	return NULL; // or could raise error if $name is invalid
}

Personally, I would vote for just removing it, since it's supposed to be private. People who were using it incorrectly just need to make an easy fix to their code by substituting the get_results() call that they should have been using in the first place. I suppose someone could argue that if they're subclassing wpdb and accessing last_result like a protected property then what they're doing isn't totally wrong.

No way I can think of to make it backward compatible for PHP4 without defeating purpose of the patch.

#20 @nacin
13 years ago

I strongly suggest we do this anyway, and offer it as 31compat on wpdevel.

Doubt the __get method is worth it -- maybe?

#21 @ryan
13 years ago

get() seems a pretty cheap way to offer back compat for the majority of users and issue a deprecated warning.

#22 @nacin
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed

[16320]

Memory usage and execution improvements in wpdb. Store and work with resources directly, rather than full copies of results. Plugins which incorrectly used wpdb->last_result (a private property) will need to shift to wpdb->get_results() with no $query. Magic getter is introduced for back compat when using PHP5. props joelhardi, fixes #12257.

#23 @nacin
13 years ago

New ticket regarding error suppression, #15402. Also [16321].

#24 follow-up: @nacin
13 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Tried to create a new site via site-new.php. Got an Already Installed notice. This appears to be taking place in install_blog(), on the query get_results(SELECT ID FROM $wpdb->post), which instead of evaluating to empty, what's being returned is array( 0 => false ). Slight error there.

I can't reproduce outside of that specific instance, but it's related to this change.

Speaking of, that query is crap, and should be SHOW TABLES or get_var or something. Leaving it alone so we can track down the change in functionality.

#25 in reply to: ↑ 24 @joelhardi
13 years ago

I don't use multisite so can't be much help in tracing this.

I did verify that

$wpdb->get_results("select id from wp_posts;");

where wp_posts is an empty posts table returns an empty array (which evaluates to FALSE), which is what it's supposed to. So, if you're instead getting array( 0 => false ) I would guess that this issue relates to code that switches wpdb from one blog to another? Which doesn't make sense because this patch didn't go anywhere near that code in wpdb.

I don't see anything in switch_to_blog() ... unless there's something triggered when the "switch_blog" action fires.

#26 @miqrogroove
13 years ago

nacin, I took a quick peek at it for you. (waiting for a phone call here, heh). Looks like $result and $last_result are being confused in the patch. In query() the MySQL error handling comes after the $result assignment and before the $last_result assignment, meaning the latter should not change unless the query is successful. Without getting deeper into it, I just wonder if you have some queries stepping on each other that way.

@andreasnrb
13 years ago

Prevs, diff applied, Removed return null from get results()

#27 @andreasnrb
13 years ago

Since the wpdevel update http://wpdevel.wordpress.com/2010/11/12/if-you-were-using-wpdb-last_result-yo/ said to use get_results() instead of last_result I thought it would be best that you actually could do that. I just removed if query==null return null. In my simple testing it worked fine.

#28 @nacin
13 years ago

  • Keywords 3.2-early added
  • Milestone changed from 3.1 to Future Release

#29 @hakre
13 years ago

In #16764 I've made some refactorings of WPDB mainly starting at the visibility level. This can be of use to prevent access to private members in the future, like the one reported in this ticket here and I've found another one: #16756.

#30 @wonderboymusic
11 years ago

  • Keywords needs-refresh added; has-patch commit removed
  • Milestone changed from Future Release to 3.6

@k_payne - want to take at look at this nonsense and see if it applies to your drivers?

#31 @brokentone
11 years ago

  • Cc kenton.jacobsen@… added

#32 @ryan
11 years ago

  • Milestone changed from 3.6 to Future Release

#33 @ryan
10 years ago

  • Owner ryan deleted
  • Status changed from reopened to assigned

#34 @sc0ttkclark
9 years ago

Status update on if this will be rebooted for potential inclusion?

@sc0ttkclark
9 years ago

Patch for MySQLi and refresh of code

#35 @sc0ttkclark
9 years ago

  • Keywords dev-feedback needs-testing added; 3.2-early needs-refresh removed

Refreshed patch/changes can be found here:

https://github.com/WordPress/WordPress/pull/127

And the diff I attached to this ticket: https://core.trac.wordpress.org/attachment/ticket/12257/12257.mysqli.diff

#36 @sc0ttkclark
9 years ago

Latest patch also supports MySQLi as per the changes since the last patch to include support for MySQLi.

@sc0ttkclark
9 years ago

I added some further tweaks to abstract out the fetching and row looping via a new fetch() method.

@sc0ttkclark
9 years ago

Fixed issue with mysql_num_rows call which should have been mysqli_num_rows in mysqli mode

@sc0ttkclark
9 years ago

Passing $y correctly into fetch() from get_var()

@sc0ttkclark
9 years ago

Updated with fixes; Now passes wp core tests

#37 @sc0ttkclark
9 years ago

  • Keywords has-patch added; needs-testing removed

Latest patch works and passes core unit tests.

@sc0ttkclark
9 years ago

Fix patch paths

Note: See TracTickets for help on using tickets.