Make WordPress Core

Opened 14 years ago

Closed 14 years ago

#14572 closed task (blessed) (fixed)

post_author_meta_box causes fatal error on site with large userbase.

Reported by: tomdebruin's profile tomdebruin Owned by: scribu's profile scribu
Milestone: 3.1 Priority: highest omg bbq
Severity: blocker Version: 3.0.1
Component: Administration Keywords: has-patch
Focuses: Cc:

Description

My site with 16000+ registered users and memory_limit at 96M. When creating a new post or editing an existing post there is a fatal error (Allowed memory size of 100663296 bytes exhausted) when trying to generate the massive list of possible authors in the 'Author' box (post_author_meta_box).

Bizarrely the list is generated fully when viewing as an Administrator - but Author, Editor, etc have problems.

Would it be better to only list the users with contributor and above privileges to that blog rather than everyone on the site?

My current solution is to disable the post_author_meta_box in wp-admin/edit-form-advanced.php... as I can still change post author, should I need to, through Quick Edit.

Attachments (1)

fields.14572.diff (2.2 KB) - added by scribu 14 years ago.

Download all attachments as: .zip

Change History (47)

#1 @scribu
14 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

Would it be better to only list the users with contributor and above privileges to that blog rather than everyone on the site?

That makes sense. wp_dropdown_users() should be using WP_User_Search anyway, so two birds with one stone.

#2 follow-up: @ryan
14 years ago

The whole code path for this is a mess. We call get_editable_user_ids() to determine whether we should display the post author meta box. We then call post_author_meta_box() where get_editable_user_ids() is called again. We now have two copies of this information in memory. From there we pass the IDs to wp_dropdown_users() which does another users query or uses more memory. Lame.

We need a proper get_users() function that can list users for a given blog (current blog by default) and return just IDs or entire user objects. get_users_of_blog() could call this and get_editable_user_ids() could be deprecated. WP_User_Search could use get_users() too. All places that query on user_level need to be removed. There's no need to treat multisite and single site differently when look for the users on a given blog. $prefix_capabilities should always be used.

#3 @ryan
14 years ago

  • Milestone changed from Future Release to 3.1

#4 in reply to: ↑ 2 @scribu
14 years ago

  • Keywords gsoc added

Replying to ryan:

We need a proper get_users() function that can list users for a given blog (current blog by default) and return just IDs or entire user objects. get_users_of_blog() could call this and get_editable_user_ids() could be deprecated. WP_User_Search could use get_users() too. All places that query on user_level need to be removed. There's no need to treat multisite and single site differently when look for the users on a given blog. $prefix_capabilities should always be used.

Great; I have that baked into my GSoC project (Ajaxify Admin). I'll be opening a ticket with the patch soon.

#5 @scribu
14 years ago

  • Owner set to scribu
  • Status changed from new to assigned

#6 @ryan
14 years ago

Awesome. Looking at wp_list_authors() and lots of other places that could benefit from a flexible get_users() to avoid multiple queries and plain dumb queries.

#7 @scribu
14 years ago

Related: #14579.

#8 @scribu
14 years ago

(In [15539]) use get_users() in get_editable_user_ids() and cache result. See #14572

#9 @scribu
14 years ago

(In [15540]) Deprecate unused get_author_user_ids()and get_editable_authors(). See #14572

#10 @scribu
14 years ago

(In [15541]) Fix post_author_meta_box(). See #14572

#11 @scribu
14 years ago

(In [15542]) Deprecated get_editable_user_ids() altogether, along with similar, unused functions. See #14572

#12 @voyagerfan5761
14 years ago

  • Cc WordPress@… added

#13 @scribu
14 years ago

(In [15565]) introduce _wp_meta_sql(). Preparation for adding blog_id to WP_User_Query. See #14572

#14 @filosofo
14 years ago

From [15565] :

 	4234	 
 	4235	    foreach ( $queries as $query ) { 
 	4236	        $meta_key = trim( @$query['meta_key'] ); 
 	4237	        $meta_value = trim( @$query['meta_value'] ); 
 	4238	        $meta_compare = @$query['meta_compare']; 
 	4239	 
 	4240	        if ( empty( $meta_compare ) || !in_array( $meta_compare, array( '=', '!=', '>', '>=', '<', '<=', 'like' ) ) ) 
 	4241	            $meta_compare = '='; 
 	4242	 
 	4243	        if ( empty( $meta_key ) ) 
 	4244	            continue; 
 	4245

I don't think suppressing errors is the best way of the handling the arguments here. If $query['meta_key'] is an array, then $meta_key will be set to the string "Array", which defeats the point of lines 4243ff. And that situation would be difficult to debug with no error messages.

In general, I would suggest that we suppress errors---if at all---only when dealing with external resources. For example, the call to PHPMailer's send in wp_mail. That makes sense because a lot of the potential problems have to do with things outside of WP's purview, such as server configuration.

Here if we have a type mismatch we want to find out, or we want to check for it in the first place.

#15 @filosofo
14 years ago

Actually, I guess the type mismatch doesn't get suppressed, just when undefined. Still, I stand by my general point: don't use error suppression as a substitute for data validation.

#16 @scribu
14 years ago

Forgot to reference the ticket again: [15566]

#17 @scribu
14 years ago

(In [15567]) fix wp_getAuthors in xmlrpc.php. See #14572 and [15566]

#18 @scribu
14 years ago

(In [15568]) Don't suppress errors in _wp_meta_sql(). See #14572

#19 @nacin
14 years ago

Also in WP_User_Query::prepare_query, 464ff.

#20 @scribu
14 years ago

(In [15570]) Fix meta query in WP_User_Query. See #14572

#21 @scribu
14 years ago

(In [15574]) Use get_users() in wp_dropdown_users(). See #14572

#22 @mdawaffe
14 years ago

This is not a good query for a multisite with a large number of users. For example, make.wordpress.org/ui runs the P2 theme. The theme runs get_users_of_blog() on every blog-side page load during some content filters. That means on every page load, WP_User_Query gets run.

Old get_users_of_blog() query:

EXPLAIN
SELECT user_id, user_id AS ID, user_login,
  display_name, user_email, meta_value
FROM minibb_users, minibb_usermeta
WHERE minibb_users.ID = minibb_usermeta.user_id 
AND meta_key = 'wporg_5_capabilities'
ORDER BY minibb_usermeta.user_id
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE minibb_usermeta ref user_id,meta_key meta_key 256 const 10 Using where; Using filesort
1 SIMPLE minibb_users eq_ref PRIMARY PRIMARY 8 wordpress.minibb_usermeta.user_id 1 Using where

New WP_User_Query + _wp_meta_sql query:

EXPLAIN
SELECT DISTINCT(minibb_users.ID) FROM minibb_users
WHERE 1=1 AND minibb_users.ID IN (
  SELECT user_id FROM minibb_usermeta
  WHERE CASE meta_key
    WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL
  END
  GROUP BY user_id HAVING COUNT(*) = 1
) ORDER BY user_login ASC; 
Rows: 2
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY minibb_users index NULL user_login 60 NULL 1283835 Using where; Using index
2 DEPENDENT SUBQUERY minibb_usermeta index NULL user_id 8 NULL 6165217 Using where

So we traded a filesort on 10 rows for a full table scan on both users and usermeta tables. Not worth it :)

#23 follow-up: @scribu
14 years ago

Why is that a dependent subquery? I don't get it.

#24 @automattor
14 years ago

(In [15579]) Rewrite WP_User_Query to use JOIN instead of subquery. See #14572

#25 in reply to: ↑ 23 ; follow-up: @mdawaffe
14 years ago

Replying to scribu:

Why is that a dependent subquery? I don't get it.

Because the MySQL optimizer is naive in this situation: http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html

You can force the inner query to be executed first by doing something like:

EXPLAIN
SELECT DISTINCT(minibb_users.ID) FROM minibb_users
WHERE 1=1 AND minibb_users.ID IN (
  SELECT user_id FROM (
    SELECT user_id FROM minibb_usermeta
    WHERE CASE meta_key
      WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL
    END
    GROUP BY user_id HAVING COUNT(*) = 1
  ) AS foo
) ORDER BY user_login ASC;
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY minibb_users index NULL user_login 60 NULL 1254992 Using where; Using index
2 DEPENDENT SUBQUERY <derived3> ALL NULL NULL NULL NULL 10 Using where
3 DERIVED minibb_usermeta index NULL user_id 8 NULL 5830942 Using where

The first subquery is still "dependent", but now it's only operating on a small number of rows the derived table returns.

A JOIN is simpler, though.

The other problem with this query is the usermeta portion (whether it's in a subquery or in a join).

EXPLAIN
SELECT * FROM wp_usermeta
WHERE CASE meta_key
  WHEN 'wp_capabilities' THEN meta_value IS NOT NULL
END
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE wp_usermeta ALL NULL NULL NULL NULL 59 Using where

That can't be indexed. It has to look at each row to see if it matches one of (in this case, the only) case statement.

in _wp_meta_sql(), there's no way for the logic to result in multiple $clauses, so why not just go with:

EXPLAIN
SELECT * FROM wp_usermeta WHERE meta_key = 'wp_capabilities'
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE wp_usermeta ref meta_key meta_key 768 const 2 Using where

Also, I don't understand the point of the GROUP BY user_id HAVING COUNT(*) = 1. When would there ever be multiple wp_capabilities meta rows for a single user_id? Even if there are, why does it matter? That user_id would still be a member of the blog.

#26 @mdawaffe
14 years ago

PS: The last two EXPLAINs are done on a local install, not on the .org site. That's why the full table and desired rows show 59 and 2, respectively, instead of 6 million and 10 from the previous EXPLAINs

#27 follow-up: @scribu
14 years ago

in _wp_meta_sql(), there's no way for the logic to result in multiple $clauses

There is, when querying a role and an other arbitrary meta_key=>meta_value pair.

However, yeah, I should drop the CASE statement if there's only one clause.

#28 @scribu
14 years ago

Also, I don't understand the point of the GROUP BY user_id HAVING COUNT(*) = 1.

It makes it so that only users matching all clauses are returned, but still only needed when having multiple clauses.

#29 in reply to: ↑ 27 @nacin
14 years ago

Replying to scribu:

There is, when querying a role and an other arbitrary meta_key=>meta_value pair.

I don't know if we should have support for such dangerously suicidal queries. I imagine joining on usermeta multiple times will still have better performance.

#30 in reply to: ↑ 25 @filosofo
14 years ago

Replying to mdawaffe:

Because the MySQL optimizer is naive in this situation: http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html

That's grim.

You can force the inner query to be executed first by doing something like:

Couldn't you also get around the optimizer's correlated subquery by joining the results of the subquery instead?

SELECT DISTINCT(minibb_users.ID) FROM minibb_users
JOIN (
  SELECT user_id FROM minibb_usermeta
  WHERE CASE meta_key
    WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL
  END 
  GROUP BY user_id HAVING COUNT(*) = 1 
) AS userquery ON minibb_users.ID = userquery.user_id
ORDER BY user_login ASC 

#31 @scribu
14 years ago

I don't know if we should have support for such dangerously suicidal queries.

Let's not exagerate. The API only allows a single meta query, besides the role checking.

Couldn't you also get around the optimizer's correlated subquery by joining the results of the subquery instead?

You could, but again, it wouldn't make use of the meta_key index.

Nacin is right. Multiple JOINs are best:

EXPLAIN 
SELECT wp_users.ID FROM wp_users
INNER JOIN wp_usermeta ON (wp_users.ID = wp_usermeta.user_id)
INNER JOIN wp_usermeta AS mt1 ON (wp_users.ID = mt1.user_id) 
WHERE 1=1 
AND wp_usermeta.meta_key = 'wp_capabilities' 
AND mt1.meta_key = 'admin_color' AND mt1.meta_value = 'fresh' 
ORDER BY user_login ASC
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE wp_usermeta ref user_id,meta_key meta_key 768 const 7 Using where; Using temporary; Using filesort
1 SIMPLE wp_users eq_ref PRIMARY PRIMARY 8 wp-trunk.wp_usermeta.user_id 1
1 SIMPLE mt1 ref user_id,meta_key meta_key 768 const 9 Using where

#32 @scribu
14 years ago

(In [15580]) Use multiple JOINs instead of CASE in _wp_meta_sql(). See #14572

#33 follow-ups: @scribu
14 years ago

(In [15581]) Use _wp_meta_sql() in WP_Query. See #14572. See #14645

#34 in reply to: ↑ 33 @Denis-de-Bernardy
14 years ago

Replying to scribu:

(In [15581]) Use _wp_meta_sql() in WP_Query. See #14572. See #14645

I'm not 100% this is an enhancement when dealing with multiple criteria. SQL optimizers tend to go nuts once the number of joins is high, picking a random plan among several options, and then working out which one is the better one through a genetic algorithm.

This:

SELECT DISTINCT(minibb_users.ID) FROM minibb_users
JOIN (
  SELECT user_id FROM minibb_usermeta
  WHERE CASE meta_key
    WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL
  END 
  GROUP BY user_id HAVING COUNT(*) = 1 
) AS userquery ON minibb_users.ID = userquery.user_id
ORDER BY user_login ASC 

... has a group by clause in the subquery, which very much guarantees that the query rewriter will never rewrite the query such as the following:

SELECT DISTINCT(minibb_users.ID) FROM minibb_users
JOIN minibb_usermeta
ON minibb_users.ID = minibb_usermeta.user_id
AND CASE minibb_usermeta.meta_key
    WHEN 'wporg_5_capabilities' THEN minibb_usermeta.meta_value IS NOT NULL
    -- extra WHEN clauses can go here
  END 
GROUP BY user_id HAVING COUNT(*) = 1
ORDER BY user_login ASC 

The above rewrite should allow to do a single join on an indexed column (supposedly user_id is indexed in usermeta? if not, it needs to be), rather than as many joins as there are criteria.

The group by and having clauses also can be dropped entirely, when a unique meta criteria is considered. As can the CASE statement:

SELECT DISTINCT(minibb_users.ID) FROM minibb_users
JOIN minibb_usermeta
ON minibb_users.ID = minibb_usermeta.user_id
AND minibb_usermeta.meta_key = 'wporg_5_capabilities'
AND minibb_usermeta.meta_value IS NOT NULL
ORDER BY user_login ASC 

Other aside, instead of minibb_usermeta.meta_value IS NOT NULL, we could also write TRUE, since the column is never null anyway.

#35 @Denis-de-Bernardy
14 years ago

Oh, and... the DISTINCT definitely seems useless in each of those three statements. The two first have a GROUP BY clause that does the same thing, and the last one's inner join ensures there can only be one.

#36 in reply to: ↑ 33 ; follow-up: @azizur
14 years ago

Replying to scribu:

(In [15581]) Use _wp_meta_sql() in WP_Query. See #14572. See #14645

With define('WP_DEBUG', true); we get PHP Notice:

Notice: Uninitialized string offset: 1 in {path-to}\wp-includes\query.php on line 2221
Notice: Uninitialized string offset: 0 in {path-to}\wp-includes\query.php on line 2221

#37 in reply to: ↑ 36 @azizur
14 years ago

Replying to azizur:

With define('WP_DEBUG', true); we get PHP Notice:

I can confirm that at revision: 15599 this issue is no longer there.

#38 @scribu
14 years ago

Related: #10329

#39 @scribu
14 years ago

  • Type changed from defect (bug) to task (blessed)

#40 @scribu
14 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

I think we can call this fixed. Please open new tickets if necessary.

#41 @waltervos
14 years ago

Are there ways to make the patch available as a plugin somehow?

#42 @scribu
14 years ago

(In [16605]) Set get_users() as a replacement to get_editable_user_ids() and friends. See #14572

#43 @duck_
14 years ago

  • Priority changed from normal to highest omg bbq
  • Resolution fixed deleted
  • Severity changed from normal to blocker
  • Status changed from closed to reopened

Using wp_dropdown_users with a lot of users causes out of memory fatal errors. 8,300 users on 3.0.3 works fine, but crashes in 3.1 for 'only' 7,000 users. Seems to be running out of memory whilst trying to cache all the retrieved users.

Related: #15786

@scribu
14 years ago

#44 @scribu
14 years ago

  • Keywords has-patch added; needs-patch gsoc removed

fields.14572.diff extends the 'fields' parameter to an array, so that wp_dropdown_users() can get only the field needed.

#45 @scribu
14 years ago

(In [16974]) Fetch only the required field in wp_dropdown_users(). See #14572

#46 @scribu
14 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.