Opened 14 years ago
Closed 14 years ago
#14572 closed task (blessed) (fixed)
post_author_meta_box causes fatal error on site with large userbase.
Reported by: | tomdebruin | Owned by: | scribu |
---|---|---|---|
Milestone: | 3.1 | Priority: | highest omg bbq |
Severity: | blocker | Version: | 3.0.1 |
Component: | Administration | Keywords: | has-patch |
Focuses: | Cc: |
Description
My site with 16000+ registered users and memory_limit at 96M. When creating a new post or editing an existing post there is a fatal error (Allowed memory size of 100663296 bytes exhausted) when trying to generate the massive list of possible authors in the 'Author' box (post_author_meta_box).
Bizarrely the list is generated fully when viewing as an Administrator - but Author, Editor, etc have problems.
Would it be better to only list the users with contributor and above privileges to that blog rather than everyone on the site?
My current solution is to disable the post_author_meta_box in wp-admin/edit-form-advanced.php... as I can still change post author, should I need to, through Quick Edit.
Attachments (1)
Change History (47)
#1
@
14 years ago
- Keywords needs-patch added
- Milestone changed from Awaiting Review to Future Release
#2
follow-up:
↓ 4
@
14 years ago
The whole code path for this is a mess. We call get_editable_user_ids() to determine whether we should display the post author meta box. We then call post_author_meta_box() where get_editable_user_ids() is called again. We now have two copies of this information in memory. From there we pass the IDs to wp_dropdown_users() which does another users query or uses more memory. Lame.
We need a proper get_users() function that can list users for a given blog (current blog by default) and return just IDs or entire user objects. get_users_of_blog() could call this and get_editable_user_ids() could be deprecated. WP_User_Search could use get_users() too. All places that query on user_level need to be removed. There's no need to treat multisite and single site differently when look for the users on a given blog. $prefix_capabilities should always be used.
#4
in reply to:
↑ 2
@
14 years ago
- Keywords gsoc added
Replying to ryan:
We need a proper get_users() function that can list users for a given blog (current blog by default) and return just IDs or entire user objects. get_users_of_blog() could call this and get_editable_user_ids() could be deprecated. WP_User_Search could use get_users() too. All places that query on user_level need to be removed. There's no need to treat multisite and single site differently when look for the users on a given blog. $prefix_capabilities should always be used.
Great; I have that baked into my GSoC project (Ajaxify Admin). I'll be opening a ticket with the patch soon.
#6
@
14 years ago
Awesome. Looking at wp_list_authors() and lots of other places that could benefit from a flexible get_users() to avoid multiple queries and plain dumb queries.
#14
@
14 years ago
From [15565] :
4234 4235 foreach ( $queries as $query ) { 4236 $meta_key = trim( @$query['meta_key'] ); 4237 $meta_value = trim( @$query['meta_value'] ); 4238 $meta_compare = @$query['meta_compare']; 4239 4240 if ( empty( $meta_compare ) || !in_array( $meta_compare, array( '=', '!=', '>', '>=', '<', '<=', 'like' ) ) ) 4241 $meta_compare = '='; 4242 4243 if ( empty( $meta_key ) ) 4244 continue; 4245
I don't think suppressing errors is the best way of the handling the arguments here. If $query['meta_key']
is an array, then $meta_key
will be set to the string "Array", which defeats the point of lines 4243ff. And that situation would be difficult to debug with no error messages.
In general, I would suggest that we suppress errors---if at all---only when dealing with external resources. For example, the call to PHPMailer's send in wp_mail
. That makes sense because a lot of the potential problems have to do with things outside of WP's purview, such as server configuration.
Here if we have a type mismatch we want to find out, or we want to check for it in the first place.
#15
@
14 years ago
Actually, I guess the type mismatch doesn't get suppressed, just when undefined. Still, I stand by my general point: don't use error suppression as a substitute for data validation.
#22
@
14 years ago
This is not a good query for a multisite with a large number of users. For example, make.wordpress.org/ui runs the P2 theme. The theme runs get_users_of_blog() on every blog-side page load during some content filters. That means on every page load, WP_User_Query gets run.
Old get_users_of_blog() query:
EXPLAIN SELECT user_id, user_id AS ID, user_login, display_name, user_email, meta_value FROM minibb_users, minibb_usermeta WHERE minibb_users.ID = minibb_usermeta.user_id AND meta_key = 'wporg_5_capabilities' ORDER BY minibb_usermeta.user_id
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | SIMPLE | minibb_usermeta | ref | user_id,meta_key | meta_key | 256 | const | 10 | Using where; Using filesort |
1 | SIMPLE | minibb_users | eq_ref | PRIMARY | PRIMARY | 8 | wordpress.minibb_usermeta.user_id | 1 | Using where |
New WP_User_Query + _wp_meta_sql query:
EXPLAIN SELECT DISTINCT(minibb_users.ID) FROM minibb_users WHERE 1=1 AND minibb_users.ID IN ( SELECT user_id FROM minibb_usermeta WHERE CASE meta_key WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL END GROUP BY user_id HAVING COUNT(*) = 1 ) ORDER BY user_login ASC; Rows: 2
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | PRIMARY | minibb_users | index | NULL | user_login | 60 | NULL | 1283835 | Using where; Using index |
2 | DEPENDENT SUBQUERY | minibb_usermeta | index | NULL | user_id | 8 | NULL | 6165217 | Using where |
So we traded a filesort on 10 rows for a full table scan on both users and usermeta tables. Not worth it :)
#25
in reply to:
↑ 23
;
follow-up:
↓ 30
@
14 years ago
Replying to scribu:
Why is that a dependent subquery? I don't get it.
Because the MySQL optimizer is naive in this situation: http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html
You can force the inner query to be executed first by doing something like:
EXPLAIN SELECT DISTINCT(minibb_users.ID) FROM minibb_users WHERE 1=1 AND minibb_users.ID IN ( SELECT user_id FROM ( SELECT user_id FROM minibb_usermeta WHERE CASE meta_key WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL END GROUP BY user_id HAVING COUNT(*) = 1 ) AS foo ) ORDER BY user_login ASC;
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | PRIMARY | minibb_users | index | NULL | user_login | 60 | NULL | 1254992 | Using where; Using index |
2 | DEPENDENT SUBQUERY | <derived3> | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
3 | DERIVED | minibb_usermeta | index | NULL | user_id | 8 | NULL | 5830942 | Using where |
The first subquery is still "dependent", but now it's only operating on a small number of rows the derived table returns.
A JOIN is simpler, though.
The other problem with this query is the usermeta portion (whether it's in a subquery or in a join).
EXPLAIN SELECT * FROM wp_usermeta WHERE CASE meta_key WHEN 'wp_capabilities' THEN meta_value IS NOT NULL END
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | SIMPLE | wp_usermeta | ALL | NULL | NULL | NULL | NULL | 59 | Using where |
That can't be indexed. It has to look at each row to see if it matches one of (in this case, the only) case statement.
in _wp_meta_sql(), there's no way for the logic to result in multiple $clauses, so why not just go with:
EXPLAIN SELECT * FROM wp_usermeta WHERE meta_key = 'wp_capabilities'
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | SIMPLE | wp_usermeta | ref | meta_key | meta_key | 768 | const | 2 | Using where |
Also, I don't understand the point of the GROUP BY user_id HAVING COUNT(*) = 1
. When would there ever be multiple wp_capabilities meta rows for a single user_id? Even if there are, why does it matter? That user_id would still be a member of the blog.
#26
@
14 years ago
PS: The last two EXPLAINs are done on a local install, not on the .org site. That's why the full table and desired rows show 59 and 2, respectively, instead of 6 million and 10 from the previous EXPLAINs
#27
follow-up:
↓ 29
@
14 years ago
in _wp_meta_sql(), there's no way for the logic to result in multiple $clauses
There is, when querying a role and an other arbitrary meta_key=>meta_value pair.
However, yeah, I should drop the CASE statement if there's only one clause.
#28
@
14 years ago
Also, I don't understand the point of the GROUP BY user_id HAVING COUNT(*) = 1.
It makes it so that only users matching all clauses are returned, but still only needed when having multiple clauses.
#29
in reply to:
↑ 27
@
14 years ago
Replying to scribu:
There is, when querying a role and an other arbitrary meta_key=>meta_value pair.
I don't know if we should have support for such dangerously suicidal queries. I imagine joining on usermeta multiple times will still have better performance.
#30
in reply to:
↑ 25
@
14 years ago
Replying to mdawaffe:
Because the MySQL optimizer is naive in this situation: http://dev.mysql.com/doc/refman/5.0/en/subquery-restrictions.html
That's grim.
You can force the inner query to be executed first by doing something like:
Couldn't you also get around the optimizer's correlated subquery by joining the results of the subquery instead?
SELECT DISTINCT(minibb_users.ID) FROM minibb_users JOIN ( SELECT user_id FROM minibb_usermeta WHERE CASE meta_key WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL END GROUP BY user_id HAVING COUNT(*) = 1 ) AS userquery ON minibb_users.ID = userquery.user_id ORDER BY user_login ASC
#31
@
14 years ago
I don't know if we should have support for such dangerously suicidal queries.
Let's not exagerate. The API only allows a single meta query, besides the role checking.
Couldn't you also get around the optimizer's correlated subquery by joining the results of the subquery instead?
You could, but again, it wouldn't make use of the meta_key index.
Nacin is right. Multiple JOINs are best:
EXPLAIN SELECT wp_users.ID FROM wp_users INNER JOIN wp_usermeta ON (wp_users.ID = wp_usermeta.user_id) INNER JOIN wp_usermeta AS mt1 ON (wp_users.ID = mt1.user_id) WHERE 1=1 AND wp_usermeta.meta_key = 'wp_capabilities' AND mt1.meta_key = 'admin_color' AND mt1.meta_value = 'fresh' ORDER BY user_login ASC
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
1 | SIMPLE | wp_usermeta | ref | user_id,meta_key | meta_key | 768 | const | 7 | Using where; Using temporary; Using filesort |
1 | SIMPLE | wp_users | eq_ref | PRIMARY | PRIMARY | 8 | wp-trunk.wp_usermeta.user_id | 1 | |
1 | SIMPLE | mt1 | ref | user_id,meta_key | meta_key | 768 | const | 9 | Using where |
#34
in reply to:
↑ 33
@
14 years ago
Replying to scribu:
(In [15581]) Use _wp_meta_sql() in WP_Query. See #14572. See #14645
I'm not 100% this is an enhancement when dealing with multiple criteria. SQL optimizers tend to go nuts once the number of joins is high, picking a random plan among several options, and then working out which one is the better one through a genetic algorithm.
This:
SELECT DISTINCT(minibb_users.ID) FROM minibb_users JOIN ( SELECT user_id FROM minibb_usermeta WHERE CASE meta_key WHEN 'wporg_5_capabilities' THEN meta_value IS NOT NULL END GROUP BY user_id HAVING COUNT(*) = 1 ) AS userquery ON minibb_users.ID = userquery.user_id ORDER BY user_login ASC
... has a group by clause in the subquery, which very much guarantees that the query rewriter will never rewrite the query such as the following:
SELECT DISTINCT(minibb_users.ID) FROM minibb_users JOIN minibb_usermeta ON minibb_users.ID = minibb_usermeta.user_id AND CASE minibb_usermeta.meta_key WHEN 'wporg_5_capabilities' THEN minibb_usermeta.meta_value IS NOT NULL -- extra WHEN clauses can go here END GROUP BY user_id HAVING COUNT(*) = 1 ORDER BY user_login ASC
The above rewrite should allow to do a single join on an indexed column (supposedly user_id is indexed in usermeta? if not, it needs to be), rather than as many joins as there are criteria.
The group by and having clauses also can be dropped entirely, when a unique meta criteria is considered. As can the CASE statement:
SELECT DISTINCT(minibb_users.ID) FROM minibb_users JOIN minibb_usermeta ON minibb_users.ID = minibb_usermeta.user_id AND minibb_usermeta.meta_key = 'wporg_5_capabilities' AND minibb_usermeta.meta_value IS NOT NULL ORDER BY user_login ASC
Other aside, instead of minibb_usermeta.meta_value IS NOT NULL, we could also write TRUE, since the column is never null anyway.
#35
@
14 years ago
Oh, and... the DISTINCT definitely seems useless in each of those three statements. The two first have a GROUP BY clause that does the same thing, and the last one's inner join ensures there can only be one.
#36
in reply to:
↑ 33
;
follow-up:
↓ 37
@
14 years ago
Replying to scribu:
(In [15581]) Use _wp_meta_sql() in WP_Query. See #14572. See #14645
With define('WP_DEBUG', true); we get PHP Notice:
Notice: Uninitialized string offset: 1 in {path-to}\wp-includes\query.php on line 2221
Notice: Uninitialized string offset: 0 in {path-to}\wp-includes\query.php on line 2221
#37
in reply to:
↑ 36
@
14 years ago
Replying to azizur:
With define('WP_DEBUG', true); we get PHP Notice:
I can confirm that at revision: 15599 this issue is no longer there.
#40
@
14 years ago
- Resolution set to fixed
- Status changed from assigned to closed
I think we can call this fixed. Please open new tickets if necessary.
#43
@
14 years ago
- Priority changed from normal to highest omg bbq
- Resolution fixed deleted
- Severity changed from normal to blocker
- Status changed from closed to reopened
Using wp_dropdown_users with a lot of users causes out of memory fatal errors. 8,300 users on 3.0.3 works fine, but crashes in 3.1 for 'only' 7,000 users. Seems to be running out of memory whilst trying to cache all the retrieved users.
Related: #15786
#44
@
14 years ago
- Keywords has-patch added; needs-patch gsoc removed
fields.14572.diff extends the 'fields' parameter to an array, so that wp_dropdown_users() can get only the field needed.
That makes sense. wp_dropdown_users() should be using WP_User_Search anyway, so two birds with one stone.