#9864 closed defect (bug) (invalid)
Performance issues with large number of pages
Reported by: | pp19dd | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 2.7.1 |
Component: | Posts, Post Types | Keywords: | 2nd-opinion has-patch |
Focuses: | performance, administration | Cc: |
Description
Environment:
- Default install, default theme, no plugins, reasonable server configuration (typical execution, memory limits). In my particular case, that was set to 30 seconds, 384 MB.
To reproduce:
- Create a large number of pages (5,000+)
- Try to edit a page
What happens:
- Maximum execution time error
Fatal error: Maximum execution time of 30 seconds exceeded in \wp-includes\post.php on line 1998
Workaround to prevent error:
- set_time_limit to 0 as server admin
My diagnosis:
Suspect that the immediate problem is that get_pages() inside wp-admin/post.php, line 2173, performs a query that selects EVERYTHING:
$query = "SELECT * FROM $wpdb->posts $join WHERE (post_type = 'page' AND post_status = 'publish') $where ";
Now, in this case this query (AFAIK) is just supposed to pull up the page list (with IDs, titles and parents). What it actually does is it pulls up ALL of the post data. This works fine for low number of pages, but on a larger data set this causes a chokepoint. After all, the end result is just supposed to be a dropdown with a list of pages.
Workaround to improve performance:
- comment out line 273 in wp-admin\edit-page-form.php and skip displaying the page parent in the edit screen
The bigger picture:
Well, the ENTIRE list of pages is queried in this 'edit page' screen (see screenshot # 2 - with a dropdown that contains 5,000+ entries.) I think this is a design limitation and it looks clumsy once the list gets larger. Rather than patch the display routines, I think this thing needs to be replaced with something that can search / paginate piecemeal server-side.
Attachments (4)
Change History (59)
#2
@
16 years ago
- Component changed from Administration to Performance
- Milestone changed from Unassigned to Future Release
- Owner set to Denis-de-Bernardy
yep. quite a few times, even.
#3
@
16 years ago
the query is:
SELECT * FROM wp_posts WHERE (post_type = 'page' AND post_status = 'publish') ORDER BY menu_order, post_title ASC
please alter your database, and add an index on (post_type, post_status). see whether that makes any difference.
#4
@
16 years ago
That query appears to never use any of the current indexes. It's the ORDER BY clause that is causing the indexes to not be usable. Perhaps it's worth leaving the ORDER BY off and sorting the results in PHP, outside of the DB.
#5
follow-up:
↓ 25
@
16 years ago
php won't beat an index if one is used. especially if we're talking 3k page lists. that'll consume huge amounts of memory.
an index on post_type, menu_order, post_title may do the trick, but at the end of the day it goes down to the amount of clutter that is present in the database. if the number of posts, revisions and attachments is small, and the query planner decides it's faster to seq scan the mess and sort it without an index, it will...
anyway, see #9642
#6
follow-up:
↓ 7
@
16 years ago
Thanks. I was unable to find related tickets, but thanks to the references I see this is being looked at. I added an index to the suggested fields and in this case it really made no difference.
The question that I have is -- why would we want to list all 5,000+ pages in a dropdown on the 'edit page' screen? It would seem prudent to put up a paginated search widget there. I mean, its purpose is to assign a parent to the page.
I don't think this is a question of optimization as much as "why are we querying this at all?" (page_edit_eyesore.png shows a dropdown that's had its scroll bar pushed off the screen - but - more importantly it shows the entire 5,000 page listing.)
At any rate, I think that you might have covered this under another patch. I will keep looking around for an upcoming solution - otherwise - I'll manually disable that admin feature.
#7
in reply to:
↑ 6
@
16 years ago
Replying to pp19dd:
The question that I have is -- why would we want to list all 5,000+ pages in a dropdown on the 'edit page' screen? It would seem prudent to put up a paginated search widget there. I mean, its purpose is to assign a parent to the page.
Simple: because there is no patch to do things differently, and that would be the simplest way to proceed in order to assign a page parent.
Imo, it would be better if a simple text field existed. As the user types a title in it, we'd ajax query page titles and return a small selection of pages to choose from. That would be great, and it would scale admirably. If you're familiar with php and js, go right ahead and write the patch. If not, it may be ages before it gets fixed if your DB is so that the indexes are not used.
#8
@
16 years ago
Denis- thanks for the suggestion. I wonder if I get it done quickly, will make it into 2.8?
#9
@
16 years ago
Quite honestly I doubt it. We're in beta, which is another word for feature freeze. It would surely go into an early 2.9 if a patch is around, however.
Then again, I've no commit access, so this would really be a question for one of Ryan, Andrew, Mark or Peter.
#10
@
16 years ago
As I thought, I raised similar issues a while back as #8468 and #8492
Essentially, this problem exists whereever we have a drop down of potentially infinite length. Some sort of logic should be used to limit records that are returned to a pull-down menu. Whether that means that for large numbers of records they have to be LIMITed and split by some criteria, I am not sure.
#12
@
16 years ago
@mrmist: the thing to do, I think, would be the following:
A simple text field. As the user types a title in it, we'd ajax query page titles and return a small selection of pages to choose from.
Now someone needs to volunteer the needed ajax in a generic enough way. ;-)
#13
@
16 years ago
I'll do some wishful thinking (that the fix would magically end up in 2.8) and give it a try in the next day. Think that I can do a fairly unobtrusive patch that would give us that capability. Question is whether to work from 2.7.1 or the trunk. What do you guys think? (Word had it that the trunk had updated "optimized" queries.)
#14
@
16 years ago
Add this index to your wp_posts table:
(post_type, menu_order, post_title)
And then use trunk. Not 2.7.1.
Intuitively, there is a need for:
- a new case in wp-admin/admin-ajax.php (e.g. quickfind) that tosses an empty array into a filter with a parameter (pages, in our case), so $results = apply_filters(quickfind, array(), pages)
- some kind of function that returns the needed array using a db query
- the needed UI functions in wp-admin/includes/template.php
- possibly some edits to wp-admin/js/pages.dev.js (use the SCRIPT_DEBUG define to use that one)
#15
@
16 years ago
another potential option could be a simple index on (post_type, post_title) and an unordered query:
select * from $wpdb->posts where post_type = page and post_title like foo%
that *should* use the index as well, and it might be faster than the previous index. (string comparison functions conveniently happen to be case insensitive in mysql.)
#19
@
15 years ago
What if the functionality of this metabox just degraded nicer? We could add some handling that said if the number of pages were over say 200, then instead of doing the normal drop down box, we use pop up style selector similar to the way the media_upload browser is used for the post image handling.
This would allow smaller sites to continue to make quick and easy selection like they do now while keeping it from breaking completely for larger sites.
#21
@
15 years ago
- Milestone changed from 2.9 to Future Release
Punting b/c we're in beta 2 and this needs an updated patch.
#25
in reply to:
↑ 5
;
follow-up:
↓ 26
@
12 years ago
Replying to Denis-de-Bernardy:
an index on post_type, menu_order, post_title may do the trick, but at the end of the day it goes down to the amount of clutter that is present in the database. if the number of posts, revisions and attachments is small, and the query planner decides it's faster to seq scan the mess and sort it without an index, it will...
Is there any reason why 'menu_order' doesn't have an index?
It strikes me the primary use for 'menu_order' would be to sort results so surely having an index on this would help?
I'll caveat this by saying I'm no database expert :)
#26
in reply to:
↑ 25
;
follow-up:
↓ 27
@
12 years ago
Replying to husobj:
Replying to Denis-de-Bernardy:
an index on post_type, menu_order, post_title may do the trick (...)
Is there any reason why 'menu_order' doesn't have an index?
It strikes me the primary use for 'menu_order' would be to sort results so surely having an index on this would help?
I'll caveat this by saying I'm no database expert :)
An index on menu_order just by itself would never get used. The only case where such an index would get considered is when it's 1) selective enough (but here, an index on type/status will likely beat it, since it would be an index where most values are 0) and/or 2) it returns rows in the correct order (which it doesn't, since we order by order/title).
On a normal blog with lots of posts and only a handful of pages, the query engine would likely plan to grab published pages using an index on type/status, sort them, and return the result. The index I suggested would, in this case, strip out the sorting step on most normal sites. If, in contrast, a site has lots of pages and little to no posts, the optimal query plan would be to avoid the index entirely, read the whole table followed by a sort -- because in memory sorting the whole table will trounce random disk reads.
In so far as I remember from two years ago, though, this seemed to be primarily a throughput issue, rather than a DB indexing issue. Things time out either because it takes too long to move the pages from the query results to PHP from the DB, or because it takes too long for WP to do its things once they're retrieved, or because it take too long to output said things once it's done, or something else. Someone would need to go in there and measure, to identify where time is actually being spent.
#27
in reply to:
↑ 26
@
12 years ago
Replying to Denis-de-Bernardy:
Thanks for your in-depth reply.
I suppose the use case I am thinking of is when you have many pages (or custom post types) and you use a plugin like CMS Tree Page View or other plugin which allows you to set the order of posts by drag-n-drop. General these tend to set a different menu_order value for every post - is that what you mean above when you say "selective enough"?
So that's fine if you have an average quantity of pages, but say you have loads of pages or an e-commerce store with thousands of products which users can order in a custom way by menu_order. In that instance would an index on menu_order be beneficial?
#28
follow-up:
↓ 29
@
12 years ago
Potentially beneficial for selectivity
where post_status = 'published' and menu_order > 0 -- eliminates posts, which have menu_order set to zero
Potentially beneficial for ordering:
order by menu_order
Not beneficial for ordering, unless the index is on menu_order, post_title:
order by menu_order, post_title
But in the latter case it needs to be selective as well, else, the query planner may decide its more efficient to grab all rows using a smaller, more selective index (on post_type or post_type, post_status), hence my initial suggesting to try one on post_type, menu_order, post_title -- potentiall selective and in the right order.
Caveat: if using the index means randomly moving back and forth over disk pages all over the place, the planner might still (correctly) prefer a sequential scan with an in-memory sort. Using a clustered index would work around this, but you wouldn't want to go there, since the current clustered index (the id) more or less matches the order we use to run blog-related queries.
Anyway... I hope it clarifies your thoughts. At the end of the day, this is really an issue at the php level imho, rather than an index problem.
#29
in reply to:
↑ 28
@
12 years ago
Replying to Denis-de-Bernardy:
Thanks for taking the time to explain.
#30
@
11 years ago
- Component changed from Performance to Administration
- Focuses performance added
- Milestone changed from Future Release to 3.9
Would love to fix this by moving to an auto-complete solution, akin to #19867 for users.
#31
@
11 years ago
Thinking through this a bit - since autocomplete results are flattened out do we need to / how should we show any hierarchy? Add parent page name somewhere in the result? Something else?
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
11 years ago
#34
@
11 years ago
- Component changed from Administration to Posts, Post Types
- Focuses administration added
This ticket was mentioned in IRC in #wordpress-dev by nacin. View the logs.
11 years ago
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
11 years ago
This ticket was mentioned in IRC in #wordpress-dev by JustinSainton_. View the logs.
11 years ago
This ticket was mentioned in IRC in #wordpress-dev by DH-Shredder. View the logs.
11 years ago
#40
@
11 years ago
- Keywords 2nd-opinion has-patch added
Attached is a first try with this. Had something done, but when I saw the last patch from @helen at #19867 I changed it to also use autocomplete only with page number > 100. We still need to do something with indexes or selecting just some fields, because get_pages is still taking long time when there are lots of pages.
#41
@
11 years ago
Btw, almost forgot. For displaying the parent/child pages I was thinking on something like this example: http://jqueryui.com/autocomplete/#categories
#42
@
11 years ago
Haven't tested 9864.diff yet, but it has some i18n issues.
/* translators: 1: post_title */ 'id' => $page->{$field}, 'label' => sprintf( __('%1$s'), $page->post_title ), 'value' => sprintf( __('%1$s'), $page->post_title )
Dynamic strings like these are not localizable, as they cannot be extracted by gettext:
http://ottopress.com/2012/internationalization-youre-probably-doing-it-wrong/
Why would we need to translate a post title?
We do, however need to use esc_attr__()
instead of just esc_attr()
here:
$output = "<input name='{$name}' type='text' ... title='" . esc_attr( 'Page Title' ) . "' ... />\n";
This ticket was mentioned in IRC in #wordpress-dev by oso96_2000. View the logs.
11 years ago
#44
@
11 years ago
- Milestone changed from 3.9 to Future Release
See: #19867. Sad, because I hate punting. But we'll get there.
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
10 years ago
This ticket was mentioned in Slack in #design by helen. View the logs.
10 years ago
#49
follow-up:
↓ 50
@
7 years ago
I have 3 questions before making patches.
UI enhancement
WordPress uses select element to "select parent" UI. But it may raises performance issue if target option elements increase.
- Parent page(also, hierarchical post types)
- Parent term(also, hierarchical taxonomies)
SQL optimization doesn't solve these, because parent candidates keep increasing.
So I recommend incremental select, but is it beyond scope of this ticket?
Library to use
I think we should choose some library to Candidates.
- Select2
- SelectWoo
- jQuery UI autocomplete
I prefer select2 https://github.com/select2/select2 but core team tried to include and nothing updated https://github.com/select2/select2/issues/3744
WooCommerce team made SelectWoo and I think this might be good https://github.com/woocommerce/selectwoo
jQuery UI autocomplete is included in core but not similar to original select element.
What should we choose?
Resulting UI
Current select element displays option hierarchically, but select2 or something like autocomplete don't display "non-match parent.
Considering its purpose of autocomplete UI, non-matching elements should be displayed, but someone may think its worse UX than before.
Should we show non-match parent elements?
Thanks in advance.
#50
in reply to:
↑ 49
@
7 years ago
Replying to Takahashi_Fumiki:
So I recommend incremental select, but is it beyond scope of this ticket?
There are also some considerations in comment:32:ticket:19867.
#54
@
2 years ago
- Milestone Future Release deleted
- Resolution set to invalid
- Status changed from assigned to closed
The parent page selector was replaced with an accessible autocomplete search selector in Gutenberg, see https://github.com/WordPress/gutenberg/pull/25267. Closing this ticket as invalid. Please re-open if I have missed something.
Fairly sure that the scalability with lots of pages has already been reported.