WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 3 weeks ago

#9864 assigned defect (bug)

Performance issues with large number of pages

Reported by: pp19dd Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.7.1
Component: Posts, Post Types Keywords: 2nd-opinion has-patch
Focuses: performance, administration Cc:

Description

Environment:

  • Default install, default theme, no plugins, reasonable server configuration (typical execution, memory limits). In my particular case, that was set to 30 seconds, 384 MB.

To reproduce:

  • Create a large number of pages (5,000+)
  • Try to edit a page

What happens:

  • Maximum execution time error
Fatal error: Maximum execution time of 30 seconds exceeded in \wp-includes\post.php on line 1998

Workaround to prevent error:

  • set_time_limit to 0 as server admin

My diagnosis:

Suspect that the immediate problem is that get_pages() inside wp-admin/post.php, line 2173, performs a query that selects EVERYTHING:

$query = "SELECT * FROM $wpdb->posts $join WHERE (post_type = 'page' AND post_status = 'publish') $where ";

Now, in this case this query (AFAIK) is just supposed to pull up the page list (with IDs, titles and parents). What it actually does is it pulls up ALL of the post data. This works fine for low number of pages, but on a larger data set this causes a chokepoint. After all, the end result is just supposed to be a dropdown with a list of pages.

Workaround to improve performance:

  • comment out line 273 in wp-admin\edit-page-form.php and skip displaying the page parent in the edit screen

The bigger picture:

Well, the ENTIRE list of pages is queried in this 'edit page' screen (see screenshot # 2 - with a dropdown that contains 5,000+ entries.) I think this is a design limitation and it looks clumsy once the list gets larger. Rather than patch the display routines, I think this thing needs to be replaced with something that can search / paginate piecemeal server-side.

Attachments (4)

page_edit_error.png (8.4 KB) - added by pp19dd 5 years ago.
page_edit_eyesore.png (8.9 KB) - added by pp19dd 5 years ago.
9864.diff (8.4 KB) - added by oso96_2000 8 weeks ago.
9864.2.diff (8.3 KB) - added by oso96_2000 8 weeks ago.
Thanks for the feedback, SergeyBiryukov

Download all attachments as: .zip

Change History (48)

pp19dd5 years ago

pp19dd5 years ago

comment:1 mrmist5 years ago

Fairly sure that the scalability with lots of pages has already been reported.

comment:2 Denis-de-Bernardy5 years ago

  • Component changed from Administration to Performance
  • Milestone changed from Unassigned to Future Release
  • Owner set to Denis-de-Bernardy

yep. quite a few times, even.

comment:3 Denis-de-Bernardy5 years ago

the query is:

SELECT * FROM wp_posts WHERE (post_type = 'page' AND post_status = 'publish') ORDER BY menu_order, post_title ASC

please alter your database, and add an index on (post_type, post_status). see whether that makes any difference.

comment:4 josephscott5 years ago

That query appears to never use any of the current indexes. It's the ORDER BY clause that is causing the indexes to not be usable. Perhaps it's worth leaving the ORDER BY off and sorting the results in PHP, outside of the DB.

comment:5 follow-up: Denis-de-Bernardy5 years ago

php won't beat an index if one is used. especially if we're talking 3k page lists. that'll consume huge amounts of memory.

an index on post_type, menu_order, post_title may do the trick, but at the end of the day it goes down to the amount of clutter that is present in the database. if the number of posts, revisions and attachments is small, and the query planner decides it's faster to seq scan the mess and sort it without an index, it will...

anyway, see #9642

comment:6 follow-up: pp19dd5 years ago

Thanks. I was unable to find related tickets, but thanks to the references I see this is being looked at. I added an index to the suggested fields and in this case it really made no difference.

The question that I have is -- why would we want to list all 5,000+ pages in a dropdown on the 'edit page' screen? It would seem prudent to put up a paginated search widget there. I mean, its purpose is to assign a parent to the page.

I don't think this is a question of optimization as much as "why are we querying this at all?" (page_edit_eyesore.png shows a dropdown that's had its scroll bar pushed off the screen - but - more importantly it shows the entire 5,000 page listing.)

At any rate, I think that you might have covered this under another patch. I will keep looking around for an upcoming solution - otherwise - I'll manually disable that admin feature.

comment:7 in reply to: ↑ 6 Denis-de-Bernardy5 years ago

Replying to pp19dd:

The question that I have is -- why would we want to list all 5,000+ pages in a dropdown on the 'edit page' screen? It would seem prudent to put up a paginated search widget there. I mean, its purpose is to assign a parent to the page.

Simple: because there is no patch to do things differently, and that would be the simplest way to proceed in order to assign a page parent.

Imo, it would be better if a simple text field existed. As the user types a title in it, we'd ajax query page titles and return a small selection of pages to choose from. That would be great, and it would scale admirably. If you're familiar with php and js, go right ahead and write the patch. If not, it may be ages before it gets fixed if your DB is so that the indexes are not used.

comment:8 pp19dd5 years ago

Denis- thanks for the suggestion. I wonder if I get it done quickly, will make it into 2.8?

comment:9 Denis-de-Bernardy5 years ago

Quite honestly I doubt it. We're in beta, which is another word for feature freeze. It would surely go into an early 2.9 if a patch is around, however.

Then again, I've no commit access, so this would really be a question for one of Ryan, Andrew, Mark or Peter.

comment:10 mrmist5 years ago

As I thought, I raised similar issues a while back as #8468 and #8492

Essentially, this problem exists whereever we have a drop down of potentially infinite length. Some sort of logic should be used to limit records that are returned to a pull-down menu. Whether that means that for large numbers of records they have to be LIMITed and split by some criteria, I am not sure.

comment:11 mrmist5 years ago

#8648 rather.

comment:12 Denis-de-Bernardy5 years ago

@mrmist: the thing to do, I think, would be the following:

A simple text field. As the user types a title in it, we'd ajax query page titles and return a small selection of pages to choose from.

Now someone needs to volunteer the needed ajax in a generic enough way. ;-)

comment:13 pp19dd5 years ago

I'll do some wishful thinking (that the fix would magically end up in 2.8) and give it a try in the next day. Think that I can do a fairly unobtrusive patch that would give us that capability. Question is whether to work from 2.7.1 or the trunk. What do you guys think? (Word had it that the trunk had updated "optimized" queries.)

comment:14 Denis-de-Bernardy5 years ago

Add this index to your wp_posts table:

(post_type, menu_order, post_title)

And then use trunk. Not 2.7.1.

Intuitively, there is a need for:

  • a new case in wp-admin/admin-ajax.php (e.g. quickfind) that tosses an empty array into a filter with a parameter (pages, in our case), so $results = apply_filters(quickfind, array(), pages)
  • some kind of function that returns the needed array using a db query
  • the needed UI functions in wp-admin/includes/template.php
  • possibly some edits to wp-admin/js/pages.dev.js (use the SCRIPT_DEBUG define to use that one)

comment:15 Denis-de-Bernardy5 years ago

another potential option could be a simple index on (post_type, post_title) and an unordered query:

select * from $wpdb->posts where post_type = page and post_title like foo%

that *should* use the index as well, and it might be faster than the previous index. (string comparison functions conveniently happen to be case insensitive in mysql.)

comment:16 Denis-de-Bernardy5 years ago

  • Milestone changed from Future Release to 2.9

comment:17 Denis-de-Bernardy5 years ago

  • Keywords 2nd-opinion removed

comment:18 Denis-de-Bernardy5 years ago

  • Owner Denis-de-Bernardy deleted
  • Status changed from new to assigned

comment:19 prettyboymp4 years ago

What if the functionality of this metabox just degraded nicer? We could add some handling that said if the number of pages were over say 200, then instead of doing the normal drop down box, we use pop up style selector similar to the way the media_upload browser is used for the post image handling.

This would allow smaller sites to continue to make quick and easy selection like they do now while keeping it from breaking completely for larger sites.

comment:20 prettyboymp4 years ago

  • Cc mpretty@… added

comment:21 janeforshort4 years ago

  • Milestone changed from 2.9 to Future Release

Punting b/c we're in beta 2 and this needs an updated patch.

comment:22 hakre3 years ago

Duplicate: #16851

comment:23 jkudish19 months ago

  • Cc joachim.kudish@… added

comment:24 husobj14 months ago

  • Cc ben@… added

comment:25 in reply to: ↑ 5 ; follow-up: husobj13 months ago

Replying to Denis-de-Bernardy:

an index on post_type, menu_order, post_title may do the trick, but at the end of the day it goes down to the amount of clutter that is present in the database. if the number of posts, revisions and attachments is small, and the query planner decides it's faster to seq scan the mess and sort it without an index, it will...

Is there any reason why 'menu_order' doesn't have an index?
It strikes me the primary use for 'menu_order' would be to sort results so surely having an index on this would help?
I'll caveat this by saying I'm no database expert :)

comment:26 in reply to: ↑ 25 ; follow-up: Denis-de-Bernardy13 months ago

Replying to husobj:

Replying to Denis-de-Bernardy:

an index on post_type, menu_order, post_title may do the trick (...)

Is there any reason why 'menu_order' doesn't have an index?
It strikes me the primary use for 'menu_order' would be to sort results so surely having an index on this would help?
I'll caveat this by saying I'm no database expert :)

An index on menu_order just by itself would never get used. The only case where such an index would get considered is when it's 1) selective enough (but here, an index on type/status will likely beat it, since it would be an index where most values are 0) and/or 2) it returns rows in the correct order (which it doesn't, since we order by order/title).

On a normal blog with lots of posts and only a handful of pages, the query engine would likely plan to grab published pages using an index on type/status, sort them, and return the result. The index I suggested would, in this case, strip out the sorting step on most normal sites. If, in contrast, a site has lots of pages and little to no posts, the optimal query plan would be to avoid the index entirely, read the whole table followed by a sort -- because in memory sorting the whole table will trounce random disk reads.

In so far as I remember from two years ago, though, this seemed to be primarily a throughput issue, rather than a DB indexing issue. Things time out either because it takes too long to move the pages from the query results to PHP from the DB, or because it takes too long for WP to do its things once they're retrieved, or because it take too long to output said things once it's done, or something else. Someone would need to go in there and measure, to identify where time is actually being spent.

comment:27 in reply to: ↑ 26 husobj13 months ago

Replying to Denis-de-Bernardy:

Thanks for your in-depth reply.

I suppose the use case I am thinking of is when you have many pages (or custom post types) and you use a plugin like CMS Tree Page View or other plugin which allows you to set the order of posts by drag-n-drop. General these tend to set a different menu_order value for every post - is that what you mean above when you say "selective enough"?

So that's fine if you have an average quantity of pages, but say you have loads of pages or an e-commerce store with thousands of products which users can order in a custom way by menu_order. In that instance would an index on menu_order be beneficial?

comment:28 follow-up: Denis-de-Bernardy13 months ago

Potentially beneficial for selectivity

where post_status = 'published' and menu_order > 0 -- eliminates posts, which have menu_order set to zero


Potentially beneficial for ordering:

order by menu_order

Not beneficial for ordering, unless the index is on menu_order, post_title:

order by menu_order, post_title

But in the latter case it needs to be selective as well, else, the query planner may decide its more efficient to grab all rows using a smaller, more selective index (on post_type or post_type, post_status), hence my initial suggesting to try one on post_type, menu_order, post_title -- potentiall selective and in the right order.

Caveat: if using the index means randomly moving back and forth over disk pages all over the place, the planner might still (correctly) prefer a sequential scan with an in-memory sort. Using a clustered index would work around this, but you wouldn't want to go there, since the current clustered index (the id) more or less matches the order we use to run blog-related queries.

Anyway... I hope it clarifies your thoughts. At the end of the day, this is really an issue at the php level imho, rather than an index problem.

comment:29 in reply to: ↑ 28 husobj13 months ago

Replying to Denis-de-Bernardy:

Thanks for taking the time to explain.

comment:30 nacin3 months ago

  • Component changed from Performance to Administration
  • Focuses performance added
  • Milestone changed from Future Release to 3.9

Would love to fix this by moving to an auto-complete solution, akin to #19867 for users.

comment:31 helen3 months ago

Thinking through this a bit - since autocomplete results are flattened out do we need to / how should we show any hierarchy? Add parent page name somewhere in the result? Something else?

comment:32 nacin3 months ago

#8492 was marked as a duplicate.

comment:33 ircbot3 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:34 nacin3 months ago

  • Component changed from Administration to Posts, Post Types
  • Focuses administration added

comment:35 ircbot2 months ago

This ticket was mentioned in IRC in #wordpress-dev by nacin. View the logs.

comment:36 ircbot2 months ago

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.

comment:37 ircbot2 months ago

This ticket was mentioned in IRC in #wordpress-dev by JustinSainton_. View the logs.

comment:39 ircbot2 months ago

This ticket was mentioned in IRC in #wordpress-dev by DH-Shredder. View the logs.

oso96_20008 weeks ago

comment:40 oso96_20008 weeks ago

  • Keywords 2nd-opinion has-patch added

Attached is a first try with this. Had something done, but when I saw the last patch from @helen at #19867 I changed it to also use autocomplete only with page number > 100. We still need to do something with indexes or selecting just some fields, because get_pages is still taking long time when there are lots of pages.

comment:41 oso96_20008 weeks ago

Btw, almost forgot. For displaying the parent/child pages I was thinking on something like this example: http://jqueryui.com/autocomplete/#categories

comment:42 SergeyBiryukov8 weeks ago

Haven't tested 9864.diff yet, but it has some i18n issues.

/* translators: 1: post_title */
'id'    => $page->{$field},
'label' => sprintf( __('%1$s'), $page->post_title ),
'value' => sprintf( __('%1$s'), $page->post_title )

Dynamic strings like these are not localizable, as they cannot be extracted by gettext:
http://ottopress.com/2012/internationalization-youre-probably-doing-it-wrong/

Why would we need to translate a post title?

We do, however need to use esc_attr__() instead of just esc_attr() here:

$output = "<input name='{$name}' type='text' ... title='" . esc_attr( 'Page Title' ) . "' ... />\n";
Last edited 8 weeks ago by SergeyBiryukov (previous) (diff)

comment:43 ircbot8 weeks ago

This ticket was mentioned in IRC in #wordpress-dev by oso96_2000. View the logs.

oso96_20008 weeks ago

Thanks for the feedback, SergeyBiryukov

comment:44 helen3 weeks ago

  • Milestone changed from 3.9 to Future Release

See: #19867. Sad, because I hate punting. But we'll get there.

Note: See TracTickets for help on using tickets.