Make WordPress Core

Opened 14 years ago

Closed 10 years ago

Last modified 10 years ago

#15459 closed task (blessed) (fixed)

Need Better Page Hierarchy Display Algorithm

Reported by: truthmedia's profile truthmedia Owned by: johnbillion's profile johnbillion
Milestone: 4.2 Priority: normal
Severity: normal Version: 3.0.1
Component: Posts, Post Types Keywords: has-patch commit
Focuses: performance, administration Cc:

Description

WordPress 3.0.1

File: /wp-admin/includes/post.php

Lines: 904-912

Code:

	// Hierarchical types require special args.
	if ( is_post_type_hierarchical( $post_type ) ) {
		$query['orderby'] = 'menu_order title';
		$query['order'] = 'asc';
		$query['posts_per_page'] = -1;
		$query['posts_per_archive_page'] = -1;
	}

	wp( $query );

The code in the area mentioned above causes our site to use over 170mb of memory on each load of the Pages index. This really should be improved to accommodate sites that have large amounts of hierarchical data. Looks to me like it's pulling all the data for all the pages on the site. Perhaps a solution where unnecessary data is not included might be better here (such as content or excerpt).

Attachments (6)

15459.diff (2.7 KB) - added by nacin 13 years ago.
15459.2.diff (4.9 KB) - added by rodrigosprimo 11 years ago.
15459.3.diff (4.9 KB) - added by rodrigosprimo 10 years ago.
15459.4.diff (4.8 KB) - added by nofearinc 10 years ago.
fix the ID fetching if an object reference exists
15459.5.diff (7.1 KB) - added by rodrigosprimo 10 years ago.
Add basic test coverage
15459.6.diff (9.4 KB) - added by johnbillion 10 years ago.

Download all attachments as: .zip

Change History (57)

#1 @scribu
14 years ago

  • Keywords 3.2-early added; memory pages hierarchy removed
  • Milestone changed from Awaiting Review to Future Release
  • Type changed from defect (bug) to enhancement

For taxonomies, we store the tree of IDs and then query based on that. We could do the same for hierarchical post types.

#2 @scribu
14 years ago

  • Keywords 3.2-early removed

This would require quite a bit of refactoring; no patch in sight.

#3 @SergeyBiryukov
13 years ago

  • Keywords needs-patch added

#4 @duck_
13 years ago

Closed #19285 as dupe.

Related change sets [3564], [3727].

#6 @nacin
13 years ago

Here's a quick patch that breaks all over the place but could be a good starting point for someone wishing to solve this.

Essentially, here's what we currently do:

  • Query all posts
  • Group them based on ID and post_parent
  • Begin rendering posts until we've reached the total to display per page

What we should do instead:

  • Query all IDs and post parents
  • Group them
  • Figure out what posts we need to grab to display the right number on a page
  • Use _prime_post_caches() to pull those specific posts into cache
  • Render those posts

Querying all IDs and parents still isn't ideal, but it's a step in the right direction. Without MPTT this is not easy.

@nacin
13 years ago

#7 @nacin
13 years ago

Here's a quick patch that breaks all over the place but could be a good starting point for someone wishing to solve this.

A note, it's a total duck punch. The idea is a good starting point. The code is not.

#8 @JustinSainton
13 years ago

Certainly quite dated, but I wonder if anything ever happened here or if anything can be salvaged -

http://gsoc2009wp.wordpress.com/tag/mptt/

#9 @batmoo
12 years ago

  • Cc batmoo@… added

#10 @SergeyBiryukov
11 years ago

#15052 was marked as a duplicate.

#12 @nickdaugherty
11 years ago

  • Cc ndaugherty987@… added

#14 @SergeyBiryukov
11 years ago

#26326 was marked as a duplicate.

#15 @rodrigosprimo
11 years ago

  • Cc rodrigosprimo@… added

#16 @leogermani
11 years ago

  • Cc leogermani added

#17 @rodrigosprimo
11 years ago

Attempt to implement Nacin's suggestions. I'm not familiarized with core so I'm sure there is room for improvements. Suggestions are welcome.

I run a site with around 4000 hierarchical posts of a custom type. After applying this patch, memory consumption to load wp-admin/edit.php dropped from 321MB to 63MB.

#18 @nacin
11 years ago

  • Component changed from Performance to Administration
  • Focuses performance added

#19 @rodrigosprimo
11 years ago

  • Keywords has-patch added; needs-patch removed

#20 @nacin
11 years ago

  • Component changed from Administration to Posts, Post Types
  • Focuses administration added

#21 @McBoof
11 years ago

Hi there,

I'm super interested in this thread as I've got a few clients really suffering with 30 second load times with about 5000 hierarchical posts. The patch improves this, but it is still pretty slow. Maybe 10 seconds.

Wouldn't a better algorithm be:

  • Do a normal query (limited by page size) but only return posts with no parent
  • Do another query to find all the children of just these top level posts
  • Merge them together

In the case with a ridiculous hierarchy (say a parent post with 1000s or child posts) this won't help too much. But in the cases when each post only has a couple of children it'll make a massive difference I think.

Admittedly things get messy if you're looking at deeper pages (say page 500 not page 1) of results, but in practice no-one really does this, right?

I guess the real correct way to do this is to store a flattened hierarchy table in the database and keep this up to date on post create/update/delete, and then query on that.

Note both Oracle and SQL Server support hierarchy queries easily. MySql doesn't, but you could fake it at the database level:
http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/

Just my thoughts. If only I could still code I'd try to do it myself ...
Jon

#22 @marcoamarelo
11 years ago

Hi!

I tried the patch suggested by rodrigosprimo and it works very well. I think it will be good if WordPress adopt this change in its core.

Regards.

Last edited 11 years ago by marcoamarelo (previous) (diff)

#23 @rodrigosprimo
11 years ago

  • Keywords dev-feedback added

Thanks for your comment McBoof but I can't think of an easy way to implement your suggestion for pages other than the first. When building the second page how could we tell which posts have already been used to build the first page? Maybe I'm missing something.

Anyway, at least in my case, the solution proposed by Nacin and implemented in the patch I sent is good enough and is a huge improvement compared with the current algorithm.

Just to give an example, in one of the sites I run with about 4000 hierarchical posts, without this patch the page takes around 25 seconds to load and uses 320MB of memory. After applying the patch the page loads in 9 seconds and uses 68MB of memory.

#24 @SergeyBiryukov
11 years ago

#27325 was marked as a duplicate.

#25 @wonderboymusic
10 years ago

  • Keywords needs-refresh added

Patch no longer applies - looks fun

#26 @rodrigosprimo
10 years ago

  • Keywords needs-refresh removed

Patch updated

This ticket was mentioned in IRC in #wordpress-dev by rodrigoprimo1. View the logs.


10 years ago

#28 @rodrigosprimo
10 years ago

What is next?

#29 @vagamente
10 years ago

No news? I'm plannig to start a big project using hierarchical custom post type and i'd like to know if i'm gonna have problems...

#30 @nofearinc
10 years ago

The last patch version doesn't work for pages after the first one, which breaks the default WordPress functionality. I believe that we could build the tree of IDs, offset based on the page number * posts per page (from the screen options) and query only the posts that are needed. I'll give it some thought, based on your patch which looks elegant.

#31 @nofearinc
10 years ago

Even when applied properly (small fixes), there are some records that are missing. For example:

Items number 4, 5, 6 are children of item 3 (imagine 3 child pages of a parent page). Like the following structure:

Parent1
Parent2
Parent3
-Child1
-Child2
-Child3
Parent4

If you set the "posts per page" to 5, with the original implementation you'll see the following results.

First page from pagination:

Parent1
Parent2
Parent3
-Child1
-Child2

Second page from pagination:

Parent3
-Child3
Parent4

The patch skips the Child3 display (which is tricky itself to get the right number of entries transferred to the second page).

That said, the second page would actually display 6 items instead of 5 due to Parent3 being duplicated in both pages 1 and 2.

The other problem is that there are different types of ordering - by name, date etc., so I'm not sure that the patch would handle all of them gracefully even if we delegate the search queries to SQL and fetch the IDs only as we do now.

#32 @helen
10 years ago

  • Keywords needs-unit-tests added

All of that sounds like a good reason to be writing tests for this. :)

This ticket was mentioned in IRC in #wordpress-dev by nofearinc. View the logs.


10 years ago

#34 @SergeyBiryukov
10 years ago

#29919 was marked as a duplicate.

#35 @rodrigosprimo
10 years ago

@helen I'm not sure how to test this. AFAIK, WP doesn't have functional tests and the methods we are changing output HTML directly so it is not easy to create unit tests for them. The only way I can think of is to use PHPUnit_Framework_TestCase::expectOutputString() or PHPUnit_Framework_TestCase::expectOutputRegex() to check the HTML output. Any other suggestion? I couldn't find tests for WP_List_Table or any of its child classes. Are there other related tests I could use as a reference?

#36 @rodrigosprimo
10 years ago

@nofearinc thanks for testing this patch and reporting the issues you found. Just to make sure we are on the same page, in your example, with the patch applied, on the second page Parent3 is missing and not Child3, right?

#37 @nofearinc
10 years ago

@rodrigosprimo in fact Child3 is also missing - so the second page doesn't list children belonging to parents already listed on the previous page, so building the tree isn't doing it right so far.

@nofearinc
10 years ago

fix the ID fetching if an object reference exists

@rodrigosprimo
10 years ago

Add basic test coverage

#38 @rodrigosprimo
10 years ago

  • Keywords needs-unit-tests removed

I worked today with @nofearinc and we fixed the issues he found. We also added basic test coverage for this ticket.

#39 @rodrigosprimo
10 years ago

Some performance benchmarking on this patch collected using query-monitor plugin and a WordPress install with 6000 posts of a hierarchical custom type:

Without the patch:

Page generation time: 55.3375
Peak memory usage: 466 MB
Database query time: 2.2530
Database queries: SELECT 161

After applying the patch:

Page generation time: 27.6313
Peak memory usage: 87MB
Database query time: 0.4134
Database queries: SELECT 141

This ticket was mentioned in Slack in #core by rodrigoprimo. View the logs.


10 years ago

#41 @johnbillion
10 years ago

  • Keywords needs-unit-tests needs-testing 4.2-early added; dev-feedback removed

I like this, and it's effectively quite a simple change, but I think we need some more unit tests to make sure we don't miss anything. You can never have too many tests.

  • Displaying pages filtered by date or search terms (behaviour should be the same as it is currently)
  • Displaying child pages where the parent page is in the trash
  • Displaying pages with two or more levels of child pages (grandchildren)

We also need to make sure that Quick Edit continues to function as expected after saving a page (with and without changing its parent). Won't be unit-testable, but needs some manual verification.

#42 @johnbillion
10 years ago

  • Summary changed from Need Better Page Hierarchy Display Algo to Need Better Page Hierarchy Display Algorithm

#43 @iseulde
10 years ago

  • Milestone changed from Future Release to 4.2

has-patch 4.2-early so moving to 4.2.

This ticket was mentioned in Slack in #core by drew. View the logs.


10 years ago

#45 @johnbillion
10 years ago

  • Owner set to johnbillion
  • Status changed from new to accepted

I'm going to add some more tests for this and see about getting it in early next week.

This ticket was mentioned in Slack in #core by rodrigoprimo. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by santagada. View the logs.


10 years ago

@johnbillion
10 years ago

#48 @johnbillion
10 years ago

  • Keywords commit added; needs-unit-tests needs-testing 4.2-early removed
  • Type changed from enhancement to task (blessed)

15459.6.diff introduces better and more tests, all of which are passing. I'm very happy with this and it should go into 4.2 beta.

This ticket was mentioned in Slack in #core by drew. View the logs.


10 years ago

#50 @johnbillion
10 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 31730:

Introduce a new algorithm for displaying a hierarchical list of post objects in the WP_Posts_List_Table. This reduces processing time, reduces database queries, and substantially reduces memory use on sites with a high number of Pages.

Props nofearinc, rodrigosprimo, nacin, johnbillion.
Fixes #15459

#51 @SergeyBiryukov
10 years ago

In 31841:

After [31730], replace one more instance of array_shift() with reset() for better performance.

see #31259, #15459.

Note: See TracTickets for help on using tickets.