WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 6 months ago

Last modified 6 months ago

#15459 closed task (blessed) (fixed)

Need Better Page Hierarchy Display Algorithm

Reported by: truthmedia Owned by: johnbillion
Milestone: 4.2 Priority: normal
Severity: normal Version: 3.0.1
Component: Posts, Post Types Keywords: has-patch commit
Focuses: performance, administration Cc:

Description

WordPress 3.0.1

File: /wp-admin/includes/post.php

Lines: 904-912

Code:

	// Hierarchical types require special args.
	if ( is_post_type_hierarchical( $post_type ) ) {
		$query['orderby'] = 'menu_order title';
		$query['order'] = 'asc';
		$query['posts_per_page'] = -1;
		$query['posts_per_archive_page'] = -1;
	}

	wp( $query );

The code in the area mentioned above causes our site to use over 170mb of memory on each load of the Pages index. This really should be improved to accommodate sites that have large amounts of hierarchical data. Looks to me like it's pulling all the data for all the pages on the site. Perhaps a solution where unnecessary data is not included might be better here (such as content or excerpt).

Attachments (6)

15459.diff (2.7 KB) - added by nacin 3 years ago.
15459.2.diff (4.9 KB) - added by rodrigosprimo 21 months ago.
15459.3.diff (4.9 KB) - added by rodrigosprimo 15 months ago.
15459.4.diff (4.8 KB) - added by nofearinc 10 months ago.
fix the ID fetching if an object reference exists
15459.5.diff (7.1 KB) - added by rodrigosprimo 10 months ago.
Add basic test coverage
15459.6.diff (9.4 KB) - added by johnbillion 6 months ago.

Download all attachments as: .zip

Change History (57)

comment:1 @scribu5 years ago

  • Keywords 3.2-early added; memory pages hierarchy removed
  • Milestone changed from Awaiting Review to Future Release
  • Type changed from defect (bug) to enhancement

For taxonomies, we store the tree of IDs and then query based on that. We could do the same for hierarchical post types.

comment:2 @scribu4 years ago

  • Keywords 3.2-early removed

This would require quite a bit of refactoring; no patch in sight.

comment:3 @SergeyBiryukov4 years ago

  • Keywords needs-patch added

comment:4 @duck_4 years ago

Closed #19285 as dupe.

Related change sets [3564], [3727].

comment:6 @nacin3 years ago

Here's a quick patch that breaks all over the place but could be a good starting point for someone wishing to solve this.

Essentially, here's what we currently do:

  • Query all posts
  • Group them based on ID and post_parent
  • Begin rendering posts until we've reached the total to display per page

What we should do instead:

  • Query all IDs and post parents
  • Group them
  • Figure out what posts we need to grab to display the right number on a page
  • Use _prime_post_caches() to pull those specific posts into cache
  • Render those posts

Querying all IDs and parents still isn't ideal, but it's a step in the right direction. Without MPTT this is not easy.

@nacin3 years ago

comment:7 @nacin3 years ago

Here's a quick patch that breaks all over the place but could be a good starting point for someone wishing to solve this.

A note, it's a total duck punch. The idea is a good starting point. The code is not.

comment:8 @JustinSainton3 years ago

Certainly quite dated, but I wonder if anything ever happened here or if anything can be salvaged -

http://gsoc2009wp.wordpress.com/tag/mptt/

comment:9 @batmoo3 years ago

  • Cc batmoo@… added

comment:10 @SergeyBiryukov2 years ago

#15052 was marked as a duplicate.

comment:12 @nickdaugherty2 years ago

  • Cc ndaugherty987@… added

comment:14 @SergeyBiryukov21 months ago

#26326 was marked as a duplicate.

comment:15 @rodrigosprimo21 months ago

  • Cc rodrigosprimo@… added

comment:16 @leogermani21 months ago

  • Cc leogermani added

@rodrigosprimo21 months ago

comment:17 @rodrigosprimo21 months ago

Attempt to implement Nacin's suggestions. I'm not familiarized with core so I'm sure there is room for improvements. Suggestions are welcome.

I run a site with around 4000 hierarchical posts of a custom type. After applying this patch, memory consumption to load wp-admin/edit.php dropped from 321MB to 63MB.

comment:18 @nacin20 months ago

  • Component changed from Performance to Administration
  • Focuses performance added

comment:19 @rodrigosprimo20 months ago

  • Keywords has-patch added; needs-patch removed

comment:20 @nacin19 months ago

  • Component changed from Administration to Posts, Post Types
  • Focuses administration added

comment:21 @McBoof19 months ago

Hi there,

I'm super interested in this thread as I've got a few clients really suffering with 30 second load times with about 5000 hierarchical posts. The patch improves this, but it is still pretty slow. Maybe 10 seconds.

Wouldn't a better algorithm be:

  • Do a normal query (limited by page size) but only return posts with no parent
  • Do another query to find all the children of just these top level posts
  • Merge them together

In the case with a ridiculous hierarchy (say a parent post with 1000s or child posts) this won't help too much. But in the cases when each post only has a couple of children it'll make a massive difference I think.

Admittedly things get messy if you're looking at deeper pages (say page 500 not page 1) of results, but in practice no-one really does this, right?

I guess the real correct way to do this is to store a flattened hierarchy table in the database and keep this up to date on post create/update/delete, and then query on that.

Note both Oracle and SQL Server support hierarchy queries easily. MySql doesn't, but you could fake it at the database level:
http://explainextended.com/2009/03/17/hierarchical-queries-in-mysql/

Just my thoughts. If only I could still code I'd try to do it myself ...
Jon

comment:22 @marcoamarelo19 months ago

Hi!

I tried the patch suggested by rodrigosprimo and it works very well. I think it will be good if WordPress adopt this change in its core.

Regards.

Last edited 19 months ago by marcoamarelo (previous) (diff)

comment:23 @rodrigosprimo19 months ago

  • Keywords dev-feedback added

Thanks for your comment McBoof but I can't think of an easy way to implement your suggestion for pages other than the first. When building the second page how could we tell which posts have already been used to build the first page? Maybe I'm missing something.

Anyway, at least in my case, the solution proposed by Nacin and implemented in the patch I sent is good enough and is a huge improvement compared with the current algorithm.

Just to give an example, in one of the sites I run with about 4000 hierarchical posts, without this patch the page takes around 25 seconds to load and uses 320MB of memory. After applying the patch the page loads in 9 seconds and uses 68MB of memory.

comment:24 @SergeyBiryukov18 months ago

#27325 was marked as a duplicate.

comment:25 @wonderboymusic15 months ago

  • Keywords needs-refresh added

Patch no longer applies - looks fun

@rodrigosprimo15 months ago

comment:26 @rodrigosprimo15 months ago

  • Keywords needs-refresh removed

Patch updated

comment:27 @ircbot14 months ago

This ticket was mentioned in IRC in #wordpress-dev by rodrigoprimo1. View the logs.

comment:28 @rodrigosprimo13 months ago

What is next?

comment:29 @vagamente13 months ago

No news? I'm plannig to start a big project using hierarchical custom post type and i'd like to know if i'm gonna have problems...

comment:30 @nofearinc11 months ago

The last patch version doesn't work for pages after the first one, which breaks the default WordPress functionality. I believe that we could build the tree of IDs, offset based on the page number * posts per page (from the screen options) and query only the posts that are needed. I'll give it some thought, based on your patch which looks elegant.

comment:31 @nofearinc11 months ago

Even when applied properly (small fixes), there are some records that are missing. For example:

Items number 4, 5, 6 are children of item 3 (imagine 3 child pages of a parent page). Like the following structure:

Parent1
Parent2
Parent3
-Child1
-Child2
-Child3
Parent4

If you set the "posts per page" to 5, with the original implementation you'll see the following results.

First page from pagination:

Parent1
Parent2
Parent3
-Child1
-Child2

Second page from pagination:

Parent3
-Child3
Parent4

The patch skips the Child3 display (which is tricky itself to get the right number of entries transferred to the second page).

That said, the second page would actually display 6 items instead of 5 due to Parent3 being duplicated in both pages 1 and 2.

The other problem is that there are different types of ordering - by name, date etc., so I'm not sure that the patch would handle all of them gracefully even if we delegate the search queries to SQL and fetch the IDs only as we do now.

comment:32 @helen11 months ago

  • Keywords needs-unit-tests added

All of that sounds like a good reason to be writing tests for this. :)

comment:33 @ircbot11 months ago

This ticket was mentioned in IRC in #wordpress-dev by nofearinc. View the logs.

comment:34 @SergeyBiryukov11 months ago

#29919 was marked as a duplicate.

comment:35 @rodrigosprimo10 months ago

@helen I'm not sure how to test this. AFAIK, WP doesn't have functional tests and the methods we are changing output HTML directly so it is not easy to create unit tests for them. The only way I can think of is to use PHPUnit_Framework_TestCase::expectOutputString() or PHPUnit_Framework_TestCase::expectOutputRegex() to check the HTML output. Any other suggestion? I couldn't find tests for WP_List_Table or any of its child classes. Are there other related tests I could use as a reference?

comment:36 @rodrigosprimo10 months ago

@nofearinc thanks for testing this patch and reporting the issues you found. Just to make sure we are on the same page, in your example, with the patch applied, on the second page Parent3 is missing and not Child3, right?

comment:37 @nofearinc10 months ago

@rodrigosprimo in fact Child3 is also missing - so the second page doesn't list children belonging to parents already listed on the previous page, so building the tree isn't doing it right so far.

@nofearinc10 months ago

fix the ID fetching if an object reference exists

@rodrigosprimo10 months ago

Add basic test coverage

comment:38 @rodrigosprimo10 months ago

  • Keywords needs-unit-tests removed

I worked today with @nofearinc and we fixed the issues he found. We also added basic test coverage for this ticket.

comment:39 @rodrigosprimo10 months ago

Some performance benchmarking on this patch collected using query-monitor plugin and a WordPress install with 6000 posts of a hierarchical custom type:

Without the patch:

Page generation time: 55.3375
Peak memory usage: 466 MB
Database query time: 2.2530
Database queries: SELECT 161

After applying the patch:

Page generation time: 27.6313
Peak memory usage: 87MB
Database query time: 0.4134
Database queries: SELECT 141

comment:40 @slackbot10 months ago

This ticket was mentioned in Slack in #core by rodrigoprimo. View the logs.

comment:41 @johnbillion10 months ago

  • Keywords needs-unit-tests needs-testing 4.2-early added; dev-feedback removed

I like this, and it's effectively quite a simple change, but I think we need some more unit tests to make sure we don't miss anything. You can never have too many tests.

  • Displaying pages filtered by date or search terms (behaviour should be the same as it is currently)
  • Displaying child pages where the parent page is in the trash
  • Displaying pages with two or more levels of child pages (grandchildren)

We also need to make sure that Quick Edit continues to function as expected after saving a page (with and without changing its parent). Won't be unit-testable, but needs some manual verification.

comment:42 @johnbillion9 months ago

  • Summary changed from Need Better Page Hierarchy Display Algo to Need Better Page Hierarchy Display Algorithm

comment:43 @iseulde8 months ago

  • Milestone changed from Future Release to 4.2

has-patch 4.2-early so moving to 4.2.

comment:44 @slackbot7 months ago

This ticket was mentioned in Slack in #core by drew. View the logs.

comment:45 @johnbillion7 months ago

  • Owner set to johnbillion
  • Status changed from new to accepted

I'm going to add some more tests for this and see about getting it in early next week.

comment:46 @slackbot6 months ago

This ticket was mentioned in Slack in #core by rodrigoprimo. View the logs.

comment:47 @slackbot6 months ago

This ticket was mentioned in Slack in #core by santagada. View the logs.

@johnbillion6 months ago

comment:48 @johnbillion6 months ago

  • Keywords commit added; needs-unit-tests needs-testing 4.2-early removed
  • Type changed from enhancement to task (blessed)

15459.6.diff introduces better and more tests, all of which are passing. I'm very happy with this and it should go into 4.2 beta.

comment:49 @slackbot6 months ago

This ticket was mentioned in Slack in #core by drew. View the logs.

comment:50 @johnbillion6 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 31730:

Introduce a new algorithm for displaying a hierarchical list of post objects in the WP_Posts_List_Table. This reduces processing time, reduces database queries, and substantially reduces memory use on sites with a high number of Pages.

Props nofearinc, rodrigosprimo, nacin, johnbillion.
Fixes #15459

comment:51 @SergeyBiryukov6 months ago

In 31841:

After [31730], replace one more instance of array_shift() with reset() for better performance.

see #31259, #15459.

Note: See TracTickets for help on using tickets.