Make WordPress Core

Opened 15 years ago

Closed 9 years ago

Last modified 9 years ago

#10852 closed enhancement (fixed)

improve get_page_children

Reported by: hailin's profile hailin Owned by: boonebgorges's profile boonebgorges
Milestone: 4.3 Priority: normal
Severity: normal Version: 2.9
Component: Posts, Post Types Keywords: has-patch
Focuses: performance Cc:

Description

In #5303, mihai pointed out that get_page_children is very slow with 7000 pages. It's indeed slow, since the algorithm has O2 complexity.

We should improve it to O(N) complexity.

Attachments (6)

10852_get_page_children.diff (2.0 KB) - added by hailin 15 years ago.
patch
new_get_page_children.php (2.3 KB) - added by santagada 10 years ago.
another version of the patch, not using recursion
new_get_page_children.diff (1.4 KB) - added by santagada 10 years ago.
actual code, no recursion and a little faster
new_get_page_children.2.diff (1.6 KB) - added by santagada 10 years ago.
new version of the patch
new_get_page_children.2.php (2.0 KB) - added by santagada 10 years ago.
updated code in the benchmarks too
new_get_page_children.3.diff (4.6 KB) - added by santagada 10 years ago.
new version with tests

Download all attachments as: .zip

Change History (53)

#1 @hailin
15 years ago

  • Cc hailin added
  • Component changed from General to Optimization
  • Milestone changed from Unassigned to 2.8.5
  • Type changed from defect (bug) to enhancement

It's a clean implementation that achieves O(N) complexity for this particular function.

It does not attempt to address update_post_caches though,
as it's a separate issue.

#2 follow-up: @hailin
15 years ago

unit tested and verified that it uses only 20% of the time used by the previous algorithm, even for a small data set.

#3 @scribu
15 years ago

  • Keywords has-patch added
  • Milestone changed from 2.8.5 to 2.9
  • Version set to 2.9

#4 in reply to: ↑ 2 @westi
15 years ago

  • Owner set to westi
  • Status changed from new to reviewing

Replying to hailin:

unit tested and verified that it uses only 20% of the time used by the previous algorithm, even for a small data set.

Cool.

Would you be able to share the test cases or even better write a patch for the WordPress unit tests repo to include this functionality and performance testing

#5 follow-up: @hailin
15 years ago

westi:

The test case was realy simple - I just constructed a nested page tree with about 100 nodes. And put

$t1 = microtime(true);
get_page_children( $id ); adjust your id here
$t2 = microtime(true);

$t = $t2 - $t1;
echo $t

You can see significant performance gain even with ~ 20 nested nodes.

I heard two recent GoSOC students did MPTT, and core devs are thinking about using MPTT to rearrange all these nested data structures. If it happens, all these functions and algorithms will be replaced eventually. I can still try to find some time to write unit test repo, if I can free up my hands in the near future.

#6 @mihai
15 years ago

hailin: your code looks better then mine and it should be faster too without all those counts I did in mine.

But I'm a little disappointed that this is not going to be included in 2.8.5 ( unless there will be no 2.8.5 and 2.9 will be the next release :) )

Should I open a new ticket for update_page_cache ?

#7 follow-up: @hailin
15 years ago

mihai:

I'm not sure calling update_page_cache in those page functions is a bug - it part of the cache updating mechanism.

However, both Ryan and I have identified this as the biggest time consumption issue
when there are thousands of pages.

Can you test my patch with your 7000 pages and report back the results?
I think Westi hold it off because he wants to see more testing results.

You could name my implementation get_page_children2(), then use the above $t = microtime(true) code snippet to make a direct call of old get_page_children() and new get_page_children2(), then compare the time spent in each case.

You could test these in index.php directly, and pass in the ID of the root page.

#8 in reply to: ↑ 5 @westi
15 years ago

Replying to hailin:

westi:

The test case was realy simple - I just constructed a nested page tree with about 100 nodes. And put

$t1 = microtime(true);
get_page_children( $id ); adjust your id here
$t2 = microtime(true);

$t = $t2 - $t1;
echo $t

You can see significant performance gain even with ~ 20 nested nodes.

Ok. Did you write any functionality tests or only test performance?
If would be really good when changing things like this for performance to write the functionality unit tests too to make sure we don't accidentally introduce a regression.

#9 @hailin
15 years ago

If you refer to the standard wordpress-tests test cases, I didn't write or add any new test cases.

What I did test was to put the code snippet in index.php, name the new function as get_page_children2, and run against an existing page hierarchy with ~100 nodes.
I verified results ($x) are the same before and with this patch, and the new algorithm is significantly faster ($t)

$t1 = microtime(true);
$x = get_page_children( $id ); adjust your id here
var_dump( $x );
$t2 = microtime(true);
$t = $t2 - $t1;
echo "it takes $t for the old get_page_children";

$t1 = microtime(true);
$x = get_page_children2( $id ); adjust your id here
var_dump( $x );
$t2 = microtime(true);
$t = $t2 - $t1;
echo "it takes $t for the new get_page_children";

I took a look at the standard wordpress-tests cases, but couldn't not find any existing cases that covers this particular function. If we were to write some cases,
I think it'd better to write a comprehensive case to cover most of pages functionalities.

#10 in reply to: ↑ 7 @mihai
15 years ago

Replying to hailin:

mihai:

I'm not sure calling update_page_cache in those page functions is a bug - it part of the cache updating mechanism.

With my test updating the cache took about 33% of the time. If the page loaded in 3 minutes with no improvement, it loaded in 1 minute with the improved get_page_children and in just a few seconds if I removed the update_page_cache call.
It might not be a bug it it adds significant time to the loading. Still I can't see why I would want to update the page cache when I view the page list? Shouldn't this cache be updated only when a page is changed ?

However, both Ryan and I have identified this as the biggest time consumption issue
when there are thousands of pages.

How many pages did you have ? how much time did it take to load? Did you actually test by loading the page admin ?

#11 follow-up: @hailin
15 years ago

So my patch improves the load time from 3 min to 1 min. That is the purpose of my patch - to speed things up algorithmically.

update_page_cache is indeed another bigger issue, which I will leave it to lead core WP devs.

The tests I did was with 5000 pages, about a year ago, on a live blog.
removing update_post_caches($this->posts) saves about 17 sec out of 33 sec total time.

#12 in reply to: ↑ 11 @mihai
15 years ago

Replying to hailin:

So my patch improves the load time from 3 min to 1 min. That is the purpose of my patch - to speed things up algorithmically.

Actually my patch did that, I didn't try yours yet, but it seems somehow similar so yeah I guess it could do that too.

update_page_cache is indeed another bigger issue, which I will leave it to lead core WP devs.

The tests I did was with 5000 pages, about a year ago, on a live blog.
removing update_post_caches($this->posts) saves about 17 sec out of 33 sec total time.

On my 7000 pages test it saved more then 40 seconds. it may not seem like a lot but it's not the same to wait 20 seconds or one minute to load the page list.

#13 @FolioVision
15 years ago

There is another solution for this problem which has O(1) complexity, but would require an extra field in the posts table. Another downside is that it would be much slower when posting a new page or updating a page. O(n) is really the best solution so let's go for it.

#14 @miqrogroove
15 years ago

  • Cc miqrogroove@… added

If you want this to be optimized for large dependency trees, you should use the function pattern: assign, test, recurse. You can unset() the top node in get_page_children(). As you have it - test, assign, recurse - there is an extra recursion call for every leaf page.

#15 @ryan
15 years ago

  • Milestone changed from 2.9 to 3.0

#16 @hakre
15 years ago

Related: #11633

#17 @Denis-de-Bernardy
15 years ago

  • Cc Denis-de-Bernardy added

#19 @Denis-de-Bernardy
15 years ago

  • Keywords tested added

it's still applying clean, and it's not breaking anything. marking it as tested considering the above-mentioned unit tests.

could we get this in WP 3.0 before it's too late?

#20 @nacin
15 years ago

  • Keywords early added; tested removed
  • Milestone changed from 3.0 to 3.1

Too late for 3.0. Moving to 3.1 early.

Also:

there is an extra recursion call for every leaf page

#21 @nacin
14 years ago

  • Milestone changed from Awaiting Triage to Future Release

#22 @nacin
14 years ago

Would like to see this improvement in core, but there's a few other things to consider first, as outlined in my last comment.

#23 @edwardw
13 years ago

  • Keywords dev-feedback added; early removed

#24 @mbijon
12 years ago

  • Cc mike@… added

#25 @nacin
11 years ago

  • Component changed from Optimization to Post Types
  • Focuses performance added

#26 @nofearinc
10 years ago

Applying the patch reduced the page load from 9.5min to 15sec here (for a specific use case of 13K post type entries). Haven't noticed any specific regressions or missing items or connections.

#27 @v-media
10 years ago

Will this be ever released?

#28 @helen
10 years ago

#31411 was marked as a duplicate.

@santagada
10 years ago

another version of the patch, not using recursion

#29 @santagada
10 years ago

Sorry that was supposed to be the benchmark for the new algorithm, if someone can fix that data it would be awesome.

My code doesn't have any recursion (so the non documented maximum depth of 100 is gone) and it is much much faster.

In one admin page of a client with around 8k posts it takes up to 2 minutes to render using the old code and 9 seconds with the patch. This function is called many times in admin to build a hierarchical list on screen.

@santagada
10 years ago

actual code, no recursion and a little faster

#30 @santagada
10 years ago

Hope to end this 5 year history... if anything more is needed to make this go live I would gladly do it. Right now a part of the admin is dreaded by users.

#31 @DrewAPicture
10 years ago

  • Keywords needs-unit-tests added

Core unit tests that pass before and after would be needed here as well.

#32 follow-up: @santagada
10 years ago

Is there a timeframe to get it into 4.2 (or 4.3)? I can come up with a test suite, just tell me how long I have to create it :)

#33 in reply to: ↑ 32 @DrewAPicture
10 years ago

Replying to santagada:

Is there a timeframe to get it into 4.2 (or 4.3)? I can come up with a test suite, just tell me how long I have to create it :)

For inclusion in 4.2, your patch would have to be approved and committed by March 10th, which is just about 2.5 weeks from now. I would not suggest waiting until the last minute :-)

@santagada
10 years ago

new version of the patch

@santagada
10 years ago

updated code in the benchmarks too

#34 @santagada
10 years ago

Looking at code from get_page_hierarchy I tightened the code a bit (get_page_hierarchy could be updated to not be recursive which is very slow in the common php vm) and updated the description of the function to point out it is also O(n) now.

Any ideas how should I test this code? I'm thinking of adding tests to tests/post/getPageChildren.php, does that seem ok?

#35 @boonebgorges
10 years ago

santagada - Thanks for the patch. This looks like it'll be a great improvement.

Any ideas how should I test this code? I'm thinking of adding tests to tests/post/getPageChildren.php, does that seem ok?

I was coming to this ticket to suggest that very thing :) I'd suggest writing tests against the existing implementation, and posting them here as a separate patch, so that we can do side-by-side regression tests.

@santagada
10 years ago

new version with tests

#36 @santagada
10 years ago

Added tests, anything else needed to merge in core?

Sorry to wait for the last minute :|

This ticket was mentioned in Slack in #core by santagada. View the logs.


10 years ago

#38 @boonebgorges
10 years ago

  • Keywords 4.3-early added; needs-unit-tests removed
  • Owner changed from westi to boonebgorges

santagada - This looks like it will be a great improvement. Let's shoot for first thing after 4.2.

#39 @LewisCowles
9 years ago

Has anyone read the unit test? You should keep the old function in there and test the two output's, I'm making a new unit test right now... 6 Years! Wow

This ticket was mentioned in Slack in #core by lewiscowles. View the logs.


9 years ago

#41 @santagada
9 years ago

More testing is always a good thing, but I think the tests that I made are reasonable and there other tests that touch this function also.

Let's just hope this will be merged on 4.3.

#42 @obenland
9 years ago

  • Keywords 4.3-early removed
  • Milestone changed from Future Release to 4.3

#43 @boonebgorges
9 years ago

  • Keywords dev-feedback removed

I've just done a review and some tests of santagada's most recent approach, and can confirm that it dramatically increases performance (by 3-4 orders of magnitude, in various tests). I'm going to make the tests a bit leaner and more specific and then move forward with the improvement.

#44 @boonebgorges
9 years ago

In 32354:

Unit tests for get_page_children().

Props santagada, boonebgorges.
See #10852.

#45 @boonebgorges
9 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 32355:

Improve performance of get_page_children().

The new algorithm uses a hash table rather than function recursion, reducing
complexity to O(N). On large numbers of pages, the performance improvement is
several orders of magnitude.

Props santagada, hailin, mihai.
Fixes #10852.

#46 follow-up: @santagada
9 years ago

Thank you for the unittest cleanup and for moving this forward :)

#47 in reply to: ↑ 46 @boonebgorges
9 years ago

Replying to santagada:

Thank you for the unittest cleanup and for moving this forward :)

Thanks for your contribution and your persistence! This is a really great improvement.

Note: See TracTickets for help on using tickets.