Opened 4 years ago

Last modified 3 months ago

#10852 reviewing enhancement

improve get_page_children

Reported by: hailin Owned by: westi
Priority: normal Milestone: Future Release
Component: Optimization Version: 2.9
Severity: normal Keywords: has-patch dev-feedback
Cc: hailin, miqrogroove@…, Denis-de-Bernardy, mike@…

Description

In #5303, mihai pointed out that get_page_children is very slow with 7000 pages. It's indeed slow, since the algorithm has O2 complexity.

We should improve it to O(N) complexity.

Attachments (1)

10852_get_page_children.diff (2.0 KB) - added by hailin 4 years ago.
patch

Download all attachments as: .zip

Change History (25)

hailin4 years ago

patch

  • Cc hailin added
  • Component changed from General to Optimization
  • Milestone changed from Unassigned to 2.8.5
  • Type changed from defect (bug) to enhancement

It's a clean implementation that achieves O(N) complexity for this particular function.

It does not attempt to address update_post_caches though,
as it's a separate issue.

comment:2 follow-up: ↓ 4   hailin4 years ago

unit tested and verified that it uses only 20% of the time used by the previous algorithm, even for a small data set.

  • Keywords has-patch added
  • Milestone changed from 2.8.5 to 2.9
  • Version set to 2.9

comment:4 in reply to: ↑ 2   westi4 years ago

  • Owner set to westi
  • Status changed from new to reviewing

Replying to hailin:

unit tested and verified that it uses only 20% of the time used by the previous algorithm, even for a small data set.

Cool.

Would you be able to share the test cases or even better write a patch for the WordPress unit tests repo to include this functionality and performance testing

comment:5 follow-up: ↓ 8   hailin4 years ago

westi:

The test case was realy simple - I just constructed a nested page tree with about 100 nodes. And put

$t1 = microtime(true);
get_page_children( $id ); adjust your id here
$t2 = microtime(true);

$t = $t2 - $t1;
echo $t

You can see significant performance gain even with ~ 20 nested nodes.

I heard two recent GoSOC students did MPTT, and core devs are thinking about using MPTT to rearrange all these nested data structures. If it happens, all these functions and algorithms will be replaced eventually. I can still try to find some time to write unit test repo, if I can free up my hands in the near future.

hailin: your code looks better then mine and it should be faster too without all those counts I did in mine.

But I'm a little disappointed that this is not going to be included in 2.8.5 ( unless there will be no 2.8.5 and 2.9 will be the next release :) )

Should I open a new ticket for update_page_cache ?

comment:7 follow-up: ↓ 10   hailin4 years ago

mihai:

I'm not sure calling update_page_cache in those page functions is a bug - it part of the cache updating mechanism.

However, both Ryan and I have identified this as the biggest time consumption issue
when there are thousands of pages.

Can you test my patch with your 7000 pages and report back the results?
I think Westi hold it off because he wants to see more testing results.

You could name my implementation get_page_children2(), then use the above $t = microtime(true) code snippet to make a direct call of old get_page_children() and new get_page_children2(), then compare the time spent in each case.

You could test these in index.php directly, and pass in the ID of the root page.

comment:8 in reply to: ↑ 5   westi4 years ago

Replying to hailin:

westi:

The test case was realy simple - I just constructed a nested page tree with about 100 nodes. And put

$t1 = microtime(true);
get_page_children( $id ); adjust your id here
$t2 = microtime(true);

$t = $t2 - $t1;
echo $t

You can see significant performance gain even with ~ 20 nested nodes.

Ok. Did you write any functionality tests or only test performance?
If would be really good when changing things like this for performance to write the functionality unit tests too to make sure we don't accidentally introduce a regression.

If you refer to the standard wordpress-tests test cases, I didn't write or add any new test cases.

What I did test was to put the code snippet in index.php, name the new function as get_page_children2, and run against an existing page hierarchy with ~100 nodes.
I verified results ($x) are the same before and with this patch, and the new algorithm is significantly faster ($t)

$t1 = microtime(true);
$x = get_page_children( $id ); adjust your id here
var_dump( $x );
$t2 = microtime(true);
$t = $t2 - $t1;
echo "it takes $t for the old get_page_children";

$t1 = microtime(true);
$x = get_page_children2( $id ); adjust your id here
var_dump( $x );
$t2 = microtime(true);
$t = $t2 - $t1;
echo "it takes $t for the new get_page_children";

I took a look at the standard wordpress-tests cases, but couldn't not find any existing cases that covers this particular function. If we were to write some cases,
I think it'd better to write a comprehensive case to cover most of pages functionalities.

comment:10 in reply to: ↑ 7   mihai4 years ago

Replying to hailin:

mihai:

I'm not sure calling update_page_cache in those page functions is a bug - it part of the cache updating mechanism.

With my test updating the cache took about 33% of the time. If the page loaded in 3 minutes with no improvement, it loaded in 1 minute with the improved get_page_children and in just a few seconds if I removed the update_page_cache call.
It might not be a bug it it adds significant time to the loading. Still I can't see why I would want to update the page cache when I view the page list? Shouldn't this cache be updated only when a page is changed ?

However, both Ryan and I have identified this as the biggest time consumption issue
when there are thousands of pages.

How many pages did you have ? how much time did it take to load? Did you actually test by loading the page admin ?

comment:11 follow-up: ↓ 12   hailin4 years ago

So my patch improves the load time from 3 min to 1 min. That is the purpose of my patch - to speed things up algorithmically.

update_page_cache is indeed another bigger issue, which I will leave it to lead core WP devs.

The tests I did was with 5000 pages, about a year ago, on a live blog.
removing update_post_caches($this->posts) saves about 17 sec out of 33 sec total time.

comment:12 in reply to: ↑ 11   mihai4 years ago

Replying to hailin:

So my patch improves the load time from 3 min to 1 min. That is the purpose of my patch - to speed things up algorithmically.

Actually my patch did that, I didn't try yours yet, but it seems somehow similar so yeah I guess it could do that too.

update_page_cache is indeed another bigger issue, which I will leave it to lead core WP devs.

The tests I did was with 5000 pages, about a year ago, on a live blog.
removing update_post_caches($this->posts) saves about 17 sec out of 33 sec total time.

On my 7000 pages test it saved more then 40 seconds. it may not seem like a lot but it's not the same to wait 20 seconds or one minute to load the page list.

There is another solution for this problem which has O(1) complexity, but would require an extra field in the posts table. Another downside is that it would be much slower when posting a new page or updating a page. O(n) is really the best solution so let's go for it.

  • Cc miqrogroove@… added

If you want this to be optimized for large dependency trees, you should use the function pattern: assign, test, recurse. You can unset() the top node in get_page_children(). As you have it - test, assign, recurse - there is an extra recursion call for every leaf page.

  • Milestone changed from 2.9 to 3.0

Related: #11633

  • Cc Denis-de-Bernardy added

Related: #8384, #10277

  • Keywords tested added

it's still applying clean, and it's not breaking anything. marking it as tested considering the above-mentioned unit tests.

could we get this in WP 3.0 before it's too late?

  • Keywords early added; tested removed
  • Milestone changed from 3.0 to 3.1

Too late for 3.0. Moving to 3.1 early.

Also:

there is an extra recursion call for every leaf page

  • Milestone changed from Awaiting Triage to Future Release

Would like to see this improvement in core, but there's a few other things to consider first, as outlined in my last comment.

  • Keywords dev-feedback added; early removed
  • Cc mike@… added
Note: See TracTickets for help on using tickets.