#10852 closed enhancement (fixed)
improve get_page_children
Reported by: | hailin | Owned by: | boonebgorges |
---|---|---|---|
Milestone: | 4.3 | Priority: | normal |
Severity: | normal | Version: | 2.9 |
Component: | Posts, Post Types | Keywords: | has-patch |
Focuses: | performance | Cc: |
Description
In #5303, mihai pointed out that get_page_children is very slow with 7000 pages. It's indeed slow, since the algorithm has O2 complexity.
We should improve it to O(N) complexity.
Attachments (6)
Change History (53)
#1
@
15 years ago
- Cc hailin added
- Component changed from General to Optimization
- Milestone changed from Unassigned to 2.8.5
- Type changed from defect (bug) to enhancement
It's a clean implementation that achieves O(N) complexity for this particular function.
It does not attempt to address update_post_caches though,
as it's a separate issue.
#2
follow-up:
↓ 4
@
15 years ago
unit tested and verified that it uses only 20% of the time used by the previous algorithm, even for a small data set.
#4
in reply to:
↑ 2
@
15 years ago
- Owner set to westi
- Status changed from new to reviewing
Replying to hailin:
unit tested and verified that it uses only 20% of the time used by the previous algorithm, even for a small data set.
Cool.
Would you be able to share the test cases or even better write a patch for the WordPress unit tests repo to include this functionality and performance testing
#5
follow-up:
↓ 8
@
15 years ago
westi:
The test case was realy simple - I just constructed a nested page tree with about 100 nodes. And put
$t1 = microtime(true);
get_page_children( $id ); adjust your id here
$t2 = microtime(true);
$t = $t2 - $t1;
echo $t
You can see significant performance gain even with ~ 20 nested nodes.
I heard two recent GoSOC students did MPTT, and core devs are thinking about using MPTT to rearrange all these nested data structures. If it happens, all these functions and algorithms will be replaced eventually. I can still try to find some time to write unit test repo, if I can free up my hands in the near future.
#6
@
15 years ago
hailin: your code looks better then mine and it should be faster too without all those counts I did in mine.
But I'm a little disappointed that this is not going to be included in 2.8.5 ( unless there will be no 2.8.5 and 2.9 will be the next release :) )
Should I open a new ticket for update_page_cache ?
#7
follow-up:
↓ 10
@
15 years ago
mihai:
I'm not sure calling update_page_cache in those page functions is a bug - it part of the cache updating mechanism.
However, both Ryan and I have identified this as the biggest time consumption issue
when there are thousands of pages.
Can you test my patch with your 7000 pages and report back the results?
I think Westi hold it off because he wants to see more testing results.
You could name my implementation get_page_children2(), then use the above $t = microtime(true) code snippet to make a direct call of old get_page_children() and new get_page_children2(), then compare the time spent in each case.
You could test these in index.php directly, and pass in the ID of the root page.
#8
in reply to:
↑ 5
@
15 years ago
Replying to hailin:
westi:
The test case was realy simple - I just constructed a nested page tree with about 100 nodes. And put
$t1 = microtime(true);
get_page_children( $id ); adjust your id here
$t2 = microtime(true);
$t = $t2 - $t1;
echo $t
You can see significant performance gain even with ~ 20 nested nodes.
Ok. Did you write any functionality tests or only test performance?
If would be really good when changing things like this for performance to write the functionality unit tests too to make sure we don't accidentally introduce a regression.
#9
@
15 years ago
If you refer to the standard wordpress-tests test cases, I didn't write or add any new test cases.
What I did test was to put the code snippet in index.php, name the new function as get_page_children2, and run against an existing page hierarchy with ~100 nodes.
I verified results ($x) are the same before and with this patch, and the new algorithm is significantly faster ($t)
$t1 = microtime(true);
$x = get_page_children( $id ); adjust your id here
var_dump( $x );
$t2 = microtime(true);
$t = $t2 - $t1;
echo "it takes $t for the old get_page_children";
$t1 = microtime(true);
$x = get_page_children2( $id ); adjust your id here
var_dump( $x );
$t2 = microtime(true);
$t = $t2 - $t1;
echo "it takes $t for the new get_page_children";
I took a look at the standard wordpress-tests cases, but couldn't not find any existing cases that covers this particular function. If we were to write some cases,
I think it'd better to write a comprehensive case to cover most of pages functionalities.
#10
in reply to:
↑ 7
@
15 years ago
Replying to hailin:
mihai:
I'm not sure calling update_page_cache in those page functions is a bug - it part of the cache updating mechanism.
With my test updating the cache took about 33% of the time. If the page loaded in 3 minutes with no improvement, it loaded in 1 minute with the improved get_page_children and in just a few seconds if I removed the update_page_cache call.
It might not be a bug it it adds significant time to the loading. Still I can't see why I would want to update the page cache when I view the page list? Shouldn't this cache be updated only when a page is changed ?
However, both Ryan and I have identified this as the biggest time consumption issue
when there are thousands of pages.
How many pages did you have ? how much time did it take to load? Did you actually test by loading the page admin ?
#11
follow-up:
↓ 12
@
15 years ago
So my patch improves the load time from 3 min to 1 min. That is the purpose of my patch - to speed things up algorithmically.
update_page_cache is indeed another bigger issue, which I will leave it to lead core WP devs.
The tests I did was with 5000 pages, about a year ago, on a live blog.
removing update_post_caches($this->posts) saves about 17 sec out of 33 sec total time.
#12
in reply to:
↑ 11
@
15 years ago
Replying to hailin:
So my patch improves the load time from 3 min to 1 min. That is the purpose of my patch - to speed things up algorithmically.
Actually my patch did that, I didn't try yours yet, but it seems somehow similar so yeah I guess it could do that too.
update_page_cache is indeed another bigger issue, which I will leave it to lead core WP devs.
The tests I did was with 5000 pages, about a year ago, on a live blog.
removing update_post_caches($this->posts) saves about 17 sec out of 33 sec total time.
On my 7000 pages test it saved more then 40 seconds. it may not seem like a lot but it's not the same to wait 20 seconds or one minute to load the page list.
#13
@
15 years ago
There is another solution for this problem which has O(1) complexity, but would require an extra field in the posts table. Another downside is that it would be much slower when posting a new page or updating a page. O(n) is really the best solution so let's go for it.
#14
@
15 years ago
- Cc miqrogroove@… added
If you want this to be optimized for large dependency trees, you should use the function pattern: assign, test, recurse. You can unset() the top node in get_page_children(). As you have it - test, assign, recurse - there is an extra recursion call for every leaf page.
#19
@
15 years ago
- Keywords tested added
it's still applying clean, and it's not breaking anything. marking it as tested considering the above-mentioned unit tests.
could we get this in WP 3.0 before it's too late?
#20
@
15 years ago
- Keywords early added; tested removed
- Milestone changed from 3.0 to 3.1
Too late for 3.0. Moving to 3.1 early.
Also:
there is an extra recursion call for every leaf page
#22
@
14 years ago
Would like to see this improvement in core, but there's a few other things to consider first, as outlined in my last comment.
#26
@
10 years ago
Applying the patch reduced the page load from 9.5min to 15sec here (for a specific use case of 13K post type entries). Haven't noticed any specific regressions or missing items or connections.
#29
@
10 years ago
Sorry that was supposed to be the benchmark for the new algorithm, if someone can fix that data it would be awesome.
My code doesn't have any recursion (so the non documented maximum depth of 100 is gone) and it is much much faster.
In one admin page of a client with around 8k posts it takes up to 2 minutes to render using the old code and 9 seconds with the patch. This function is called many times in admin to build a hierarchical list on screen.
#30
@
10 years ago
Hope to end this 5 year history... if anything more is needed to make this go live I would gladly do it. Right now a part of the admin is dreaded by users.
#31
@
10 years ago
- Keywords needs-unit-tests added
Core unit tests that pass before and after would be needed here as well.
#32
follow-up:
↓ 33
@
10 years ago
Is there a timeframe to get it into 4.2 (or 4.3)? I can come up with a test suite, just tell me how long I have to create it :)
#33
in reply to:
↑ 32
@
10 years ago
Replying to santagada:
Is there a timeframe to get it into 4.2 (or 4.3)? I can come up with a test suite, just tell me how long I have to create it :)
For inclusion in 4.2, your patch would have to be approved and committed by March 10th, which is just about 2.5 weeks from now. I would not suggest waiting until the last minute :-)
#34
@
10 years ago
Looking at code from get_page_hierarchy I tightened the code a bit (get_page_hierarchy could be updated to not be recursive which is very slow in the common php vm) and updated the description of the function to point out it is also O(n) now.
Any ideas how should I test this code? I'm thinking of adding tests to tests/post/getPageChildren.php, does that seem ok?
#35
@
10 years ago
santagada - Thanks for the patch. This looks like it'll be a great improvement.
Any ideas how should I test this code? I'm thinking of adding tests to tests/post/getPageChildren.php, does that seem ok?
I was coming to this ticket to suggest that very thing :) I'd suggest writing tests against the existing implementation, and posting them here as a separate patch, so that we can do side-by-side regression tests.
#36
@
10 years ago
Added tests, anything else needed to merge in core?
Sorry to wait for the last minute :|
This ticket was mentioned in Slack in #core by santagada. View the logs.
10 years ago
#38
@
10 years ago
- Keywords 4.3-early added; needs-unit-tests removed
- Owner changed from westi to boonebgorges
santagada - This looks like it will be a great improvement. Let's shoot for first thing after 4.2.
#39
@
9 years ago
Has anyone read the unit test? You should keep the old function in there and test the two output's, I'm making a new unit test right now... 6 Years! Wow
This ticket was mentioned in Slack in #core by lewiscowles. View the logs.
9 years ago
#41
@
9 years ago
More testing is always a good thing, but I think the tests that I made are reasonable and there other tests that touch this function also.
Let's just hope this will be merged on 4.3.
#43
@
9 years ago
- Keywords dev-feedback removed
I've just done a review and some tests of santagada's most recent approach, and can confirm that it dramatically increases performance (by 3-4 orders of magnitude, in various tests). I'm going to make the tests a bit leaner and more specific and then move forward with the improvement.
patch