#16687 closed task (blessed) (fixed)
%postname% permalinks should not trigger verbose rewrite rules
Reported by: | nacin | Owned by: | |
---|---|---|---|
Milestone: | 3.3 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Rewrite Rules | Keywords: | needs-testing |
Focuses: | Cc: |
Description
The %postname% permalink structure is very common. As it currently stands, this introduces a performance penalty with large numbers of posts.
I think we can avoid verbose rules here by mapping (.+?)
not to pagename=
, but to a new, temporary query variable that would trigger a query by post_name for both pages and posts.
We can recommend against it all we want, but this structure isn't going away on the internet. Fighting it only causes perception issues relating to scaling. We should instead embrace it and make at least this structure scalable.
Attachments (9)
Change History (72)
#2
@
14 years ago
I like both.
I'm not sure we can apply the same line of thought I'm thinking here, to /%category%/%postname%/. That'll require a lot more work to get right and match our current handling.
#5
@
14 years ago
For /%postname%/ can we just not do if ( no slashes in the page query string ) then set type = post||page
else set type = page
(as a slash in the pagename, means' a hierarchical type, so not a post..)
That'll probably require a extra check for is_page/etc conditionals to set it to the correct one..
#6
@
14 years ago
That could work, but I'm not sure it's worth it. You might stomp on other rewrite rules.
#8
@
14 years ago
- Cc mikeschinkel@… added
Suggestion: Rather than focus on this one use-case how about instead add a hook that allows this and other use-cases to be addressed by both core and by plugins, and then address this use-case in core using the new hook.
I've added ticket #16692 with a patch that can get the discussion started.
#9
@
14 years ago
- Milestone changed from Future Release to 3.2
Mark floated around a plugin POC that implemented this. It could be turned into a patch pretty easily I imagine. This is a really good enhancement in line with 3.2, moving there.
#10
@
14 years ago
That's the plugin I worked on briefly. I don't recall the specifics, but it's not complete. Something wasn't working. I throw it up here with that caveat just to get some ideas flowing.
#11
@
14 years ago
A page and a post can have the same slug. If /%postname%/ permalink is used, the post is not reachable. Maybe wp_unique_post_slug()
can be modified but that might break existing links.
#12
follow-up:
↓ 13
@
13 years ago
I am overstepping my knowledge here, so ignore me if the following does not make sense:
I just tested the number of rewrite rules with AskApache rewrite viewer plugin. Indeed a change from an 'allowed' permalink to a %postname or %category permalink increases both WP rewrites as well as page rewrites with 11 rules each (per page!).
But every additional group of 11 rules is exactly the same (they have no reference to page/post ID's etc) according to AskApache plugin, so is there not a way to just not generate all those duplicate rules?
Anyway, I am glad you are working on this as it does prevent acceptance for CMS users. Cheers.
#13
in reply to:
↑ 12
@
13 years ago
Replying to bike:
But every additional group of 11 rules is exactly the same (they have no reference to page/post ID's etc) according to AskApache plugin, so is there not a way to just not generate all those duplicate rules?
The rules are not the same, the AskApache plugin is only showing you the end result, not what it's matching against. It's showing you the values, not the keys.
Edit the AskApache plugin on line 347 to change this:
foreach($v as $vv) {
To this:
foreach($v as $vv=>$res) {
And then take a look. The rule keys will be shown instead, and you'll see why the rules are the way they are. The actual end rules may be the same, but the regular expressions they're matching against are not, and those are the more important part.
#16
@
13 years ago
Attaching a new PoC plugin (non-verbose-page-rules.002.php) to fix the problems Mark referred to. The problem was with sub-pages being blocked by the post attachment rule:
[^/]+/([^/]+)/?$ => index.php?attachment=$matches[1] // ... and similar for trackbacks, feeds, comment pages
I worked around this by removing these attachment rules and dealing with them as part of the page catch-all. If there is a / then it's either a page or an attachment, it's the latter if matched by any of the attachment rewrite rules that were previously unset (transformed into non-regex form, this is where it gets a bit uglier/hackish[1]) *and* the attachment piece -- ([^/]+)
-- is actually an attachment... otherwise it's a page. On the other hand we have a post or a page which can be dealt with as before.
I also added a check to only change the rules specifically for the /%postname%/ structure as it will just break other verbose structures.
I also tried a variation for the non-/ post or page branch to get the ID in advance and reduce the number of queries for top level pages by one compared to trunk. For a full table of query counts in some quick testing see http://pastie.org/1813700.
Performance tests I made before I added the OR branches to deal with post attachment trackbacks, feeds and comment pages showed that with the plugin response time remained the same as the number of pages grew (0.17s at both 59 pages and 1297 pages). Without the plugin it was a different story. 0.18s response time at 59 pages and 0.37s at 1297 pages. The figures I'm quoting are from tests with one concurrent user over 30 seconds requesting the comment feed of a post attachment (e.g. http://wp.dev/post-with-attachment/attachment-slug/feed/rss/) so very low on the rewrite stack. With the plugin enabled the /%postname%/ permalink structure has the least number of rewrite rules by default.
Also run this against the WordPress tests for WP_Query and WP_Rewrite with no failures.
[1] I thought I had managed a relatively neat solution before realising that I had forgotten the trackback, feeds and comment page rules for post attachments which turns the condition into a bit of a monster!
#17
follow-up:
↓ 18
@
13 years ago
Wouldn't it be better to look for the 'attachment' part? That's how the regexes look.
Also, you can use $wp_rewrite->feeds instead of the hardcoded array.
#18
in reply to:
↑ 17
;
follow-up:
↓ 22
@
13 years ago
Replying to scribu:
Wouldn't it be better to look for the 'attachment' part? That's how the regexes look.
Also, you can use $wp_rewrite->feeds instead of the hardcoded array.
Initial reaction is good point. However, I think it might mean just replacing it with 5 regular expressions since ^[^/]+/([^/]+)
could match a request like page-1/page-2/page-3 where page-2 is also an attachment slug (although attachments are supposed to be globally unique this might not be the case see #17170). But you're right, I think there is probably a cleaner way of getting it working if someone thinks about it a bit more and actually tries it.
#20
@
13 years ago
I tested using a single regular expression and then checking the matched portion against attachment post_names instead of the set of OR'd conditions:
if ( preg_match( '#^[^/]+/([^/]+)#', $request['post_or_page'], $matches ) && $id = $wpdb->get_var( $wpdb->prepare( "SELECT ID FROM $wpdb->posts WHERE post_name = %s AND post_type = 'attachment' LIMIT 1", $matches[1] ) ) ) $request['attachment_id'] = $id;
This works well in most cases. But as I had thought there can be problems with conflicting attachment/page slugs. To test try this set-up:
- 3 pages with slugs: page-1, conflict, page-2. Where the succeeding page is a child of the previous
- 1 attachment (attached to a post) with slug conflict (has to be added before the page see #17170)
Now visit /page-1/conflict/ and /page-1/conflict/page-2/ in both cases the attachment is displayed when using the plugin with the above mod. On trunk the pages are displayed (though a notice is triggered for /page-1/conflict/ if the post/attachment is tested by get_page_by_path first, wp-includes.php line 3157). Plugin v002 as uploaded previously will show the attachment for the first but not the second since it will recognise there are three parts the last of which is not 'trackback', a feed or 'comment-page-xx'.
This shouldn't be a problem if attachments were guaranteed to have globally unique slugs. If this were the case then the single regular expression might be the way to go.
#21
@
13 years ago
Just thought of being able to use the matches that have already been made by the rewrite rules to avoid extra slug conflicts (like I did in v002). So checking isset($request['tb'])
and the like. I think exploding and counting will be faster, but something that needs to be investigated.
#22
in reply to:
↑ 18
@
13 years ago
Replying to duck_:
But you're right, I think there is probably a cleaner way of getting it working if someone thinks about it a bit more and actually tries it.
I've tried it. See non-verbose-003.php
Edit: Nevermind, it always displays the page. It never displays conflict slug attachments. Also, this is the same case with same slugs for pages and posts. The page always wins.
#24
@
13 years ago
- Milestone changed from 3.2 to Future Release
Didn't make it in before feature freeze last week, revisit 3.3 early.
#25
@
13 years ago
attachment 16687.diff added
Initial WIP patch, This doesn't touch the fact that the url's may not be unique (page and post with the same slug, Attachment with the same slug as a 2nd level page, 2nd level page with a same slug as an attachment, etc). Seems to work in all cases i throw at it.
The priority order for matching items appears to be: Posts > Attachments > Pages
#29
@
13 years ago
How to make verbose page rules less painful.
For the cases that would normally invoke verbose_page_rules, use this mechanism instead:
- Move the page rewrite rule immediately above normal post custom rewrite rules in the rewrite_rules.
- Special case the page rewrite rule to use a callback or something to do the following:
- Split the url along slashes (post_name's can't contain slashes). Reverse number them. So they look like this: page4/page3/page2/page1 (the page you want, page1, is always at the beginning of your list.
- Recursive subquery fun!
select ID from wp_posts where post_name = page1 AND post_parent = ( select ID from wp_posts where post_name = page2 AND post_parent = ( select ID from wp_posts where post_name = page3 AND post_parent = ( select ID from wp_posts where post_name = page4 ) ) )
Note that this is actually pretty fast due to indexing, although if somebody decides to go too deep into the dream state, they could fall into limbo.
Alternative could use joins. Limbo here is a lot deeper down, probably.
SELECT ID FROM wp_posts a JOIN wp_posts b ON a.post_parent = b.ID AND post_name = page2 JOIN wp_posts c ON b.post_parent = c.ID AND post_name = page3 JOIN wp_posts d ON c.post_parent = d.ID AND post_name = page4 WHERE a.post_name = page1
- If this post exists, check its post_type for page and set vars accordingly. If post_type is attachment, check parent's type instead (to handle page attachments).
- If post does not exist, then fail and get on with life.
Advantage: This adds one extra (ugly) query to the case of using verbose page rules, and doesn't require all the verbose rules to be so verbose. It moves the performance issue to this one query only, and the performance issue here is keyed to the depth of nested pages instead of to the total number of pages. Log N improvement for most all cases.
#30
@
13 years ago
By using get_page_by_path, I was able to implement this basic idea much simpler. I don't know how efficient get_page_by_path actually is though, so I can't speak as to performance.
use_verbose_page_rules still exists, but with this it doesn't create huge rewrite rulesets anymore.
See no_verbose_ver1.diff.
#31
@
13 years ago
- Cc johan.eenfeldt@… added
Very nice, good work!
Tested performance of no_verbose_ver1.diff. Time is average of 5 tests, run on a decent VPS.
Site 1 (72 pages & 3714 attachments) WP 3.2.1
- rewrite rules: 5638 (666KB serialized)
- flush_rewrite_rules(): 1.452s
- failed url_to_postid(): 0.052s
Site 1 WP 3.2.1 + no_verbose_ver1
- rewrite rules: 72 (6,5KB serialized)
- flush_rewrite_rules(): 0.005s
- failed url_to_postid(): 0.005s
Site 2 (27 pages & 1350 attachments) WP 3.2.1
- rewrite rules: 15418 (1,8MB serialized)
- flush_rewrite_rules(): 9.068s
- failed url_to_postid(): 0.150s
Site 2 WP 3.2.1 + no_verbose_ver1
- rewrite rules: 72 (6,6KB serialized)
- flush_rewrite_rules(): 0.005s
- failed url_to_postid(): 0.005s
Do you want anything else tested or profiled?
#32
@
13 years ago
Basically, this patch makes page rules not really "verbose" anymore, and so the verbose flag is now just used to determine if pages should take precedence or not and whether extra queries need to be done for that case.
So also be sure to check for unexpected side effects. Make sure that Pages and Attachments all show up properly and aren't throwing 404's or errors or what have you.
Also be aware than in the event of a URL conflict (such as if you use %postname% for the custom string and a page and post both have the same slug), then the page is going to take precedence and get shown instead of the post. Me and Nacin talked about that Monday and decided that that made the most sense. So be sure that works properly too on your sites.
With the current patch, it's doing unnecessary extra data retrieval and then discarding it. This can be optimized away, although I don't think it would result in noticeable speed improvement. One thing that would be interesting to see is the results on long hierarchical page URLs. The speed of the lookup is dependent on the depth of the URL. So a page with /page would be found faster than a page with /page/url/is/really/really/deep/here/oh/noes. It might be interesting to see how deep you need to go before it becomes a problem.
#33
@
13 years ago
The patch has a bug in comment feeds for pages and attachments (/some/page/feed). You get either 404 or redirected to the actual page/attachment. If there is a post with same name as page you get comments for the post.
The rest seem to work. I get the correct stuff for a random selection of posts, pages, attachments and categories. Pages get precedence over posts. Post, post comment and category feeds work.
Performance for page lookup:
- depth 1: 0.009s, 11 queries (was 0,055s and 5 queries)
- depth 5: 0,021s, 47 queries (was 0,055s and 17 queries)
- depth 10: 0,034s, 92 queries (was 0,056s and 32 queries)
- depth 15: 0,049s, 137 queries (was 0,065s and 47 queries)
- depth 20: 0,062s, 182 queries (was 0,076s and 62 queries)
Seems reasonable for any even close to sane depth. A number of queries but they seem fast enough.
#41
@
13 years ago
Version 2 of patch uploaded. This uses a slightly more complex page query test which handles the feed problems reported by johanee.
In testing with a post with the same slug as a page while using a /%postname%/ permalink string, the page and page feeds took precedence over the post and post feeds. I believe this is the desired behavior in such a case.
Number of extra queries are not yet optimized in this patch, however they are still reasonable for sane page depths and *way* faster than the verbose rules.
#42
@
13 years ago
Version 3 of the patch creates a new find_page_by_path function to find the page given the path, using only one query.
Also optimized get_page_by_path to use this new function, which makes it actually result in less queries when retrieving deep pages than it was using before.
johanee: If you want to test this with your previous data set to get numbers or find bugs, I'd appreciate it.
#44
@
13 years ago
Tested with 1000 pages and /%postname%/ permalinks. Trunk code resulted in 2045 queries on the permalink settings page, and a "MySQL server has gone away" message when trying to save the resulting rewrite array. Page requests thus took 2000+ queries.
With the ver3 patch, rewrite array was 71 rules. Page requests took 17 queries.
@
13 years ago
Take 3 - optimizes the query to make it only one extra query for any number of levels
#45
@
13 years ago
Coding style: Instead of:
if ( $wp_rewrite->use_verbose_page_rules == true && preg_match('/pagename=\$([^&\[]+)\[([0-9]+)\]/',$query,$varmatch) ) {
you can just write:
if ( $wp_rewrite->use_verbose_page_rules && preg_match('/pagename=\$([^&\[]+)\[([0-9]+)\]/',$query,$varmatch) ) {
and it's pretty tricky to figure out what this line does:
if ( ! get_page_by_path(${$varmatch[1]}[$varmatch[2]]) )
More importantly, why does this block of code appear twice:
if ( $wp_rewrite->use_verbose_page_rules == true && preg_match('/pagename=\$([^&\[]+)\[([0-9]+)\]/',$query,$varmatch) ) { // this is a verbose page match, lets check to be sure about it if ( ! get_page_by_path(${$varmatch[1]}[$varmatch[2]]) ) continue; }
If it's really needed in two places, it should be factored into a helper function.
#46
@
13 years ago
A lot of the parse_request function is reproduced in url_to_postid. That's a refactor for another time, I think.
The line of code you're calling tricky is, indeed, tricky. It's basically looking at a string like this:
index.php?pagename=$matches[1]&page=$matches[2]
And trying to extract the "$matches[1]" from that string in a way that it can then figure out how to get the value in $matches[1]. Simplification would be gladly accepted, I banged my head for a while on that one.
#47
@
13 years ago
A lot of the parse_request function is reproduced in url_to_postid. That's a refactor for another time, I think.
Yeah, that's what the next guy says and it gets worse and worse.
Simplification would be gladly accepted, I banged my head for a while on that one.
Ok, I'll sleep on it and see if I dream up a better solution. :)
#48
follow-up:
↓ 49
@
13 years ago
What I mean is that existing duplication doesn't justify additional duplication.
#49
in reply to:
↑ 48
@
13 years ago
I get ya, but I think that removing that duplication is best done in a separate ticket.
#52
@
13 years ago
- move duplicated code into a new WP_Rewrite::find_match() method
- simplify secondary regex in find_match()
#54
follow-up:
↓ 55
@
13 years ago
[18541] broke get_page_by_path()
for non-English slugs: ticket:10249:42.
#55
in reply to:
↑ 54
@
13 years ago
- Keywords needs-testing added
Replying to SergeyBiryukov:
[18541] brokeget_page_by_path()
for non-English slugs: ticket:10249:42.
That's right, related PHP warnings which I get:
NOTICE: wp-includes/post.php:3175 - Undefined index: 0NOTICE: wp-includes/post.php:3171 - Trying to get property of non-objectNOTICE: wp-includes/post.php:3178 - Trying to get property of non-object
I think [18627] is related to this ticket.
#56
@
13 years ago
That's right, related PHP warnings which I get:
Those are somewhat unrelated, same function, but not the cause there. See #17670 for fixing those. (The changes here did increase the number of notices though)
#57
follow-up:
↓ 58
@
13 years ago
So obviously /%postname%/ gets improved. What about /%category%/%postname%/? You could easily detect if first segment is category and if yes, look for entry, if not check page.
#58
in reply to:
↑ 57
@
13 years ago
Replying to rockstyle:
So obviously /%postname%/ gets improved. What about /%category%/%postname%/? You could easily detect if first segment is category and if yes, look for entry, if not check page.
The patch that has already been committed accounts for *all* cases. Not just the postname case.
#62
@
13 years ago
- Cc landwire added
Hi all,
as you seem to be the experts on this field, I come straight to you:
Is there a possibility of adding either of those two structure tags %userrole% and %posttype% to create permalinks? Or is this an utterly stupid idea?
It would help me a lot making the following permalink structure possible, especially if you have 1000+ authors it would be nice to get them structured according to their role:
%role%/%author%/%posttype%/%postname%
I appreciate any answers. If I know it is an absolute silly idea, then I can at least drop it!
Thanks for doing all this magic with wordpress,
Sascha
#63
@
13 years ago
Yes, it should be possible, but trac is not the right place for asking such questions.
Please try the support forums or WordPress Answers.
Suggestion:
and for /%category%/%postname%/:
limit 1 is, arguably, only useful only if we don't want to double check the result.