Make WordPress Core

Opened 16 years ago

Closed 13 years ago

Last modified 13 years ago

#8958 closed enhancement (duplicate)

Optimize rewrite rule generation

Reported by: matthijs's profile matthijs Owned by:
Milestone: Priority: normal
Severity: major Version: 2.7
Component: Optimization Keywords:
Focuses: Cc:

Description

The problem was introduced after the upgrade from 2.6.5 to 2.7.
The site has a permalink structure of /%category%/%year%/%monthnum%/%day%/%postname%/

After the upgrade the blog runs more then 2400 queries per page load. The site works normal, only very very slow (of course).
Research into the queries being run shows wordpress is trying to insert the rewrite_rules in the wp_options table. Only problem is: the field rewrite_rules contains about 20,000 lines of code/rules, putting that single field in a text file is >1.2 Mb large. So it chokes on that.

Disabling plugins, re-importing the db file, renewing the wp files, nothing helps. Going back to 2.6.5 and then again upgrading reintroduces the problem.

Changing the permalink structure to something else, for example:
/%year%/%monthnum%/%day%/%postname%/

also solves the problem temporarily. As soon as I change back to
/%category%/%year%/%monthnum%/%day%/%postname%/

wordpress again tries to insert the huge query into the field rewrite_rules, which isn't possible.

Looking at the rewrite_rules it seems it repeats the same rules for every page and attachment, like

s:35:\"vacancies/473/attachment/([^/]+)/?$\";
s:32:\"index.php?attachment=$matches[1]\";
s:45:\"vacancies/473/attachment/([^/]+)/trackback/?$\";
s:37:\"index.php?attachment=$matches[1]&tb=1\";
s:65:\"vacancies/473/attachment/([^/]+)/feed/(feed|rdf|rss|rss2|atom)/?$\";
s:49:\"index.php?attachment=$matches[1]&feed=$matches[2]\";
s:60:\"vacancies/473/attachment/([^/]+)/(feed|rdf|rss|rss2|atom)/?$\";
s:49:\"index.php?attachment=$matches[1]&feed=$matches[2]\";
s:60:\"vacancies/473/attachment/([^/]+)/comment-page-([0-9]{1,})/?$\";
s:50:\"index.php?attachment=$matches[1]&cpage=$matches[2]\";
s:28:\"(vacancies/473)/trackback/?$\";

Interestingly:

1- when I do an export of the posts to the xml file, install a fresh wp and then re-import the posts, the problem goes away. In that case, the field rewrite_rules, suddenly "only" contains around 5000 rewrite_rules.

2 - all the rules seem to relate to (older) versions of the pages and attachments
So maybe the problem is that wordpress is writing rules for every page revision and every page attachment it has ever had. The website is not that big, but with a few hundred pages, a few hundred attachments it seems to go into the 20,000 rules with this specific permalink structure. That could also explain that when I do the import of xml in a fresh install, there are no page revisions anymore and the rewrite rules are not that many.

I cannot imagine it is something else, because besides the fact that that one query is too large to be run, everything is fine. Database tables are in good condition, files are correct, I tried it on different servers, etc so I can almost certainly rule out another causes. Besides the fact that wp cannot run the query to insert the rewrite_rules, everything works fine, front- and backend.

Attachments (2)

reduce_attach_query.8958.diff (1.4 KB) - added by robinmarshall 15 years ago.
Patch as described in comment 14
8958.diff (2.8 KB) - added by Denis-de-Bernardy 15 years ago.
do the same thing in two queries, instead of two queries per page and one per attachment plus two

Download all attachments as: .zip

Change History (42)

#1 @westi
16 years ago

  • Keywords reporter-feedback added

You have to have verbose rules because you have %category% at the beginning of your permalink structure. See #3614)

In my local install with %category% at the start of my permalinks I get verbose rules but not ones for all the revisions.

Could you supply the rules you see generated for a revision?

#2 @matthijs
16 years ago

Maybe it's not for the revisions. Looking at the rules, it seems all the attachments are the problem. Each one has about 11 rules (22 lines), like:

s:35:\"vacancies/474/attachment/([^/]+)/?$\";
s:32:\"index.php?attachment=$matches[1]\";
s:45:\"vacancies/474/attachment/([^/]+)/trackback/?$\";
s:37:\"index.php?attachment=$matches[1]&tb=1\";
s:65:\"vacancies/474/attachment/([^/]+)/feed/(feed|rdf|rss|rss2|atom)/?$\";
s:49:\"index.php?attachment=$matches[1]&feed=$matches[2]\";
s:60:\"vacancies/474/attachment/([^/]+)/(feed|rdf|rss|rss2|atom)/?$\";
s:49:\"index.php?attachment=$matches[1]&feed=$matches[2]\";
s:60:\"vacancies/474/attachment/([^/]+)/comment-page-([0-9]{1,})/?$\";
s:50:\"index.php?attachment=$matches[1]&cpage=$matches[2]\";
s:28:\"(vacancies/474)/trackback/?$\";
s:37:\"index.php?attachment=$matches[1]&tb=1\";
s:48:\"(vacancies/474)/feed/(feed|rdf|rss|rss2|atom)/?$\";
s:49:\"index.php?attachment=$matches[1]&feed=$matches[2]\";
s:43:\"(vacancies/474)/(feed|rdf|rss|rss2|atom)/?$\";
s:49:\"index.php?attachment=$matches[1]&feed=$matches[2]\";
s:36:\"(vacancies/474)/page/?([0-9]{1,})/?$\";
s:50:\"index.php?attachment=$matches[1]&paged=$matches[2]\";
s:43:\"(vacancies/474)/comment-page-([0-9]{1,})/?$\";
s:50:\"index.php?attachment=$matches[1]&cpage=$matches[2]\";
s:28:\"(vacancies/474)(/[0-9]+)?/?$\";
s:49:\"index.php?attachment=$matches[1]&page=$matches[2]\";

That page (vacancies) currently has only 2 attachments, but apparently wordpress remembers all previous attachments (which are still in the media gallery) and writes rules for all of them.

So this is an expected behavior? Then does that mean that this will not scale to anything behind a few hundred pages or attachments?

#3 @westi
16 years ago

Ok so it looks like we could improve the situation by only build rules for published attachments.

I'll see if I can put something into trunk for you to test with.

In general by having %category% at the beginning of your permalink structure you are going to have a lot more rules because of the way it is matched.

#4 @westi
16 years ago

  • Keywords needs-patch added; reporter-feedback removed
  • Milestone set to 2.8

Looking into this a little more it's not easy I don't think at the moment to tell if a post_attachment is used or not :-(

The offending code in wp-includes/rewrite.php is:

$attachments = $wpdb->get_results( $wpdb->prepare( "SELECT ID, post_name, post_parent FROM $wpdb->posts WHERE post_type = 'attachment' AND post_parent = %d", $id ));

Which pulls all attachment into the mix - attachments have a post_status of inherit at the moment so it's not easy to modify that query to pick out the visible ones.

#5 @matthijs
16 years ago

But it should be able to query for the post status right? I don't know the intricacies of the code, but if you take the above query and take into account the field post_status == publish, it should be possible to get out only the currently active ones, I would think.

Maybe as simple as

WHERE post_type = 'attachment' AND post_parent = %d AND post_status = publish

#6 @DD32
16 years ago

But it should be able to query for the post status right?

As westi said, -All- Attachments have the post_status of 'inherit' right now, So its impossible to separate visible from non-visible attachments.

#7 @matthijs
16 years ago

So then the conclusion would be that not all attachments should get the status of 'inherit'? It seems like the current situation doesn't scale at all. I wouldn't call a website with a few hundred posts/attachments big.

Or are there other solutions?

Why is it necessary to have rewrite rules for all media files? Wouldn't it make more sense to just :

  • upload media to folders
  • link to them

I thought the whole point of the htaccess rewrite rules in wordpress is "unless the file exists, channel the request through index".

#8 @ryan
16 years ago

Having %category% or %postname% at the beginning of the permalink structure means either including explicit rewrite rules for all pages or doing extra queries when resolving permalinks to determine if a page is being requested.

#9 @Denis-de-Bernardy
15 years ago

  • Keywords dev-feedback added; permalinks rewrite_rules revisions removed

couldn't this:

$attachments = $wpdb->get_results( $wpdb->prepare( "SELECT ID, post_name, post_parent FROM $wpdb->posts WHERE post_type = 'attachment' AND post_parent = %d", $id ));

be replaced, by code that looks something this before the foreach loop:

$all_attachments = $wpdb->get_results("SELECT * FROM $wpdb->posts WHERE post_type = 'attachment');
update_post_caches($all_attachments);

and then in the foreach loop:

$attachments = get_children(...)

#10 @Denis-de-Bernardy
15 years ago

the query could then be optimized with a join on posts as parents where parents.post_type = page and parents.post_status = published, too.

#11 @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.8 to Future Release

Moving to future, from lack of patch.

#12 @justinph
15 years ago

A potential and temporary workaround is to insert some other string into the permalink structure. This is the permalink structure I'm using that seems to avoid the problems with this bug:

/view/%category%/%postname%/

And for the category base, use "list". Seems to make fairly human-understandable URLs, too.

#13 @Denis-de-Bernardy
15 years ago

that work work for very long, though. there is a separate ticket that highlights that as an issue, the patch is in the commit candidates.

#14 @robinmarshall
15 years ago

  • Cc robinmarshall added

What about just getting the id's of pages with attachments then checking if the page id is in the list before running the query?

e.g.

$all_attachments = $wpdb->get_results("SELECT post_parent FROM $wpdb->posts WHERE post_type = 'attachment'");
		
foreach ($posts as $id => $post) {
	if (!in_array($id,$all_attachments)) continue;

#15 @Denis-de-Bernardy
15 years ago

might work too. the keyword, at the end of the day, is needs-patch. it won't get fixed unless someone gives it some love.

@robinmarshall
15 years ago

Patch as described in comment 14

#16 @robinmarshall
15 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

#17 @Denis-de-Bernardy
15 years ago

  • Milestone changed from Future Release to 2.8

#18 @ryan
15 years ago

I think $all_attachments needs to use get_col() instead of get_results().

#19 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch needs-testing dev-feedback removed

indeed it should. but the patch is clunky. there is a faster way to get this all in one go.

@Denis-de-Bernardy
15 years ago

do the same thing in two queries, instead of two queries per page and one per attachment plus two

#20 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

@robin: please give the new patch a try

#21 @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.8 to Future Release

punting...

#22 follow-up: @onezero
15 years ago

To test this patches I used the result of "SELECT LENGTH(option_value) FROM wp_options WHERE option_name LIKE 'rewrite%'" to benchmark. Along with a walk through the website to notice differences. The installation in which tests are performed has 200 posts, 100 pages, 35 categories, and started out in 2008.

The original situation was the custom permalink setting "/%category%/%year%/%postname%", the result of the query was +/- 1.400.000 chars. After applying patch 8958.diff the result was +/- 1.400.000 chars. After applying patch reduce_attach_query.8958.diff the result was +/- 145.000 chars. A significant difference.

Following the hint of justinph (see above) to use a fixed prefix ("/list/%category%/%year%/%postname%") the result in the original (unpatched) situation is +/- 8.000 chars. An even more dramatic improvement than the results achieved by the fixes, but at a cost: (slightly) different URLs. An equal result was achieved by moving a less variable parameter to the front of the setting: "/%year%/%category%/%postname%"): +/- 7.500 chars.

#23 in reply to: ↑ 22 @Denis-de-Bernardy
15 years ago

  • Component changed from Permalinks to Optimization
  • Keywords tested added; needs-testing removed
  • Owner changed from ryan to Denis-de-Bernardy
  • Status changed from new to accepted

Replying to onezero:

The original situation was the custom permalink setting "/%category%/%year%/%postname%", the result of the query was +/- 1.400.000 chars. After applying patch 8958.diff the result was +/- 1.400.000 chars. After applying patch reduce_attach_query.8958.diff the result was +/- 145.000 chars. A significant difference.

Yes, but you're missing the point. If anything, you've just highlighted that 8958.diff has no bugs, as opposed to reduce_attach_query.8958.diff. The question here is not reduce the amount of rewrite rules, but rather to make sure that they're generated more quickly.

Re the number of rules, it cannot be trimmed because you're using a permalink structure that requires verbose page rewrite rules.

Following the hint of justinph (see above) to use a fixed prefix ("/list/%category%/%year%/%postname%") the result in the original (unpatched) situation is +/- 8.000 chars. An even more dramatic improvement than the results achieved by the fixes, but at a cost: (slightly) different URLs. An equal result was achieved by moving a less variable parameter to the front of the setting: "/%year%/%category%/%postname%"): +/- 7.500 chars.

FYI, this prefix trick should no longer work in 2.8 beta-2. And as you highlight, the rule that starts with /%year%/ is the best -- precisely because it's no longer using verbose page rewrite rules.

Anyway, marking 8958.diff as tested, since it significantly reduces the number of queries needed to generate those rules.

#24 @Denis-de-Bernardy
15 years ago

  • Milestone changed from Future Release to 2.8.1

#25 @ryan
15 years ago

  • Milestone changed from 2.8.1 to 2.9

#26 @Denis-de-Bernardy
15 years ago

  • Owner Denis-de-Bernardy deleted
  • Status changed from accepted to assigned

#27 @scribu
15 years ago

  • Summary changed from Huge amount rewrite rules for page revisions after upgrade to Optimize rewrite rule generation
  • Type changed from defect (bug) to enhancement

#28 follow-up: @ryan
15 years ago

  • Milestone changed from 2.9 to Future Release

#29 in reply to: ↑ 28 @greggo
15 years ago

Replying to ryan:

So what can one expect when upgrading to 2.9? Will the 8959.diff patch still work? Anyone try this out yet?

#30 @scribu
15 years ago

  • Keywords health-check added; has-patch tested removed
  • Milestone changed from Future Release to 3.0

#31 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch tested added

#32 @hakre
15 years ago

  • Keywords needs-patch added; has-patch tested health-check removed

Patch does not apply clean any longer against head in trunk.

#33 @nacin
15 years ago

  • Keywords health-check added

#34 follow-up: @hakre
15 years ago

is health-check for those tickets where the patches do not apply clean any longer?

#35 in reply to: ↑ 34 @nacin
15 years ago

Replying to hakre:

is health-check for those tickets where the patches do not apply clean any longer?

No, that's what needs-patch is for :-P

health-check is a keyword that westi has been assigning to tickets relating to a health check of sorts for a WP install and the server it is running on. I imagine this will begin to pick up steam as a big new feature in 3.0. See also #11466 and a few others (I guess you could search for health-check, kind of the idea).

#36 @Denis-de-Bernardy
15 years ago

@hakre: it's mostly a marker so that we remember what to highlight to users in the health check plugin. in this case, the check would tell users that verbose rewrite rules are very bad.

#37 @miqrogroove
15 years ago

  • Milestone changed from 3.0 to Future Release

This item does not appear to be beta ready.

#38 @mikeschinkel
14 years ago

  • Cc mikeschinkel@… added

Consider resolving this issue using this approach instead #12935.

#39 @johnbillion
13 years ago

  • Keywords needs-patch health-check removed
  • Resolution set to duplicate
  • Status changed from assigned to closed

This is being addressed in #16687 (see Otto's comment).

#40 @SergeyBiryukov
13 years ago

  • Milestone Future Release deleted
Note: See TracTickets for help on using tickets.