Make WordPress Core

Opened 8 years ago

Last modified 3 years ago

#36292 assigned feature request

Rewrites: Next Generation

Reported by: rmccue's profile rmccue Owned by: rmccue's profile rmccue
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Rewrite Rules Keywords:
Focuses: Cc:

Description

Let's start improving rewrites. The current rewrite system is outdated and clunky for modern development, and was one of the most complained about pieces of WordPress when I enquired on Twitter. Here's what I'd like to change.

Part 1: Separate rules from WP Query

This has two parts. Firstly, we need to ensure any rule can be mapped to any set of arbitrary query parameters. Right now, this is quite hard to do for the more complex query parameters, such as date query. Doing this will eliminate the need for temporary/intermediate query vars; these are typically used temporarily during the rewrite process, then changed into actual query vars in pre_get_posts.

The second part of this is breaking the interdependency between rewrites and the main query. Currently, it's not possible to have a page that doesn't generate a main query. #10886 has attempted to tackle this in the past, however has run up against problems. This is important for things like user profile pages, and can immediately be used in core for the REST API. The current solution for these is to hook in at parse_request, but this causes some hooks not to be run, and changes the overall flow of a WordPress request.

Part 1 will introduce a Rewrite Rule Interface that will allow better customisability of rules as well.

Part 2: Ensure the rewrite process is DRY and testable

This part will decouple the rewrite process from the global state to ensure testability. @ericlewis has made a fantastic start on this in #18877, which works well with some of the refactoring from part 1.

In separating from the global state, we may be able to reuse WP_REST_Request as a more generic WP_HTTP_Request (it was designed for this from the start).

This should also start cleaning up the WP and WP_Rewrite classes as concerns get a cleaner separation. Currently the boundary between the two is pretty unclear.

Part 3: Rework flushing

The rewrite_rules option and the flushing process are one of the toughest bits of the rewrite process. The fact with these is that they're a caching process that significantly predates transients and object caching. The reason flushing exists is to avoid regenerating expensive rules (like one-rule-per-tag) on every request, but we can create a better system using the newer tools available in core. An opt-in system for rewrite caching could potentially work well once we have a solid Rewrite Rule Interface that allows us to identify and group rules. (For example, a WP_Rewrite_CachedRuleInterface.)

#29118 digs into some past work on this front; automatically flushing rewrites is already possible.

Other Changes

Note that these parts are in a rough order, and get more vague as they go on. We need to incorporate feedback after changes are made, and these changes might need to be reworked or rethought with previous feedback in mind. There are also other changes we may want to make based on feedback from the community.

Attachments (3)

rewrites.diff (10.8 KB) - added by rmccue 8 years ago.
Part 1: Introduce WP_Rewrite_Rule
callback-rule.php (510 bytes) - added by rmccue 8 years ago.
Example of a callback-based wrapper class
just-callback.php (575 bytes) - added by rmccue 8 years ago.
Example of a callback on the 'wp' action if a rule matches

Download all attachments as: .zip

Change History (20)

@rmccue
8 years ago

Part 1: Introduce WP_Rewrite_Rule

#1 follow-up: @rmccue
8 years ago

  • Owner set to rmccue
  • Status changed from new to assigned

rewrites.diff is part 1 of this plan. It introduces a WP_Rewrite_Rule class, as well as a WP_Rewrite_RuleInterface interface. This moves some parts of the rewrite handling to the rule object, which allows substantial benefits.

For example, let's use the example from Roots' post under "A challenging example". This can now be rewritten as a simple class, and avoid the use of intermediary rewrite tags altogether:

<?php

class Custom_Rule extends WP_Rewrite_Rule {
	public function get_query_vars( $matches ) {
		// A map from the timespan string to actual hours array
		$hours = [
			'morning'   => range(6, 11),
			'afternoon' => range(12, 17),
			'evening'   => range(18, 23),
			'night'     => range(0, 5)
		];

		// Get the custom vars, if available
		$customdate = $matches[1];
		$timespan = $matches[2];

		// Get UNIX timestamp from the query var
		$timestamp = strtotime($customdate);

		return [
			'date_query' => [
				[
					'year'  => date('Y', $timestamp),
					'month' => date('m', $timestamp),
					'day'   => date('d', $timestamp)
				],
				[
					'hour'    => $hours[$timespan],
					'compare' => 'IN'
				],
				'relation' => 'AND'
			],
		];
	}
}
// Let's add rules on init
add_action('init', function() {

	$custom_date_format = '([0-9]{4}-[a-z]{3}-[0-9]{2})';
	$timespan_format = '(morning|afternoon|evening|night)';

	// This adds the rule
	add_rewrite_rule(
		"^{$custom_date_format}-{$timespan_format}/?",
		new Custom_Rule(),
		'top'
	);
});

This is substantially simpler, and more powerful. You can do conditional query building in the callback easily if you want, or map to any arbitrary set of query vars.

If you're not using a query at all, you can set $rule->skip_main_query = true, an alternative to patches proposed in #10886. In addition, you can subclass and override should_skip_main_query() to return false, allowing complex handling here too.

Performance

The main concern here is with performance. I've profiled this with 1000 extra rules and measured both timing and memory. The current patch takes 69% longer (9ms vs 16ms) and uses 176% more memory (512KB vs 1415KB), however this is due to the current implementation, which converts string-based rules to objects for ease of internal handling. We can instead use just-in-time conversion after the pattern matching, which reduces this significantly; updated patch coming soon.

Last edited 8 years ago by rmccue (previous) (diff)

#2 @rmccue
8 years ago

By the way, I'm studying both our library (hm-rewrite) as well as Cortex (mentioned in the Roots article); I'd love to know about any and every rewrite library that exists.

(Once we've completed part 2, I'd also like to look at potentially reworking the internals to use either FastRoute or a similar approach, but I want to walk/crawl first. ;) )

#3 in reply to: ↑ 1 @rmccue
8 years ago

Replying to rmccue:

this is due to the current implementation, which converts string-based rules to objects for ease of internal handling. We can instead use just-in-time conversion after the pattern matching, which reduces this significantly; updated patch coming soon.

I profiled this (tool source), and it comes out to a negligible difference over the non-object implementation (as is expected). However, this is potentially a temporary gain, as one would expect over time that code would generally switch to object-based.

Last edited 8 years ago by rmccue (previous) (diff)

#4 @giuseppe.mazzapica
8 years ago

I really like the effort on this.

As the guy who wrote the article you mentioned and Cortex (the library mentioned in the article) as well as one who is exploring the concept since 2 years (in 2014 I released the now abandoned "Clever Rules" https://github.com/Giuseppe-Mazzapica/CleverRules) I have some things to say...

A class for each a rule is overwhelming imo.

I think that would be easier and with less impact, to introduce a new function, maybe named something like add_rewrite_callback.

The fictional code would be something like:

add_action( 'add_rewrite_rules', 'prefix_custom_rules' );

function prefix_custom_rules(WP_HTTP_Request $request) {

    add_rewrite_callback(
        "^([0-9]{4}-[a-z]{3}-[0-9]{2})-(morning|afternoon|evening|night)/?",
        'prefix_date_rewrite_rule',
        'top'
    );
}

As you can see, I passed the yet to come WP_HTTP_Request to a new hook 'add_rewrite_rules' that, passing the HTTP request, easily allows to add rules only under some request conditions, for instance, HTTP method.

The prefix_date_rule function could then be something like:

function prefix_date_rewrite_rule(array $matches, WP_HTTP_Request $request) {
    
    // A map from the timespan string to actual hours array
    $hours = array(
        'morning'   => range(6, 11),
        'afternoon' => range(12, 17),
        'evening'   => range(18, 23),
        'night'     => range(0, 5)
    );

    // Get the custom vars, if available
    $customdate = $matches[1];
    $timespan = $matches[2];

    // Get UNIX timestamp from the query var
    $timestamp = strtotime($customdate);

    return array(
        'date_query' => array(
             array(
                'year'  => date('Y', $timestamp),
                'month' => date('m', $timestamp),
                'day'   => date('d', $timestamp)
	     ),
             array(
                'hour'    => $hours[$timespan],
                'compare' => 'IN'
	     ),
             'relation' => 'AND'
        ),
    );
}

This is quite an edge case, most of the rules won't require all this code.

Maybe, internally the rule callback can be used to generate instances of a route class, but to require user to write a class for each custom rule is far from ideal as a public API.

Regarding skipping the main query, if the callback would return anything that is not an array, then there are no variables to assign to the main query, so you can just skip the main query if the callback return anything but an array.

Or maybe, would be possible to just add_filter( 'skip_main_query', '__return_true' ) inside the callback.

Also note that this approach easily allows for things like:

add_rewrite_callback(
   "^/some/path/here?",
   function(array $matches, WP_HTTP_Request $request) {
       do_something_interesting();
       wp_safe_redirect( home_url() );
       exit();
   },
   'top'
);

or even:

add_rewrite_callback(
   "^/some/path/here?",
   function(array $matches, WP_HTTP_Request $request) {
       wp_send_json_success( array( 'cool' => true ) );
   },
   'top'
);

If you combine this with the possibility to add rules being aware of specific conditions in the HTTP request (because add_rewrite_rules hook pass the request object and the rule callback receives the request object as argument), things becomes interesting.

Consider that, since you can distinguish "old" string rules from this new callback rules, you are not forced to convert the old ones, saving time and memory.

And the two systems could live side-by-side without any issue I think.

As a final thought, my Cortex (https://github.com/Brain-WP/Cortex) uses the library FastRoute (by Nikita Popov, one of the PHP devs who contributed more to PHP 7) that is faster and less resources consuming of the WP system because it use a very clever approach described in this blog article http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html

I think would be interesting explore the same concept for WordPress.

Last edited 8 years ago by giuseppe.mazzapica (previous) (diff)

@rmccue
8 years ago

Example of a callback-based wrapper class

@rmccue
8 years ago

Example of a callback on the 'wp' action if a rule matches

#5 follow-up: @rmccue
8 years ago

Replying to giuseppe.mazzapica:

I really like the effort on this.

Thanks for posting your thoughts! Replies inline.

A class for each a rule is overwhelming imo.

I think that would be easier and with less impact, to introduce a new function, maybe named something like add_rewrite_callback.

On the surface, I agree, I'd prefer a straight-up callback too. However, there's a few concerns with that, which is why I made this a class/interface:

  • The current format is a string/array. Callbacks in PHP are strings/arrays (or Closure objects), which means the domain overlaps. This potentially leads to conflicts and a backwards compatibility break, as we can't easily tell the two types apart (i.e. if I have the string set to reset, do I want to call that function, or set an empty query var? It gets worse when you consider array query vars). We _could_ break this, but it's not necessary.
  • We have more than one callback here, and I'd actually like to add a few more. The design of this is based partially around hm-rewrite, which avoids needing conditionals later on in hooks.
  • If we have multiple bits of data, the traditional way in WP-land is to use arrays for that. However, because this is part of the critical path (i.e. must be run on every request), performance is a concern. Arrays actually eat up more memory quite quickly (per nikic, "104 + 96*n for arrays and 128 + 8*n for objects"), so objects are the more lightweight approach here, especially for newer PHP versions.

(See below for potential solution.)

As you can see, I passed the yet to come WP_HTTP_Request to a new hook 'add_rewrite_rules' that, passing the HTTP request, easily allows to add rules only under some request conditions, for instance, HTTP method.

Until we rework how flushing works, conditional registration won't work. I want to do this step-by-step so we can assess as we change, but this may be possible some day. :)

I do like the approach in your next piece of code of passing the Request object in to the callback though. This would allow us to implement per-method rules too through some slight reworking of the current code (replacing get_verbose_page_match with a is_valid).

This is quite an edge case, most of the rules won't require all this code.

Maybe, internally the rule callback can be used to generate instances of a route class, but to require user to write a class for each custom rule is far from ideal as a public API.

Doing it in the internal API might be a bit too polymorphic, but I would like to simplify, so potentially a built-in wrapper could help? See callback-rule.php for how this could work.

Regarding skipping the main query, if the callback would return anything that is not an array, then there are no variables to assign to the main query, so you can just skip the main query if the callback return anything but an array.

Not necessarily true; an empty query could simply be asking for the defaults for WP_Query. get_posts([]) e.g. is an empty query, but just uses the defaults.

Or maybe, would be possible to just add_filter( 'skip_main_query', '__return_true' ) inside the callback.

Potentially, but the aim here is to move away from the paradigm of rule -> query, so I'd like the API to recognise the difference. Having to add a filter is a bit of a pain :)

Also note that this approach easily allows for things like:

...

If you combine this with the possibility to add rules being aware of specific conditions in the HTTP request (because add_rewrite_rules hook pass the request object and the rule callback receives the request object as argument), things becomes interesting.

Running a callback during the parsing stage is a bit too early, but I definitely see the benefit of being able to just register a callback. Ideally, we want this to run on the wp hook.

Added an example of how this can work currently in just-callback.php, but I want to explore this idea a bit more to make it a bit easier.

Consider that, since you can distinguish "old" string rules from this new callback rules, you are not forced to convert the old ones, saving time and memory.

That's possible currently, I mainly did the convert-all-of-them for internal consistency, but it's not strictly required. (See above for problems with differentiating between callbacks and queries though.) See 3 about that. :)

As a final thought, my Cortex (https://github.com/Brain-WP/Cortex) uses the library FastRoute (by Nikita Popov, one of the PHP devs who contributed more to PHP 7) that is faster and less resources consuming of the WP system because it use a very clever approach described in this blog article http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html

I think would be interesting explore the same concept for WordPress.

Absolutely agree, particularly in regards to FastRoute; my original draft of the ticket included that, but I cut it out as it was already getting long. :) Once we tackle part 2, we'll be able to investigate changing the internals of the rewriting system much, much easier.

(If there's anything you disagree with here, please do let me know :D)

#6 in reply to: ↑ 5 ; follow-up: @giuseppe.mazzapica
8 years ago

Replying to rmccue:

The current format is a string/array. Callbacks in PHP are strings/arrays (or Closure objects), which means the domain overlaps. This potentially leads to conflicts and a backwards compatibility break, as we can't easily tell the two types apart (i.e. if I have the string set to reset, do I want to call that function, or set an empty query var? It gets worse when you consider array query vars). We _could_ break this, but it's not necessary.

Per my understanding current format is

add_rewrite_rule(
  "^{$custom_date_format}-{$timespan_format}/?",
  'index.php?somevar=somevalue',
  'top'
);

It means you have to distinguish index.php?somevar=somevalue from a callback. There's no possible overlap, especially if you check for is_callable, because the only way a string can be callable is when it contains a function name, but a function name can't contain = which is required in the old format.

Until we rework how flushing works, conditional registration won't work. I want to do this step-by-step so we can assess as we change, but this may be possible some day. :)

Sure, rework flushing is a priority. This is not the only issue with current flushing system. I perfectly agree that a cache for rules is propably needed, but cache !== storage.

Doing it in the internal API might be a bit too polymorphic, but I would like to simplify, so potentially a built-in wrapper could help? See callback-rule.php​ for how this could work.

Converting public API to internal object, by the way, is a thing that pretty much all the libraries out there (including FastRoute) are doing, and I don't see any real problem in it. By the way, your callback rule class is not that bad.

Not necessarily true; an empty query could simply be asking for the defaults for WP_Query. get_posts([]) e.g. is an empty query, but just uses the defaults.

Yes, but [] !== ''. In other words, if an array is returned (even empty) you can assume this is a query argument array. If anything that is not an array is returned (no matter what it is) you can assume you want to skip main query. This should be pretty straightforward to grasp for users.

Running a callback during the parsing stage is a bit too early, but I definitely see the benefit of being able to just register a callback. Ideally, we want this to run on the wp hook.

Well, the callback is not executed at parsing stage. It is executed when the parsing stage ended. At that point, there are few things that happen. Some of them are query related and the implementation you added to OP already skip them when you choose to skip the main query.

The most important thing that happen before wp in the scenario we are skipping main query is the HTTP header settings. However, if we want to have the maximum flexibility, I think is fine if the rule callback has the possibility to complete control the response, which includes HTTP headers.

In short, I think that if the target is to decouple url from query (which is something I really agree) wp is probably too late.

Regarding the hook wp itself, IMO it is a query-related hook as well. Anything existing out there (plugins, themes) that is using the hook was coded with the current WP implementation in mind, where each frontend request trigger a query.

If this new approach lands in core, a callback that takes complete control over the request has nothing to gain from firing wp, because anything that hooks there is expecting a query, which may not happen.

On the other hand, if the callback returns an array, the query will happen and wp will be fired as usual, and everyone will be happy.

#7 in reply to: ↑ 6 ; follow-up: @rmccue
8 years ago

Replying to giuseppe.mazzapica:

It means you have to distinguish index.php?somevar=somevalue from a callback. There's no possible overlap, especially if you check for is_callable, because the only way a string can be callable is when it contains a function name, but a function name can't contain = which is required in the old format.

Unfortunately, this isn't true. add_rewrite_rule accepts an array, which is sent to add_query_arg to add to index.php; add_rewrite_rule( '/', array( 'MyClass', 'mystaticmethod' ) is currently valid and maps to index.php?0=MyClass&1=mystaticmethod. I doubt anyone is using this, but I'd prefer to avoid overloading the ambiguous parameter for now.

(Also, is_callable can't be used for similar reasons to why we can't use it in add_action/add_filter, which is that it might be used before it can be called.)

Yes, but [] !== ''. In other words, if an array is returned (even empty) you can assume this is a query argument array. If anything that is not an array is returned (no matter what it is) you can assume you want to skip main query. This should be pretty straightforward to grasp for users.

Good point, although right now the rule actually needs to return a string, as this is set to $wp->matched_query; I want to shim this in though, so shouldn't be an issue.

I think explicit might still be better than implicit, but I'll investigate and feel it out to see what it's like to use. :)

Regarding the hook wp itself, IMO it is a query-related hook as well. Anything existing out there (plugins, themes) that is using the hook was coded with the current WP implementation in mind, where each frontend request trigger a query.

The hook's description currently "Fires once the WordPress environment has been set up." which doesn't imply that there's a main query. However, it's not fired on the admin. I think we need to work out exactly what this hook should do and document that. It's a useful hook even for non-query pages as it allows frontend-only events.

I guess a broader question there is whether the admin should continue to be regarded as separate, or just another part of the main WP environment. I suspect the former.

#8 in reply to: ↑ 7 @giuseppe.mazzapica
8 years ago

Replying to rmccue:

Unfortunately, this isn't true. add_rewrite_rule accepts an array

This is why in my first comment I proposed to add another function add_rewrite_callback (or maybe add_rewrite_rule_callback): if it is a different function, than you always expect a callback and no overlap possible.

the hook's description currently "Fires once the WordPress environment has been set up." which doesn't imply that there's a main query. However, it's not fired on the admin. I think we need to work out exactly what this hook should do and document that. It's a useful hook even for non-query pages as it allows frontend-only events.

Well, the description is wrong :) At the moment, the hook wp fires in frontend and in backend but only where a main WP_Query is used. In short, if a main query is there, the wp hook is fired.

This is why for me it is a query-related hook. When I see wp hook I read instead main_query_done. Maybe it was not intended to be like that, but actually it is.

Regarding frontend-only hooks, I think that template_redirect is the most fitting in many cases. It's early enough that headers are not sent and is not fired in admin (or AJAX) requests.

For plugins that needs to do stuff on frontend, I think that template_redirect is almost always the best choice... unless you need to override the url->query->template logic, that is the reason why this issue exists.

I guess a broader question there is whether the admin should continue to be regarded as separate, or just another part of the main WP environment. I suspect the former.

If the url-to-query logic that currently always happen in frontend will ever changed to a more flexible routing system, than would probably make sense to convert admin screens to "protected routes". But we have to get there, first.

#9 follow-up: @MikeSchinkel
8 years ago

I think both of the solutions suggested here might be missing something.

If hooks are used as @giuseppe.mazzapica then unit testing will be more difficult if not close to impossible, and that is Part 2 of @rmccue's requirements (which I definitely agree with.)

OTOH, @rmccue's solution to use an object is a great first step is does not address other issues that the rewrite system has. The biggest problem I've run into with URL routing on client projects is that WordPress's rewrite system is context free and what we've often needed has been context sensitive routes. For example, if you want to have an /about/ URL and also have your categories in the root /politics/ and /sports/ you can't do it with the WordPress Rewrite system as-is. And simply ignoring those use-case requirements ends up giving people who are against WordPress more ammunition to shoot it down.

That said, while FastRoute is excellent engineering by @nikic, he overlooked one of the most obvious optimizations you can do with URLs and that is segment chunking (although I won't take credit for recognizing it; it was Otto who gave me the idea.) So that instead of looking at URLs in groups of 10 you look at segments instead.

What's more, you can subdivide the URLs you need to look at by grouping them based on the number of slash-separated segments they contain. Once we have conceptually divided up by number of segments, we can really optimize matching.

For the root segment for most WordPress sites you will not have that many unique segments; throw them into an array and do an in_array() (or into a string using strpos`) to find the proper route; likely much faster than using groups of 10 regex.

Further, by using URL segments you can divide and conquer. In most cases the total choices for matching a second segment will be a much smaller list to test against.

I have actually implemented this generically for a project but -- as with so many things -- I did not realize that parts of the approach I took were not optimal so I would need to refactor that code in order to use it.

Fortunately with the 'do_parse_request' hook we could do it as a Feature Plugin/Feature Project.

So what do you think? Do you think we can all collaborate on an approach like this?


BTW, I have been trying to get WordPress to improve it's URL routing for 6 years. My first ticket:

My 2nd ticket 5 years ago to at least allow for a workaround that did get included (after much objection from Otto, thank you Andy Skelton!):

Also 5 years ago:

And a couple years ago I have a talk on "Hardcore" URL Routing at a WordCamp:

Suffice to say I have spent a lot of time thinking about URL routing and written a lot of code to implement custom URL routes.

I also argued on wp-hackers at length to no avail back around 6 or 7 years ago that we should be able to load a page w/o calling WP_Query, so I'm totally with you there.

#10 in reply to: ↑ 9 ; follow-up: @giuseppe.mazzapica
8 years ago

Replying to MikeSchinkel:

I think both of the solutions suggested here might be missing something.

If hooks are used as @giuseppe.mazzapica then unit testing will be more difficult if not close to impossible, and that is Part 2 of @rmccue's requirements (which I definitely agree with.)

Why? Can you better explain this? If there will be a callback or an object that receives an array of rules and do the parsing, you can easily unit test that callback or object by passing to it an arbitrary set of rules.

That said, there are 2 ways to collect that array of rules: using a global array (just like dozen of other feaures are implemented in WP) or use a RulesCollection object (that will be stored in a global var anyway).

In both cases you'll be able to also test the code that adds rules. If both collecting code and parsing code will be testable, and pretty easily I have to say, which is the issue here? Maybe an example will help me understanding.

And, by the way, I use hooks in Cortex (https://github.com/Brain-WP/Cortex) and it has an unit tests coverage of 95.57% (https://codecov.io/github/Brain-WP/Cortex?branch=refactoring-fastroute).

OTOH, @rmccue's solution to use an object is a great first step is does not address other issues that the rewrite system has. The biggest problem I've run into with URL routing on client projects is that WordPress's rewrite system is context free and what we've often needed has been context sensitive routes. For example, if you want to have an /about/ URL and also have your categories in the root /politics/ and /sports/ you can't do it with the WordPress Rewrite system as-is. And simply ignoring those use-case requirements ends up giving people who are against WordPress more ammunition to shoot it down.

I'm very agree with this. This is why I proposed a system that adds rules with a callback: using a callback is very easy to pass context to it and implement rule logic based on context. In fact, I proposed that rule callbacks should receive an instance of the yet to come WP_HTTP_Request (that @rmccue mentioned in OP). That object would be enough to provide all the context is needed.

That said, while FastRoute is excellent engineering by @nikic, he overlooked one of the most obvious optimizations you can do with URLs and that is segment chunking (although I won't take credit for recognizing it; it was Otto who gave me the idea.) So that instead of looking at URLs in groups of 10 you look at segments instead.

What's more, you can subdivide the URLs you need to look at by grouping them based on the number of slash-separated segments they contain. Once we have conceptually divided up by number of segments, we can really optimize matching.

For the root segment for most WordPress sites you will not have that many unique segments; throw them into an array and do an in_array() (or into a string using strpos`) to find the proper route; likely much faster than using groups of 10 regex.

Sorry, but segment chunking will absolutely not be an optimization compared to FastRoute (even if it may be an optimization compared to current WP implementation).

Consider, first of all, that FastRoute performs a singular preg_match_all call to find a match among all rules.

Using the "chuncks" approach you have to chunck, then filter, then finally perform matching.

The "chunck" step, will create an array for every rule, and arrays are notoriously memory consuming. That's the less.

Regarding the "filter" step, if a rule contains one or more variable chuncks, and very likely it does, to include / exclude the rule from the "matched rule candidates" you need to perform -at least- one preg_match per rule and each of those preg_match, will consume almost the same time of the singular preg_match_all call that FastRoute performs.

After all rules are filtered, you still have to find the one that matches...

In short, the chuncking approach would be more performant of FastRoute only in the very edge case in which there are just few rules and none of the contains variable parts. In other cases it would be from several to dozens (potentially hundreds) times less performant.

Note that my words are based on experience. In my first attempt to a routing system for WP, that was 2 years ago with Clever Rules (https://github.com/Giuseppe-Mazzapica/CleverRules), I was using the "chunking" approach (very similar to the one you describe). The performance gain I have now using FastRoute is not trivial.

However, I see the benefit of looking at URL segments in term of convenience, because you can use different logic based on, for instance, the root URL chunck.

This is why, in Cortex, the action I use to add rules passes an instance of PSR-7 UriInterface (https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-7-http-message.md#35-psrhttpmessageuriinterface) that gives me the possibility to conditionally add rules. This is possible in Cortex because there I use do_parse_request hook, will not be possible in core until the flushing system will not be refactored, as already discussed in a previous post.

Last edited 8 years ago by giuseppe.mazzapica (previous) (diff)

#11 in reply to: ↑ 10 ; follow-up: @MikeSchinkel
8 years ago

Replying to giuseppe.mazzapica:

Replying to MikeSchinkel:

I think both of the solutions suggested here might be missing something.

If hooks are used as @giuseppe.mazzapica then unit testing will be more difficult if not close to impossible, and that is Part 2 of @rmccue's requirements (which I definitely agree with.)

Why? Can you better explain this? If there will be a callback or an object that receives an array of rules and do the parsing, you can easily unit test that callback or object by passing to it an arbitrary set of rules.

Actually, you can test but you cannot really unit test.

You can do is "integration" testing though, and that's what your test suites in Cortex are doing.

Here is an article that explains the difference.

Of course you can't really unit test anything that has a hook in it, which indicts the majority of the WordPress extension API, and I am fully aware of that.

It is possible that we should use hooks for routing -- because that is how practically everything else in WordPress is extended -- and forgo the benefits of unit testing and be satisfied instead with integration testing in exchange for easier extensibility. Personally I am on the fence about it and could go either way.

The "chunck" step, will create an array for every rule, and arrays are notoriously memory consuming. That's the less.

The existing system uses arrays, and unless we generate code with switch statements -- not going to happen for a user-administerable blogging software -- there is not many others way to handle it routing patterns besides arrays. Also, when you wrote "will create an array for every rule" it implies to me that you may simply not understand what I proposed?

This is why I proposed a system that adds rules with a callback: using a callback is very easy to pass context to it and implement rule logic based on context. In fact, I proposed that rule callbacks should receive an instance of the yet to come WP_HTTP_Request (that @rmccue mentioned in OP). That object would be enough to provide all the context is needed.

As best I can tell from what you proposed the callback is not called until after the regex matches, so unless I misunderstand your suggestion the matches would still be context free.

Maybe your definition of "context" is different than I intended when I used it? The context I meant was that state of the system configuration and/or data currently in the system. For example, if we want Categories in the URL root and we have a sports category slug then that URL would route to the Category Archive instead of routing to a Page with a URL slug of sports. (And with the approach I proposed we'd need to make sure users were warned when path segments became ambigous.)

Consider, first of all, that FastRoute performs a singular preg_match_all call to find a match among all rules.

Are you absolutely certain, without-a-doubt that the above statements of yours that I quoted is correct?

I assert your statement is not correct and that FastRoute chucks in groups of ~10 rules meaning it may need to make many more than a single preg_match_all() call in order to match a URL. Would you like me to prove my assertion?


P.S. BTW, I set up a test with Cortex and one thing I noticed was quite a lot of overhead in the autoloading and the creation of all the objects prior to getting to the point of calling Router->match(). Further Router->parseRoutes does a lot of work on each page load that could be cached. That includes RouteParser->parse() doing a preg_split() and RouteParser->parsePlaceholders() doing a preg_match_all() for each route for each page load.

So even if FastRoute only performed a singular preg_match_all call the two extra regex you do for each rule negates your assertions there is only one.

Since I did not profile your code I have no certainty on the amount of overhead all your setup adds but it felt like a lot of overhead as implemented as It took what seemed like forever to trace through it using XDEBUG just to get to Router->match().

I also have not yet profiled my proposed approach for performance either. FWIW.

#12 in reply to: ↑ 11 ; follow-up: @giuseppe.mazzapica
8 years ago

Replying to MikeSchinkel:

Replying to giuseppe.mazzapica:

Replying to MikeSchinkel:

I think both of the solutions suggested here might be missing something.

If hooks are used as @giuseppe.mazzapica then unit testing will be more difficult if not close to impossible, and that is Part 2 of @rmccue's requirements (which I definitely agree with.)

Why? Can you better explain this? If there will be a callback or an object that receives an array of rules and do the parsing, you can easily unit test that callback or object by passing to it an arbitrary set of rules.

Actually, you can test but you cannot really unit test.

You can do is "integration" testing though, and that's what your test suites in Cortex are doing.

Here is an article that explains the difference.

Believe me, I know exactly what unit tests are. I wrote a library (http://brain-wp.github.io/BrainMonkey/) for the very sole purpose of unit testing WordPress plugins.

You'll be surprised that my plugins are the very few open source WordPress plugins on the entire web that are tested without loading WordPress. Every WordPress class and function and hook is mocked.

No, they are not integration tests.

Moreover, you might find interesting my answer on WPSE http://wordpress.stackexchange.com/questions/164121/testing-hooks-callback/164138#164138 where I talk just about unit testing & WordPress.

Of course you can't really unit test anything that has a hook in it, which indicts the majority of the WordPress extension API, and I am fully aware of that.

Using my Brain Monkey, I actually can.

As best I can tell from what you proposed the callback is not called until after the regex matches, so unless I misunderstand your suggestion the matches would still be context free.

Maybe your definition of "context" is different than I intended when I used it? The context I meant was that state of the system configuration and/or data currently in the system. For example, if we want Categories in the URL root and we have a sports category slug then that URL would route to the Category Archive instead of routing to a Page with a URL slug of sports. (And with the approach I proposed we'd need to make sure users were warned when path segments became ambigous.)

In my first comment here (https://core.trac.wordpress.org/ticket/36292?replyto=11#comment:4) I proposed to provide context when adding hooks (via the introduction of a new hook add_rewrite_rules that pass an instance of request object) and to pass context route callbacks.

Reason is that storing rules in DB, like WP does now, conditional rule addition is not possible at all. If the flushing system will be refactored (as I hope) a cache layer should probably implemented, and conditional addition will break cache.

So, when a route is slightly different based on context, is probably convenient just ad it and then handle differences while performing route action, which is
easy and straightforward if route action is a callback that receives context as argument.

When differences are bigger, you can use the context to conditional adding rules, assuming that flushing system is refactored.

P.S. BTW, I set up a test with Cortex and one thing I noticed was quite a lot of overhead in the autoloading and the creation of all the objects prior to getting to the point of calling Router->match(). Further Router->parseRoutes does a lot of work on each page load that could be cached. That includes RouteParser->parse() doing a preg_split() and RouteParser->parsePlaceholders() doing a preg_match_all() for each route for each page load.

Yes, Cortex approach is not optimized. A system based purely on FastRoute would be much (much) faster.

The counter-optimization of Cortex has different reasons. Among others, operating outside of WordPress core and wanting to have 100% core compatibility, I have to include some additional overhead.

Moreover, FastRoute have no system to differentiate routes based on domain and have no way do define "priority" for routes nor to define route "groups".

FastRoute is committed to speed, so it reduces features to avoid overhead, but I decided that some overhead is an acceptable trade off to have those features.

Even because I did some very basic profiles comparing Cortex with WordPress native implementation and noted that, in real world use cases, there is close to zero performance loss and even in worst case scenario the performance loss is so trivial that is negligible compared to the whole WordPress environment bootstrap process.

(In the "Clever Rules" implementation wehre I used the URL chunking approach performance loss was much bigger).

So, by now, the flexibility and general benefits I get from Cortex exceed by far the small performance issue.

Finally, consider that Cortex is not stable yet, and one of the reasons is that there is no cache layer yet and so route parsing / matching happens on every request which, of course, is suboptimal.

However, I use it in production in quite big sites (sorry, I can't share it publicly, but if you are intersted I could privately) and the performance impact is unnoticeable, lost in the much bigger "trouble" of WordPress bootstrap.

This ticket was mentioned in Slack in #core by rmccue. View the logs.


8 years ago

#14 in reply to: ↑ 12 @MikeSchinkel
8 years ago

Replying to giuseppe.mazzapica:

You'll be surprised that my plugins are the very few open source WordPress plugins on the entire web that are tested without loading WordPress. Every WordPress class and function and hook is mocked.

The problem with that approach is that is makes the assumption that all of your mocks behave exactly as WordPress would behave, and given the combinatorial amount of functions and hooks that is a statistical impossibility especially when you consider that behavior changes with each release of WordPress.

Yes, what you have is better than not having any testing -- by far -- but it is still far from perfect.

As best I can tell from what you proposed the callback is not called until after the regex matches, so unless I misunderstand your suggestion the matches would still be context free.

Maybe your definition of "context" is different than I intended when I used it? The context I meant was that state of the system configuration and/or data currently in the system.

You are correct, you are using your definition of context in this context (no pun intended.) My definition of "context" here is the context of the URL segments when compared to architected structure of the site and of the values persisted in the database.

For example, with context-free URLs if the URL path is /about/ then we both would probably assume a $post_type='page' with a slug of about.

However, with context-sensitive URLs that I am advocating for URLs that might not be able to be determined by merely looking at them. Instead the router would need to determine which post or taxonomy term those URLs represented, or if they even represented data in the database. Maybe /api/ would represent a full code-generated response.

For example, if we want Categories in the URL root and we have a sports category slug then that URL would route to the Category Archive instead of routing to a Page with a URL slug of sports. (And with the approach I proposed we'd need to make sure users were warned when path segments became ambiguous.)

Yes, but I am advocating for more than that. For example, what of URL path's /barack-obama/, /obamacare/, /google/, /mercedes-benz/ or /giuseppe-mazzapica/? With only context-free URLs we know that those would look up those slugs using $post_type='page'.

But with context-sensitive URLs the URLs above might represent, respectively, a person type post, a news category, a company type post, an automaker type post and a user account. It is currently possible to do this in WordPress; most of the sites I work on do this.

However, it is currently a hack to do this in WordPress. I think it should be built into the core of WordPress to enable this, and then for core to optimize the 80% case as best possible.

Yes, in some cases the above would cause slow performance, but so does a poorly written SQL query. Just as the SQL developer should be responsible for ensuring their queries are fast or cached so should the site architecture/developer be responsible to ensure that sites that need to scale to huge amounts of content either do not have URLs that cause issues or they optimize those URLs.

For example, if a site has 1000 categories -- bad practice thought it probably is -- those category slugs could be loaded into memcache and scanned rapidly for a match before moving on to other potentials.

In my first comment here (https://core.trac.wordpress.org/ticket/36292?replyto=11#comment:4) I proposed to provide context when adding hooks (via the introduction of a new hook add_rewrite_rules that pass an instance of request object) and to pass context route callbacks.

Certainly, but we were not discussing the same thing, as described above.

So, when a route is slightly different based on context, is probably convenient just ad it and then handle differences while performing route action, which is easy and straightforward if route action is a callback that receives context as argument.

Adding routes into code makes it effectively impossible to get the reciprocal unless people write both sets of code, which many people will not (even realize them need to.) Better to build a system that allows us to find out that /barack-obama/ is $post_id=123 AND that $post_id=123 has a URL of /barack-obama/; we should be able to derive that from the system from declarative information, not with hooks.

FastRoute is committed to speed, so it reduces features to avoid overhead, but I decided that some overhead is an acceptable trade off to have those features.
...
Consider, first of all, that FastRoute performs a singular preg_match_all call to find a match among all rules.

You avoided my response to your statement that "segment chunking will absolutely not be an optimization compared to FastRoute" which you justified by saying "Consider, first of all, that FastRoute performs a singular preg_match_all call to find a match among all rules." And I proved your justification to be false because FastRoute chunks arbitrarily in groups of 10 URLs which means it can end up running many preg_match() calls to match a URL, especially for WordPress sites with at least 100 regex rules and often 200 or more.

Thus I still assert that segment chunking is likely a proper optimization over FastRoute as it is currently implemented.


Last edited 8 years ago by MikeSchinkel (previous) (diff)

This ticket was mentioned in Slack in #core by rmccue. View the logs.


8 years ago

#16 @pcarvalho
7 years ago

Hi everyone, any progress on Rewrites:The Next Gen?

I have a question tho: how would collisions be matched/resolved? (#13459 )

This ticket was mentioned in Slack in #meta by rmccue. View the logs.


3 years ago

Note: See TracTickets for help on using tickets.