Make WordPress Core

Opened 4 years ago

Last modified 4 years ago

#48223 new defect (bug)

parse_request(): When request has multiple matching rewrite rules, and matched rule returns 404 - iterate to next rewrite rule

Reported by: apedog's profile apedog Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Rewrite Rules Keywords: has-patch dev-feedback
Focuses: Cc:

Description

A request might have multiple matching rewrite rules.
parse_request() will take the top rule and attempt to parse it. If that rule results in an empty query, is_404() is set.

In the case of a 404 error on the first matched rule, parse_request() should iterate to the next matched rule and attempt the next query.

Example:
A request might match all these rules=>queries:

  1. (.?.+?)(?:/([0-9]+))?/?$ => pagename=$matches[1]&page=$matches[2]
  2. ([^/]+)(?:/([0-9]+))?/?$ => name=$matches[1]&page=$matches[2]

If first query pagename=$matches[1]&page=$matches[2] returns 404, parse_request() should attempt the second query before returning a 404 error.

Attachments (4)

48223.diff (3.1 KB) - added by apedog 4 years ago.
48223.1.diff (6.5 KB) - added by apedog 4 years ago.
48223.1.diff fixes some errors in 48223.diff and replaces it
48223.2.diff (3.1 KB) - added by apedog 4 years ago.
48223.2.diff fixes errors in 48223.diff and 48223.1.diff and replaces them
48223-wp-refactor.patch (37.4 KB) - added by apedog 4 years ago.

Download all attachments as: .zip

Change History (10)

#1 @apedog
4 years ago

Case in point:

A database was created before custom post types existed, has the following hierarchical pages published: recipes and recipes/chocolate-cake
At a later point a Recipe custom post type was registered with archive/front slug 'recipes' and used for all new recipes. However the old pages were not converted to the new custom post types, and are still regular WP pages.

As it stands, the published page recipes/chocolate-cake cannot be reached, neither through the permalink nor using ?p=123 notation. Instead returning a 404 page.

#2 @SergeyBiryukov
4 years ago

  • Component changed from General to Rewrite Rules

@apedog
4 years ago

#3 @apedog
4 years ago

  • Keywords has-patch dev-feedback added

I've written a rough patch that attempts to fix this issue.
The following changes have been made to class-wp.php:

Added variable $all_matched_rules - array of all possible rewrite rule matches. This is unused but persistent.
Added variable $matched_rules - array of matched rules to loop over.

Function parse_request() creates array of all possible rewrite rule matches and pops first rule off $this->matched_rules array, populating $this->matched_rule.

On filter 'pre_handle_404' added a function maybe_try_another_matched_rule() that tests if other possible rewrite rules exist for this request. If so, function parses request and queries posts again for next matched rule available. It does so recursively until $matched_rules array is empty. In which case it defaults to default 404 handling.

This patch is more of a proof-of-concept. And would obviously need some iterations if accepted. The code is convoluted and is redundant in some areas. This can be fixed, with zero damage to logic, by making some additional changes to parse_request().

@SergeyBiryukov @johnbillion

@apedog
4 years ago

48223.1.diff fixes some errors in 48223.diff and replaces it

@apedog
4 years ago

48223.2.diff fixes errors in 48223.diff and 48223.1.diff and replaces them

#4 @apedog
4 years ago

So in attempting to write a patch for this ticket I ended up having to re-factor class WP. So the attached patch is a little "bigger" than just a patch for this ticket and might justify opening a separate ticket for.


I've added a patch (48223.wp-refactor.patch) that does multiple things.

  • Refactors class WP
  • Introduced a new class Request_Environment
  • Added code to allow parsing multiple rules if the first one returns empty/404

Request_Environment

Request_Environment is the same as the old WP environment class but is decoupled from $_SERVER, $_GET, $_POST, global $wp_the_query etc.
It does not send headers. It does not set globals.

Request_Environment can ideally be used within a virtual WordPress environment, to parse any request string, without setting globals, without sending headers and using global $wp_query.

It does not accesses $_SERVER and global $wp_query directly, but instead uses instance variables $this->server, $this->wp_query etc.

It introduces a new method setup_request_environment() that accepts $_SERVER (or a similarly formatted array), $_GET, $_POST (or equivalent array), a WP_Query object and a WP_Rewrite instance. And stores them to local/instance variables.

Refactor class WP

WP is now an extension of Request_Environment.
It uses the new setup_request_environment() method to populate instance variables with actual globals $_SERVER, $_GET, global $wp_query, $wp_rewrites.
It also sends headers and sets globals after parsing is done as before.

All methods that were available before are still available now. Either in class WP or in class Request_Environment.

All hooks that were available before are still available now. Except some minor changes detailed below.

parse_request()

In order to better work with the WP environment object and parse_request() method, I've broken it up into 3 separate chunks/methods - detailed in the section below.

Multiple Rules Matching

In order to accomodate multiple rule matching (the main impetus behind this exercise), I've had to break up parse_request() into smaller more managable chunks:

  • _match_rewrite_rules(): gets all rules that match request. Does not break after first match and keeps them in an array.
  • _match_single_rule(): same algorithm as before.
  • _parse_request(): actual parsing of the current rule being matched. same as before.

I've added a hook 'match_multiple_rules' and a new function _reparse_request().
If match_multiple_rules is set to true, and the first query_posts() returns empty or 404, _reparse_request() will clear the WP_Query object, will parse the next matching rule and query_posts again.

Notable Changes

I've changed the order of query_posts() and send_headers(). So headers are only sent after query_posts ( and reparse_request ) have run. This means some hooks will now fire in a slightly different order. This might cause problems in some rare edge-cases. But I believe this is a minor change that should not break any existing installaiton or plugin.
But if testing proves this assumption wrong - reverting it shouldn't be too hard to do. It will just require slightly repetitive code.

unset($_GET['error']) now executes inside setup_globals() - after _reparse_request() finishes - and not during parse_request() as before.


This might need some further iterations. But it generally works. I'd love to get some feedback on this. And am willing to shepherd this ticket and make changes if necessary. I know this might be/seem to be a big change so any and all feedback is appreciated.

@SergeyBiryukov I know you've made some changes to class-wp recently. This applies cleanly to that AFAICT. But obviously this needs scrutiny.
@johnbillion - You've expressed some interest in this type of functionality. This is what I came up with. I'd greatly appreciate your input as well.

This ticket was mentioned in Slack in #core by apedog. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


4 years ago

Note: See TracTickets for help on using tickets.