Make WordPress Core

Opened 8 years ago

Closed 3 years ago

#41211 closed defect (bug) (duplicate)

When the /category/category-name portion is repeated in the URL, it serves content instead of 404

Reported by: sjwright1986
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 4.8
Component: Rewrite Rules
Keywords: has-patch
Focuses:
Cc:

Description

Scenario
We have two URLs:

The second URL above has the '/category/category-name/' portion of the URL repeated.

Expected outcome

Actual outcome

Change History (4)

#1 @bradleyt
3 years ago

  • Keywords needs-patch added

I can reproduce this. It's worth noting that this has SEO implications, as WordPress does not output a canonical link tag on category archives by default (rel_canonical() only runs on singular posts and pages).

Digging deeper, it seems that all hierarchical taxonomies are affected (including custom taxonomies registered with register_taxonomy()). Non-hierarchical taxonomies, such as tags, do not seem to be affected.

In class-wp-query.php I've found the following code:

if ( ! empty( $t->rewrite['hierarchical'] ) ) {
    $q[ $t->query_var ] = wp_basename( $q[ $t->query_var ] );
}

I'm not 100% sure, but it looks like WordPress may be extracting the taxonomy type from the URL, extracting the last URL section, and then ignoring everything in between.
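To illustrate the suspected behaviour, here is a minimal sketch in plain PHP. It uses the built-in basename() as a stand-in for wp_basename(), which should be equivalent for simple ASCII slugs; the helper name last_path_segment() and the sample paths are hypothetical.

```php
<?php
// Stand-in for wp_basename(): plain basename() is assumed to behave the
// same way for simple ASCII slugs like the ones used here.
function last_path_segment(string $matched): string {
    return basename($matched);
}

// Hypothetical values captured from the URL after the taxonomy base.
$candidates = [
    'testing',
    'asdf/testing',
    'category/testing/category/testing',
];

foreach ($candidates as $matched) {
    // Only the final segment survives; everything before it is dropped.
    echo $matched, ' => ', last_path_segment($matched), PHP_EOL;
}
```

If the query only ever sees the last segment, the preceding segments are never validated against the term's ancestors.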

For pages, this situation is avoided by calling get_page_by_path() in parse_request() in class-wp.php. Hierarchical taxonomies could be checked in the same way. Alternatively, the handle_404() method in class-wp.php could be extended to check that the URL sections match the found term's parents.

#2 @enshrined
3 years ago

This issue also exists when you use a custom permalink structure such as: /%category%/%postname%/. I'm using category as an example here, but this would happen for all hierarchical taxonomies.

Due to how the rewrites are added, when %category% appears first in the permalink structure, the following rule is added: category/(.+?)/?$. Because the regex used for hierarchical taxonomies is so permissive, it matches anything between /category/ and the end of the URL.

For example, set up a category with the slug testing. It should only be accessible at:

  • /category/testing/

It's actually accessible at /category/(.+?)/testing. In practice, this could look like any of the following:

  • /category/asdf/testing
  • /category/asdf/1234/testing
  • /category/asdf/1234/xy-z/testing
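The matching described above can be reproduced outside WordPress with a rough sketch. The regex is the rule quoted earlier; the # delimiters, the leading ^ anchor, the resolve_slug() helper, and the use of basename() in place of wp_basename() are assumptions made for this example.

```php
<?php
// The rule quoted above, wrapped in delimiters for preg_match().
// The leading ^ anchor is an assumption of this sketch.
$rule = '#^category/(.+?)/?$#';

// Hypothetical helper: return the slug the request would resolve to,
// or null when the rule does not match at all.
function resolve_slug(string $rule, string $request): ?string {
    if (!preg_match($rule, $request, $m)) {
        return null;
    }
    // Only the last segment of the captured path is kept.
    return basename($m[1]);
}

$requests = [
    'category/testing/',
    'category/asdf/testing',
    'category/asdf/1234/xy-z/testing',
];

foreach ($requests as $request) {
    echo $request, ' => ', resolve_slug($rule, $request), PHP_EOL;
}
```

All three requests resolve to the same slug, which is why the extra segments never cause a 404.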

I agree with @bradleyt here; taxonomies should have a function like get_page_by_path() that can double-check that the path in use correctly matches that of a term.

There's a patch incoming that should handle this.

#3 This ticket was mentioned in PR #2344 on WordPress/wordpress-develop by darylldoyle.
3 years ago

  • Keywords has-patch added; needs-patch removed

This PR adds a new get_term_by_path() function that works similarly to get_page_by_path(). It's then used to stop hierarchical taxonomies from returning terms on paths that should instead 404.
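As a rough illustration of the idea only (this is not the code from the PR), a path check could look like the sketch below. It works on a hypothetical in-memory array instead of the terms table, assumes slugs are unique within the taxonomy, and the names term_path() and find_term_by_path() are invented for this example.

```php
<?php
// Hypothetical term table: slug => parent slug (null for top-level terms).
// Assumes slugs are unique and the parent chain contains no cycles.
$terms = [
    'news'    => null,
    'testing' => null,
    'sports'  => 'news',
];

// Build the full hierarchical path of a term, e.g. "news/sports".
function term_path(array $terms, string $slug): string {
    $parts = [];
    for ($s = $slug; $s !== null; $s = $terms[$s]) {
        array_unshift($parts, $s);
    }
    return implode('/', $parts);
}

// Return the slug only when the requested path equals the term's real path.
function find_term_by_path(array $terms, string $path): ?string {
    $path = trim($path, '/');
    $slug = basename($path);
    if ($path === '' || !array_key_exists($slug, $terms)) {
        return null;
    }
    return term_path($terms, $slug) === $path ? $slug : null;
}

var_dump(find_term_by_path($terms, 'testing'));           // matches
var_dump(find_term_by_path($terms, 'asdf/1234/testing')); // no match
```

This sketch is deliberately strict: a child term requested without its parent segments is also rejected. Whether core should 404 or redirect in that case is a separate decision.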

Trac ticket: https://core.trac.wordpress.org/ticket/41211

#4 @SergeyBiryukov
3 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Hi there, welcome to WordPress Trac!

Thanks for the report, we're already tracking this issue in #18734.

@enshrined Thanks for the PR! Could you move it to the other ticket, to keep the discussion in one place? Thanks again!
