Make WordPress Core

Opened 14 years ago

Last modified 5 years ago

#14201 new defect (bug)

Canonical redirect kicks in in case of category/tag base containing other chars then a-z, 0-9, _ and -

Reported by: nbachiyski's profile nbachiyski Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Canonical Keywords: needs-patch close needs-testing
Focuses: Cc:

Description

Expected behaviour

Whatever the category base is, if we go to a properly formed category pretty permalink, canonical redirects shouldn't kick in.

Actual behaviour

If the category base is non-ASCII (for example: баба), the canonical redirect tries to redirect to the same URL. The redirect sanitizer removes the category base from the URL, because it is non-ASCII and redirects to <root>//category-name/. This prevents endless redirects and usually results in 404.

Why does it happen?

Category base is always used verbatim. It can't be URL-encoded, because the percent signs will be interpreted as permalink variables. Because of that the generated urls will be always in the form: <root>/баба/<url-encoded-category-name>/.

The contents of $_SERVER['REQUEST_URI'] are always URL-encoded, so the requested URI is: <root>/%D0%B1%D0%B0%D0%B1%D0%B0/<url-encoded-category-name>/.

Canonical redirect functionality assumes the requested URL would be the same as the generated term URL and since they are different tries to redirect.

Solutions

The easiest one is to assume that if we had come to the right category page without any get variables, we don't need the logic for redirecting to the canonical category page. This is valid statement, because that logic relies only on removing get arguments.

The only disadvantage with that solution is that doesn't solve the more general problem of discrepancies between generated and requested URLs. But for now it will do a good job.

Attachments (3)

no-ascii-category-base.diff (756 bytes) - added by nbachiyski 14 years ago.
14201.patch (590 bytes) - added by hakre 14 years ago.
URLs need encoding (see RFC)
14201.2.patch (1.7 KB) - added by hakre 14 years ago.
Don't interfere with Permalink Generation

Download all attachments as: .zip

Change History (27)

#1 @scribu
14 years ago

  • Keywords has-patch added

#2 @nacin
14 years ago

Out of curiosity, this a regression?

#3 @nbachiyski
14 years ago

The taxonomy redirect logic was added after 2.9 was released. Just nobody had tested with non-ASCII base.

#4 @Lafirel
14 years ago

Fix my wp-includes/canonical.php via the attachment. Wait and see if the tag permalink with Chinese or Japanese (non-ASCII) still have the redirect errors in Google webmaster tools.

#5 @Lafirel
14 years ago

It seems that the spiders and visitors are still get 301.
"/tag/%E6%B8%B8%E6%88%8F%E9%9F%B3%E6%95%88 Http Code: 301"
If just nobody tested the logic or something else add to 3.0, I am wondering if anybody had tested the patch with non-ASCII base?

#6 @ryan
14 years ago

See also [13480] and #14292

#7 @hakre
14 years ago

I was able to reproduce the issue:

  1. Set category base to "баба" in the "Category base"-textbox in Admin -> Permalinks Settings.
  2. Visiting a blog post with the "Uncategorized" category.
  3. Clicking on that category link (http://host/wordpress-trunk/%D0%B1%D0%B0%D0%B1%D0%B0/uncategorized/).
  4. HTTP Client navigates to http://host/wordpress-trunk/uncategorized/ displaying a 404 / Not Found Page (HTTP Status code is 404).


#8 @hakre
14 years ago

Request URI is 'REQUEST_URI' => string '/wordpress-trunk/%D0%B1%D0%B0%D0%B1%D0%B0/uncategorized/' and there is a match:

  object(WP) ...
  public 'request' => string 'баба/uncategorized' (length=22)
  public 'matched_rule' => string 'баба/(.+?)/?$' (length=17)
  public 'matched_query' => string 'category_name=uncategorized' (length=27)
  public 'did_permalink' => boolean true

Maybe this is not an input / option store issue.

#9 @hakre
14 years ago

the template_redirect hook in template-loader.php:7 calls redirect_canonical() (wp-includes\canonical.php:160) and no parameters which results in a different parsing of $_SERVER['REQUEST_URI'] then in WP.

This difference is the cause of error.

@hakre
14 years ago

URLs need encoding (see RFC)

#11 @hakre
14 years ago

Found the place where the invalid chars were inserted into the URL. Fixed in patch, please test.

#12 @hakre
14 years ago

  • Summary changed from Canonical redirect kicks in in case of non-ASCII category/tag base to Canonical redirect kicks in in case of category/tag base containing other chars then a-z, 0-9, _ and -

The following ASCII Category base does not work as well: %D0%B1%D0%B0%D0%B1%D0%B0

So it's even not possible for users to fix the problem by providing an encoded value via the backend.

@hakre
14 years ago

Don't interfere with Permalink Generation

#13 @hakre
14 years ago

My first patch did interfere with rewrites-generation which did not preserve the benefit afer rebuilding the permalink structure. Second patch is now one level over that layer. Fixes both, the category and the tag base.

#14 @hakre
14 years ago

In the end this is basically the same route as the reporter nbachiyski already wrote [reproduceable, reviewed], but differs in the solution.

Next to the reported broken request-ability of any category/tag, the following functions are broken:

  1. wp_xmlrpc_server::wp_getTags() - invalid URLs
  2. wp_xmlrpc_server::mw_getCategories() - invalid URLs
  3. category-template.php() - invalid URLs
  4. get_the_category_list() - invalid URLs
  5. get_term_link() - invalid URLs
  6. get_the_term_list() - invalid URLs
  7. wp_tag_cloud() - invalid URLs
  8. get_term_feed_link() - invalid URLs
  9. wp_setup_nav_menu_item() - invalid URLs
  10. get_the_taxonomies() - invalid URLs
  11. Walker::start_el() - invalid URLs

WP_Rewrite::$extra_permastructs is something else probably worth to deal with (in output).


#15 @ryan
14 years ago

Use Nikolay's patch for 3.0.1 and pursue fixing all of the places noted by hakre for 3.1?

#16 @nbachiyski
14 years ago

I think hakre's is the right approach, but we need to get the code checked-in very early in the cycle, so that people can test. I have a couple of people, who reported it and they can test in almost production environment.

#17 @ryan
14 years ago

nbachiyski, you're okay with your patch as a stopgap for 3.0.1 then?

#18 @ryan
14 years ago

(In [15462]) Temp fix for canonical redirects of taxonomy links containing non-ASCII bases. Props nbachiyski. see #14201

#19 @ryan
14 years ago

(In [15463]) Temp fix for canonical redirects of taxonomy links containing non-ASCII bases. Props nbachiyski. see #14201 for 3.0.1

#20 @nacin
14 years ago

  • Keywords early added
  • Milestone changed from 3.0.1 to 3.1

#21 @nacin
13 years ago

  • Keywords 3.2-early added
  • Milestone changed from 3.1 to Future Release

#22 @ocean90
12 years ago

  • Component changed from General to Canonical
  • Keywords needs-patch added; has-patch early 3.2-early removed

Related: #18734

#23 @andrewroberts
10 years ago

I can't seem to reproduce this using 3.8-beta-1-26375.

#24 @chriscct7
8 years ago

  • Keywords close needs-testing added
  • Priority changed from high to normal
  • Severity changed from major to normal

Can't reproduce on 4.3.1, what remains on this ticket after the commits to fix?

Note: See TracTickets for help on using tickets.