WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#8948 closed defect (bug) (fixed)

permalink redirection is too liberal

Reported by: mrmist Owned by: markjaquith
Milestone: 3.0 Priority: normal
Severity: normal Version: 2.7
Component: Canonical Keywords: has-patch featured
Focuses: Cc:

Description

As described here -

http://wordpress.org/support/topic/237126?replies=2

If I visit

www.example.com/t

It'll happily redirect me to (something like)

www.example.com/2002/06/11/taadaa

And so on.

(The above assumes date & name permalinks, but similar stuff happens with custom permalinks like category/article/ when it'll even pick a random category for you.)

This should really be returning 404s.

Attachments (1)

multiple-terms.diff (1.3 KB) - added by scribu 4 years ago.
Don't redirect taxonomy unions or intersections

Download all attachments as: .zip

Change History (56)

comment:1 Viper007Bond5 years ago

  • Component changed from General to Canonical
  • Owner changed from anonymous to markjaquith

Perhaps, but the thinking was if a URL got cut off, it should guess it.

We'll take your example: http://www.example.com/2002/06/11/taadaa/

If that got cut off in an e-mail, IM, whatever and was http://www.example.com/2002/06/11/taa, WordPress would be smart enough to redirect to the proper post.

comment:2 janeforshort5 years ago

  • Milestone changed from 2.8 to Future Release

Punting to be evaluated in next development cycle due to time constraints.

comment:3 follow-up: Denis-de-Bernardy5 years ago

wontfix?

comment:4 Denis-de-Bernardy5 years ago

  • Milestone changed from Future Release to 2.9

comment:5 in reply to: ↑ 3 mrmist5 years ago

  • Priority changed from normal to low

Replying to Denis-de-Bernardy:

wontfix?

It's hardly the biggest deal in the world, but to explain the rational behind it, I was thinking that in a piece of software that really tries to give one URL to one thing, this sticks out as doing just the opposite - it allows many things to redirect to one place.

Viper's cited case is perhaps understandable, but in no way can I understand the idea behind redirecting

/foo/baa/

To

/news/barry-white/

Which does happen, and is just random.

comment:6 matt5 years ago

I think it would be okay to require like 3 characters in the path for canonical to kick in.

comment:7 miqrogroove4 years ago

  • Cc miqrogroove@… added

Sorry to pile on here guys, but you need to think bigger. In taxonomy handling the exact opposite problem exists:

/category/subcat2/ is treated as identical to /category/cat1/subcat2/ with no forwarding.

Whatever strategy is adopted, it should be applied globally instead of piecemeal.

comment:8 ryan4 years ago

  • Milestone changed from 2.9 to Future Release

comment:9 mrmist4 years ago

  • Priority changed from low to normal

At least it's not just me finding this odd.

http://wordpress.org/support/topic/343508?replies=3

comment:10 Denis-de-Bernardy4 years ago

if the issue truly is about emails, how about this approach?

a) if it doesn't look like a singular, then bail

b) if it's a singular page, then check the length of the url. if it's not 70-80 chars long then it was probably wasn't cut in an email, so bail.

c) we're sure we've a singular whose url was cut. we try to find the correct one using chosen bits from our current algorithm

comment:11 Denis-de-Bernardy4 years ago

  • Keywords 2nd-opinion added

or more simply just a) and b)

comment:12 Denis-de-Bernardy4 years ago

err, a) and c) even. :-)

comment:13 miqrogroove4 years ago

I agree with mrmist about the random redirects. Make a better 404 page that says, "maybe you were looking for .. <link> " That's what Google recommends to do anyway. And then stop treating categories as non-hierarchical!

comment:14 mrmist4 years ago

+1 to most of that. I particularly like the idea of a "Maybe you were looking for" type page.

Really I just don't want a completely random / short selection of characters to redirect to a post URL. I can see the point about cut-off email URLs, but in that case I think that DDBs logic should apply.

comment:15 follow-up: hakre4 years ago

+1 for taking care. If we led the SEO-Kids know about this feature, that would start a hell of a fire. Redirect poisioning of WP based sites, outch.

comment:16 hakre4 years ago

Just tested if there is a problem with trashed posts ( #11236 ) which is not the case.

comment:17 miqrogroove4 years ago

Found another forwarding problem in attachments.

/yyyy/mm/slug1/attach1/ is treated as identical to /yyyy/mm/ablabllskjfsakd/attach1/

comment:18 miqrogroove4 years ago

/2010/01testing-123/ forwarded to /2009/12/testing/ for no apparent reason. The only reasonable replacement in this situation would be /2010/01/testing-123/

comment:19 miqrogroove4 years ago

/2010/01/testing-123/ is treated as identical to /2010/01/testing-123/

comment:20 miqrogroove4 years ago

/2010/01/slug1/&ct=ga&cd=XmAHzoevp-g&usg=AFQjCNGzZI7HVL_W48417QI_E5dOtXDWJQ/ is treated as identical to /2010/01/

This one brought to you by some random URL spammer.

comment:21 miqrogroove4 years ago

... or perhaps not so random. The previous request appears to be an invalid representation of the Google Image Search imageref query param, canonical slashed, and then re-mis-interpreted by WordPress.

If the canonical redirection can't handle that type of request correctly, then it needs to be met with a 404 response.

comment:22 miqrogroove4 years ago

/yyyy/mm/slug1/slug2/ redirects to /yyyy/mm/slug2/ I think this is the opposite of what is supposed to happen in canonicalization, since the canonical URL for /yyyy/mm/slug1/garbage* should always be /yyyy/mm/slug1/

comment:23 in reply to: ↑ 15 Otto424 years ago

Replying to hakre:

+1 for taking care. If we led the SEO-Kids know about this feature, that would start a hell of a fire. Redirect poisioning of WP based sites, outch.

It's worth noting that these redirects that are created by canonical redirection are 301s, so the redirection is permanent.

Fixing SEO problems was the whole point of this sort of thing. A 301 redirect from a URL is treated as the redirected, final, URL by search engines. Not as the original one.

So somebody having a cut-off URL posted not only goes to the right place, but also makes Google take notice and not count that cut-off as a separate URL.

Whether the code is redirecting unnecessarily or not is up for debate, but SEO-problems are not a valid reason to change this, because SEO-problems are the reason it was added in the first place.

comment:24 miqrogroove4 years ago

/categoryslug1/ is treated as identical to /category/slug1/

This one is not just theoretical. These are getting Googled in the wild. :(

comment:25 miqrogroove4 years ago

Same problem at /tagslug1/

I realize the [/]* regexp is impossible for hierarchical URLs in rewrite, so these taxonomy issues probably have to be fixed in the canonical system.

comment:26 miqrogroove4 years ago

  • Keywords featured added; 2nd-opinion removed

I don't think there is any need for 2nd-opinion on this ticket. There are obvious and severe problems with URL duplication. These problems are not handled correctly by any existing feature, including rel=canonical, canonical redirect, and .htaccess.

comment:27 follow-up: miqrogroove4 years ago

/category/uncategorized/&oogabooga/ is treated as identical to /category/uncategorized/

Again, rel=canonical is not showing up and there is no redirection.

comment:28 in reply to: ↑ 27 ; follow-ups: Viper007Bond4 years ago

Replying to miqrogroove:

/category/uncategorized/&oogabooga/ is treated as identical to /category/uncategorized/

Again, rel=canonical is not showing up and there is no redirection.

This is actually correct because you are viewing /category/uncategorized/ but with $_GET['oogabooga/'] set I believe.

comment:29 in reply to: ↑ 28 Viper007Bond4 years ago

Replying to Viper007Bond:

Replying to miqrogroove:

/category/uncategorized/&oogabooga/ is treated as identical to /category/uncategorized/

Again, rel=canonical is not showing up and there is no redirection.

This is actually correct because you are viewing /category/uncategorized/ but with $_GET['oogabooga/'] set I believe.

It's no different from visiting /category/uncategorized/?foo=bar which shouldn't redirect either. Who knows what a plugin is making that page do -- nothing or maybe display totally different content.

comment:30 in reply to: ↑ 28 ; follow-up: miqrogroove4 years ago

Replying to Viper007Bond:

This is actually correct because you are viewing /category/uncategorized/ but with $_GET['oogabooga/'] set I believe.

Heh, that would be incredibly unfortunate if it's true. I know PHP would never do that. You're saying WordPress adds extra values to super global variables?

comment:31 miqrogroove4 years ago

I added var_dump($_GET); to the default single.php template to test Viper007Bond's claim that ampersands are arbitrarily parsed into query variables.

First I tried hitting /2010/01/testing-123/&oogabooga/ and that path was treated as identical to the front page o_O The single.php template didn't even load, so that was a total failure.

Next, I tried hitting /2010/01/testing-123/&oogabooga/page/1/ and that path was forwarded to /2009/12/testing/ for no apparent reason. Another canonical failure.

Thirdly, I added var_dump($_GET); to the index.php template and tried hitting /&oogabooga/ which was also treated as indentical to the front page. The output was as expected array(0) { }

comment:32 dd324 years ago

Thirdly, I added var_dump($_GET); to the index.php template and tried hitting /&oogabooga/ which was also treated as indentical to the front page. The output was as expected array(0) { }

I've not seen many servers which are setup to treat ?& as the query seperator. Most browsers will encode the & behind the scenes.

/category/<cat>/ExtraUselessDataThatsNotASubCat/ should be returning a 404, As it currently is.

/category/<cat>/&ExtraUselessDataThatsNotASubCat/ seems to handle the & incorrectly.

/category/<cat>/ seems to handle the slashes incorrectly, That should be redirecting too i believe.

comment:33 dd324 years ago

(In [13071]) Do not poison query vars with /&foo=bar/ in requested URL. See #8949

comment:34 automattor4 years ago

(In [13072]) Strip out multiple slashes on non-post URL's. See #8948

comment:35 dd324 years ago

/category/<cat>/&ExtraUselessDataThatsNotASubCat/ seems to handle the & incorrectly.

Fixed, The new eval() replacement query parser was creating category_name=cat-name-here/&foo=bar&post_name=blah, and parsing that, Whereas, It should've been treating &foo=bar as part of the category name. Fixed in [13071]

/category/<cat>/ seems to handle the slashes incorrectly, That should be redirecting too i believe.

That doesnt happen for posts, as we generate the posts permalink and redirect if its different, [13072] strips multiple slashes out of the URL path if they're present, that should solve that problem.

/yyyy/mm/slug1/slug2/ redirects to /yyyy/mm/slug2/ I think this is the opposite of what is supposed to happen in canonicalization, since the canonical URL for /yyyy/mm/slug1/garbage* should always be /yyyy/mm/slug1/

Remains.

comment:36 in reply to: ↑ 30 Viper007Bond4 years ago

Replying to miqrogroove:

Replying to Viper007Bond:

This is actually correct because you are viewing /category/uncategorized/ but with $_GET['oogabooga/'] set I believe.

Heh, that would be incredibly unfortunate if it's true. I know PHP would never do that. You're saying WordPress adds extra values to super global variables?

Mmm, you're right -- I was incorrect. I swore that worked, but I guess I was incorrect.

comment:37 dd324 years ago

(In [13091]) Canonicalisation of the Taxonomy urls. Redirect /category/child/ to /category/parent/child/. Fix striping of slashes to use $redirect instead of $original. See #8948

comment:38 miqrogroove4 years ago

Wow, dd32 is officially my fav WP dev :D Keep up the awesome effort here! :)

btw you've got if ( strpos($original['path'] preceding the $redirect stuff. Needs consistency?

comment:39 dd324 years ago

btw you've got if ( strpos($originalpath? preceding the $redirect stuff. Needs consistency?

Ah.. Yeah forgot that one! Thanks

comment:40 follow-up: miqrogroove4 years ago

I swore that worked, but I guess I was incorrect.

I think what dd32 was getting at was the /&var=val syntax was being consumed by the WP_Query object, so it was internally equivalent to /?var=val but totally bogus at the HTTP layer.

comment:41 in reply to: ↑ 40 Viper007Bond4 years ago

Replying to miqrogroove:

I swore that worked, but I guess I was incorrect.

I think what dd32 was getting at was the /&var=val syntax was being consumed by the WP_Query object, so it was internally equivalent to /?var=val but totally bogus at the HTTP layer.

I thought file.php&foo=bar worked, but that may only be with odd server configs. :)

comment:42 miqrogroove4 years ago

[13091] Broke something, see #12245

comment:43 miqrogroove4 years ago

When WP_DEBUG is off, /?cat=99999 is forwarding to /category/ Shouldn't that be a 404?

When WP_DEBUG is on, Canonical crashes. See #12246

comment:44 miqrogroove4 years ago

btw, I have a bad feeling that /?cat=99999 falls under the jurisdiction of #11348 which is currently at -1. I would appreciate SOME agreement that status 200 and 302 are inappropriate for infinite namespaces.

comment:45 scribu4 years ago

I found a different issue:

going to ?tag=foo,bar redirects to /tags/foo

See http://ryan.boren.me/2007/10/01/taxonomy-intersections-and-unions/

scribu4 years ago

Don't redirect taxonomy unions or intersections

comment:46 scribu4 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 3.0

multiple-terms.diff checks for tagin, tagand etc. before attempting a redirect.

comment:47 dd324 years ago

(In [13480]) Dont clobber taxonomy intersections/unions. Only redirects to the canonical url if only one term has been queried. Props scribu for some of the commit. See #8948

comment:48 miqrogroove4 years ago

Hey I've got some more for ya :D

/page1/ is treated as identical to / on 2.9 and 3.0.

/page2/ redirects to /page2/page/2/ on 3.0 only. I get a 404 in 2.9.

This is screwing up valid slugs.

comment:49 dd324 years ago

/page1/ is treated as identical to / on 2.9 and 3.0.

The WordPress paging system actually accepts /page/1/ and /page1 by design. The same goes for /page2 and /page/2

The canonical system doesnt reconise /page2 as a paging link however. 2 options are available for this situation:

  • Remove Support for /page(\d+)/? whilst retaining /page/(\d+)/?
  • Fix canonical to support the /page(\d+)/? links (An additional 2 ?'s)

'page(\d+)' are not valid slugs in WordPress when using pretty permalinks, its not possible to separate the requests.

comment:50 dd324 years ago

The canonical system doesnt reconise /page2 as a paging link however

Which is why its redirecting to /page2/page/2/

The paging rules will always have a higher priority than page rules, so it'll always be impossible to separate the requests correctly.

comment:51 miqrogroove4 years ago

Okay uh... We should probably re-open the numeric slugs ticket then, because the post editor is generating those pagen slugs.

comment:52 nacin4 years ago

(In [13717]) Prevent page(\d+) slugs, and force a suffix. props miqrogroove. fixes #11917. see #8948

comment:53 dd324 years ago

(In [13780]) Update Canonical paging to reconise /page(\d+)/? as a valid paging path. See #8948

comment:54 dd324 years ago

  • Resolution set to fixed
  • Status changed from new to closed

Closing as fixed for 3.0, Please open a new ticket for 3.1 if any furthur issues come to light.

comment:55 mrmist4 years ago

Thanks for the efforts on this one guys. Testing it out with the 3.0 beta and redirection seems to be a bit better, in so far as it doesn't appear completely random any more :)

Note: See TracTickets for help on using tickets.