Make WordPress Core

Opened 17 years ago

Closed 17 years ago

#4554 closed defect (bug) (fixed)

Canonical trailing slashes

Reported by: ryan's profile ryan Owned by: markjaquith's profile markjaquith
Milestone: 2.3 Priority: normal
Severity: normal Version:
Component: Administration Keywords:
Focuses: Cc:

Description

A post shouldn't be served via both /2007/06/foo and /2007/06/foo/. One should redirect to the other with the user's trailing slash preference as specified in the permalink structure being the canonical form.

Attachments (3)

canonical_trailing_slash.diff (1.0 KB) - added by ryan 17 years ago.
canonical.php (4.3 KB) - added by markjaquith 17 years ago.
canonical_rehash.001.diff (6.7 KB) - added by markjaquith 17 years ago.

Download all attachments as: .zip

Change History (21)

#1 @markjaquith
17 years ago

canonical.php is some code Matt and I threw back and forth a few times in March. Handles a whole lot of situations including trailing slashes, yes-www/no-www, and alternative permalink styles (i.e. ?p=N when using pretty permalinks).

#2 @markjaquith
17 years ago

  • Milestone set to 2.3 (trunk)

Oooh, and it also handles truncated URLs. Like if in an e-mail something got changed to:

http://example.com/2007/05/03/title-of-
my-awesome-post/

Clicking on http://example.com/2007/05/03/title-of- should forward you.

#3 @matt
17 years ago

I remember that code! Let's get this in.

#4 @ryan
17 years ago

We use that code on wpcom and it breaks plugins that add new links. I have to turn it off for several clients. That's why my patch is more limited. The link guessing when there's a 404 can be added back in. The stuff that builds a link from the query vars can probably be added back if we do it only when ! $wp->did_permalink. It's still pretty iffy though since it doesn't know what query vars have been added via plugin.

#5 @ryan
17 years ago

For example, one client wanted links of the form /2007/week24/. The maps to a query of index.php?year=2007&w=24. The permalink redirector changes this to /2007/ since it doesn't know about "w". Since the link is already a permalink, simply not redirecting it will fix this situation. Mapping a request of index.php?year=2007&w=24 to a permalink will still break, however.

#6 @markjaquith
17 years ago

How about disabling the redirection when we don't recognize all the query vars?

#7 @Viper007Bond
17 years ago

Or having a way for plugins to register query vars and such so that it redirects properly, assuming there isn't one already.

#8 @markjaquith
17 years ago

There is such a way... but the problem is that there are query vars we know about but don't know how to translate into a permalink. Like "w". And it's hard to tell whether a query var is default or was overridden. Big mess for little payoff.

New approach: look for known invalid alternative permalinks like ?p=N and redirect those.

We can still do trailing slash, no-www/yes-www, index.php$ stripping and 404-guessing, which will be the majority of the issues.

#9 @snakefoot
17 years ago

Instead of just validating that the URL is using the correct format, then it would be nice if it validated that the url was the official permalink:

http://wordpress.org/extend/plugins/permalink-validator/

#10 @markjaquith
17 years ago

There are negative side effects to such an approach... see the comments above.

#11 @snakefoot
17 years ago

If the there was canonical urls for posts, pages and categories, then it would be a big step. Don't mind this issue being solved in multiple steps, and taking the easy ones first instead of trying to solve everything in one step.

#12 @markjaquith
17 years ago

  • Owner changed from anonymous to markjaquith
  • Status changed from new to assigned

canonical_rehash.001.diff is pretty much a rewrite.

The basic idea here is to look for known issues. It doesn't try to be the end-all. It fixes what it can, and doesn't bite off more than it can chew.


Features:

  1. Trailing slashes (enforces user preference)
  2. example.com vs www.example.com (enforces user preference)
  3. strips trailing /index.php or /index.php/
  4. strips /index.php/ if using mod_rewrite permalinks, which allows for seamless migration from PATHINFO to mod_rewrite
  5. Redirects ?p=N, ?page_id=N, ?m=YYYY, ?m=YYYYMM, ?m=YYYYMMDD, ?year=YYYY, ?year=YYYY&monthnum=MM, ?year=YYYY&monthnum=MM&day=DD, ?cat=N, ?author=N, ?paged=N to their "pretty" counterparts.
  6. Always preserves additional args in the query string.
  7. Redirects is_single() URLs with partial slugs. e.g. /2007/06/05/slug-fragment => /2007/06/05/slug-fragment-is-whole-now/
  8. Is non-greedy -- looks for specific cases that it can fix, but doesn't try to fix everything.

I've used dozens of test cases, but I probably forgot a few along the way. Let me know what you think.

#13 @snakefoot
17 years ago

Since this patch expects that REQUEST_URI is available and correct, then it is even more important that #3514 is solved or else it will redirect constantly on IIS.

#14 @markjaquith
17 years ago

Solved #3514 in [5889], so we should be good to go, here.

#15 @markjaquith
17 years ago

(In [5890]) Canonical URLs, first swing. Props to Scott Yang, Ryan and Matt. see #4554 (and report bugs there, for now)

#16 @markjaquith
17 years ago

(In [5891]) Oops... forgot the svn add. Thanks mdawaffe. see #4554

#17 @ryan
17 years ago

Calling this done. Bugs can be handled in other tickets.

#18 @ryan
17 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.