WordPress.org

Make WordPress Core

Opened 7 years ago

Closed 7 years ago

#4554 closed defect (bug) (fixed)

Canonical trailing slashes

Reported by: ryan Owned by: markjaquith
Milestone: 2.3 Priority: normal
Severity: normal Version:
Component: Administration Keywords:
Focuses: Cc:

Description

A post shouldn't be served via both /2007/06/foo and /2007/06/foo/. One should redirect to the other with the user's trailing slash preference as specified in the permalink structure being the canonical form.

Attachments (3)

canonical_trailing_slash.diff (1.0 KB) - added by ryan 7 years ago.
canonical.php (4.3 KB) - added by markjaquith 7 years ago.
canonical_rehash.001.diff (6.7 KB) - added by markjaquith 7 years ago.

Download all attachments as: .zip

Change History (21)

markjaquith7 years ago

comment:1 markjaquith7 years ago

canonical.php is some code Matt and I threw back and forth a few times in March. Handles a whole lot of situations including trailing slashes, yes-www/no-www, and alternative permalink styles (i.e. ?p=N when using pretty permalinks).

comment:2 markjaquith7 years ago

  • Milestone set to 2.3 (trunk)

Oooh, and it also handles truncated URLs. Like if in an e-mail something got changed to:

http://example.com/2007/05/03/title-of-
my-awesome-post/

Clicking on http://example.com/2007/05/03/title-of- should forward you.

comment:3 matt7 years ago

I remember that code! Let's get this in.

comment:4 ryan7 years ago

We use that code on wpcom and it breaks plugins that add new links. I have to turn it off for several clients. That's why my patch is more limited. The link guessing when there's a 404 can be added back in. The stuff that builds a link from the query vars can probably be added back if we do it only when ! $wp->did_permalink. It's still pretty iffy though since it doesn't know what query vars have been added via plugin.

comment:5 ryan7 years ago

For example, one client wanted links of the form /2007/week24/. The maps to a query of index.php?year=2007&w=24. The permalink redirector changes this to /2007/ since it doesn't know about "w". Since the link is already a permalink, simply not redirecting it will fix this situation. Mapping a request of index.php?year=2007&w=24 to a permalink will still break, however.

comment:6 markjaquith7 years ago

How about disabling the redirection when we don't recognize all the query vars?

comment:7 Viper007Bond7 years ago

Or having a way for plugins to register query vars and such so that it redirects properly, assuming there isn't one already.

comment:8 markjaquith7 years ago

There is such a way... but the problem is that there are query vars we know about but don't know how to translate into a permalink. Like "w". And it's hard to tell whether a query var is default or was overridden. Big mess for little payoff.

New approach: look for known invalid alternative permalinks like ?p=N and redirect those.

We can still do trailing slash, no-www/yes-www, index.php$ stripping and 404-guessing, which will be the majority of the issues.

comment:9 snakefoot7 years ago

Instead of just validating that the URL is using the correct format, then it would be nice if it validated that the url was the official permalink:

http://wordpress.org/extend/plugins/permalink-validator/

comment:10 markjaquith7 years ago

There are negative side effects to such an approach... see the comments above.

comment:11 snakefoot7 years ago

If the there was canonical urls for posts, pages and categories, then it would be a big step. Don't mind this issue being solved in multiple steps, and taking the easy ones first instead of trying to solve everything in one step.

comment:12 markjaquith7 years ago

  • Owner changed from anonymous to markjaquith
  • Status changed from new to assigned

canonical_rehash.001.diff is pretty much a rewrite.

The basic idea here is to look for known issues. It doesn't try to be the end-all. It fixes what it can, and doesn't bite off more than it can chew.


Features:

  1. Trailing slashes (enforces user preference)
  2. example.com vs www.example.com (enforces user preference)
  3. strips trailing /index.php or /index.php/
  4. strips /index.php/ if using mod_rewrite permalinks, which allows for seamless migration from PATHINFO to mod_rewrite
  5. Redirects ?p=N, ?page_id=N, ?m=YYYY, ?m=YYYYMM, ?m=YYYYMMDD, ?year=YYYY, ?year=YYYY&monthnum=MM, ?year=YYYY&monthnum=MM&day=DD, ?cat=N, ?author=N, ?paged=N to their "pretty" counterparts.
  6. Always preserves additional args in the query string.
  7. Redirects is_single() URLs with partial slugs. e.g. /2007/06/05/slug-fragment => /2007/06/05/slug-fragment-is-whole-now/
  8. Is non-greedy -- looks for specific cases that it can fix, but doesn't try to fix everything.

I've used dozens of test cases, but I probably forgot a few along the way. Let me know what you think.

comment:13 snakefoot7 years ago

Since this patch expects that REQUEST_URI is available and correct, then it is even more important that #3514 is solved or else it will redirect constantly on IIS.

comment:14 markjaquith7 years ago

Solved #3514 in [5889], so we should be good to go, here.

comment:15 markjaquith7 years ago

(In [5890]) Canonical URLs, first swing. Props to Scott Yang, Ryan and Matt. see #4554 (and report bugs there, for now)

comment:16 markjaquith7 years ago

(In [5891]) Oops... forgot the svn add. Thanks mdawaffe. see #4554

comment:17 ryan7 years ago

Calling this done. Bugs can be handled in other tickets.

comment:18 ryan7 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.