Make WordPress Core

Opened 13 years ago

Last modified 5 years ago

#18660 assigned enhancement

Enhance rel_canonical function, add filter

Reported by: nathanrice's profile nathanrice Owned by: joostdevalk's profile joostdevalk
Milestone: Future Release Priority: normal
Severity: normal Version: 3.3
Component: Canonical Keywords: has-patch needs-testing SEO
Focuses: Cc:

Description

I think it's a bit shortsighted to think that only singular pages need the canonical tag output in the <head>. Considering the fact that just about any page on your site can be accessed with a malformed URL, I think it's time to enhance this function.

The attached patch is just a first pass. But I think it gets us started in the right direction. There's also a filter before output, so themes and plugins can further enhance the output of this plugin (related #14458).

Patch is against [438126].

Attachments (11)

link-template.php.diff (2.2 KB) - added by nathanrice 13 years ago.
Enhance the rel_canonical() function
post-template.php.diff (516 bytes) - added by nathanrice 13 years ago.
link-template.php.2.diff (1.3 KB) - added by nathanrice 13 years ago.
canonical.patch (2.1 KB) - added by joostdevalk 13 years ago.
Canonical Patch
canonical.2.patch (2.1 KB) - added by joostdevalk 13 years ago.
Canonical Patch v2
canonical.3.patch (2.1 KB) - added by joostdevalk 13 years ago.
Canonical Patch v3
canonical.4.patch (2.3 KB) - added by joostdevalk 13 years ago.
Canonical Patch v4
canonical.5.patch (2.4 KB) - added by joostdevalk 13 years ago.
Canonical Patch v5
18660.diff (2.6 KB) - added by markjaquith 13 years ago.
trunk.diff (9.2 KB) - added by joostdevalk 13 years ago.
New approach, fixes several tickets at once.
canonical.6.patch (9.8 KB) - added by joostdevalk 13 years ago.
Canonical Patch v7

Download all attachments as: .zip

Change History (48)

@nathanrice
13 years ago

Enhance the rel_canonical() function

#1 follow-up: @nacin
13 years ago

Note that 438126 would be Akismet external's SVN revision number. Current trunk is [18667].

Also note something about canonical in http://yoast.com/wp-theme/genesis/, published earlier this week.

#2 in reply to: ↑ 1 @nathanrice
13 years ago

Replying to nacin:

Note that 438126 would be Akismet external's SVN revision number. Current trunk is [18667].

Yes, of course. My mistake.

Replying to nacin:

Also note something about canonical in http://yoast.com/wp-theme/genesis/, published earlier this week.

For the official rel_canonical() function, we definitely shouldn't canonicalize back to page 1 of an archive. That suggestion was to benefit our users at the recommendation of a very well respected SEO expert, Greg Boser. Yoast said he and Greg will have to fight about that the next time they see each other :-)

So yes, my patch as is won't work. But it's a start. When I get some time, I'll updated the canonical generation to account for paging.

#3 follow-up: @scribu
13 years ago

Why is this needed, when we also have canonical redirects?

#4 in reply to: ↑ 3 @nathanrice
13 years ago

Replying to scribu:

Why is this needed, when we also have canonical redirects?

Because canonical redirects don't correct all malformed URLs. Consider something like this:

http://example.com/?referrer=wordpress.org

No redirect. The canonical tag helps search engines know that when people link to the homepage with a malformed URL, you want pagerank to pass to http://example.com/ and not the other URL (seen by Google as two different pages).

#5 @nathanrice
13 years ago

Sorry about link-template.php.diff and post-template.php.diff. Different patches for different tickets. My bad.

#6 @joostdevalk
13 years ago

  • Owner set to joostdevalk
  • Status changed from new to assigned

+1 on fixing this.

I've attached a better patch, based on the code in my WordPress SEO plugin. The pagination canonical Genesis uses is at best agressive optimization, and not based on the canonical standards, so we should indeed not include that.

This patch makes sure WordPress outputs a canonical on:

  • singular posts, pages, etc.
  • the homepage & frontpage
  • taxonomy, category & tag archives
  • post type archives
  • dated archives

It also deals with pagination, both for paged posts and pages and all paginated archives.

@joostdevalk
13 years ago

Canonical Patch

#7 @joostdevalk
13 years ago

Slight bug in first version of that patch, replaced.

#8 follow-up: @nathanrice
13 years ago

@joostdevalk
Looks excellent to me.

Maybe this is just paranoia, but since $link can be filtered, I'd like to see it escaped before output.

Also, use home_url() in lieu of get_bloginfo('url').

The only thing I don't get is the distinction between front page and home (if show_on_front is page). Seems to me, both of those would be home_url().

Otherwise, patch looks good to me.

#9 @nathanrice
13 years ago

Nevermind my home/front confusion. I think you're trying to target the blog posts page, right?

Last edited 13 years ago by nathanrice (previous) (diff)

@joostdevalk
13 years ago

Canonical Patch v2

#10 in reply to: ↑ 8 ; follow-up: @joostdevalk
13 years ago

Replying to nathanrice:

Maybe this is just paranoia, but since $link can be filtered, I'd like to see it escaped before output.

Agreed, fixed.

Also, use home_url() in lieu of get_bloginfo('url').

In fact, home_url( '/' ); to make it even shorter :)

The only thing I don't get is the distinction between front page and home (if show_on_front is page). Seems to me, both of those would be home_url().

Your second comment was correct, it's the posts page. And the logic for that is annoying indeed, we should probably have a convenience function for that too.

#11 in reply to: ↑ 10 @nathanrice
13 years ago

Replying to joostdevalk:

Your second comment was correct, it's the posts page. And the logic for that is annoying indeed, we should probably have a convenience function for that too.

Indeed. is_home() has become quite useless lately.

#12 follow-up: @nathanrice
13 years ago

Should esc_url() be used instead of esc_attr()?

@joostdevalk
13 years ago

Canonical Patch v3

#13 in reply to: ↑ 12 @joostdevalk
13 years ago

Replying to nathanrice:

Should esc_url() be used instead of esc_attr()?

Yes indeed, changed.

Also fixed check for paginated posts and pages, that threw a notice in v2.

@joostdevalk
13 years ago

Canonical Patch v4

#14 @joostdevalk
13 years ago

Thanks to feedback from johnbillionh on IRC:

  • added canonical for author archives
  • fixed canonical for paged archives when pretty permalinks are NOT enabled

@joostdevalk
13 years ago

Canonical Patch v5

#15 @joostdevalk
13 years ago

Pagination in single links fixed too now.

@markjaquith
13 years ago

#16 @markjaquith
13 years ago

Pass in the context to user_trailingslashit(), use esc_url() in the output, clean up some whitespace issues.

#17 @joostdevalk
13 years ago

When I used esc_url it output &#38; instead of &amp;, any way to prevent that?

#18 @sirzooro
13 years ago

  • Cc sirzooro added

joostdevalk,
Please take a look on my plugin, Meta SEO Pack - in particular MetaSeoPack:::get_canonical() method. It generates canonical URL for all WP pages, including multi-paged content (this is a bit tricky area). I haven't updated if for a while so some updates will be needed there (in particular use get_search_link() for search results page and add support for custom post types if needed). It should give you few hints how to improve your patch.

@joostdevalk
13 years ago

New approach, fixes several tickets at once.

#19 @joostdevalk
13 years ago

This new patch not only fixes canonical, but also implements rel=prev and rel=next, ticket #18672.

@joostdevalk
13 years ago

Canonical Patch v7

#20 @joostdevalk
13 years ago

Another pass, fixes:

  • context passed to user_trailingslashit everywhere
  • removed use of setup_postdata for a simple count
  • fixed links for paginated pages on the frontpage (they need to include /page/)
  • passed author_name to get_author_posts_url to simplify the process of getting that URL

Naming of functions probably needs to be discussed....

#21 follow-up: @sirzooro
13 years ago

You can use global variable $numpages instead calculating page count again.

#22 in reply to: ↑ 21 ; follow-up: @joostdevalk
13 years ago

Replying to sirzooro:

You can use global variable $numpages instead calculating page count again.

Actually can't because that variable hasn't been set at the point yet.

#23 in reply to: ↑ 22 @sirzooro
13 years ago

Replying to joostdevalk:

Replying to sirzooro:

You can use global variable $numpages instead calculating page count again.

Actually can't because that variable hasn't been set at the point yet.

OK, I see. So your code may need extra change when #9911 will be commited.

#24 @nathanrice
13 years ago

What's the status on this ticket? It was submitted and patched prior to the feature freeze for 3.3, but the milestone hasn't been changed. Do we need a certain number of definitive "patch works for me" messages before it can be committed?

#25 @joostdevalk
13 years ago

  • Cc joost@… added

So, let's make sure this makes it into 3.4, what needs to be done to get that going?

#26 @joostdevalk
13 years ago

Can we get this one into 3.4, pretty please? :)

#27 @nacin
13 years ago

I've applied this patch, merged it with HEAD, and gone over each line of code. And I've done this no fewer than four times in the last six weeks. I can't bring myself to commit the whole thing at this point, and for two main reasons.

  • I think adding rel=canonical to non-singular pages need some very careful consideration. get_current_archive_url() sounds like a fantastic idea in theory, but it could be damaging to a complicated site pretty easily.

I am worried about how a complicated drill-down URL (say, with multiple taxonomies, or a query string, or something else) would end up with an improper canonical link. This is perhaps the greatest issue when it comes to archive pages, and that's likely the only reason why we stuck to singular the first time around.

  • I think the pagination aspects (both rel=canonical respecting pagination, and rel=next/prev handling pagination) is much more solid. I think it has the potential to cause problems for non-singular pages, but probably not as bad as a faulty rel=canonical could.

However, I am irked that the patch avoids using any API functions for creating pagination links, and instead calculates them on their own. I understand this is of no fault of the patch — rather, our APIs around pagination links (both singular "page" links and archive "paged" links) are all over the place.

There are also a number of filters and hooks scattered about these functions, which make me question things like pagination bases and other points of customization that could break. I would like to go through these functions, potentially in 3.5, and clean them up, and make them obvious about what is going on.

The one thing that seems most safe is a single, targeted patch that adds support for singularly paginated items to rel_canonical(). And it is getting too late in the cycle to make such changes, so I would rather try to do all of this in one effort for 3.5. Sorry, yoast.

#28 @here
11 years ago

See also #11694

Default core behavior in 3.5.1 causes multipage posts using <!-- nextpage --> to have subsequent post pages ignored / deindexed by google due to link rel=canonical pointing to first page of multipage post. Compounded by rel=next/prev pointing to next/prev chronological post not next/prev page of multipage post.

Google on Pagination
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663744

Google on Canonical
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394

Stackexchange discussion :
http://wordpress.stackexchange.com/questions/84647/multi-page-posts-do-not-get-indexed-by-google-due-to-canonical-urls

works-for-me 3.5.1 :

Add post page numbers greater than 1 to rel=canonical on multipage posts
wp-includes/link-template.php:

function rel_canonical() {
...
    if ( $page = get_query_var('page') )                                        
        $link = $link . "$page/";

Remove next/prev links from all pages in header (not ideal)
functions.php:

// remove rel=next/prev links in header                                         
remove_action('wp_head', 'adjacent_posts_rel_link_wp_head');                            

Agree all related canonical issues should be handled elegantly as proposed , including filter.

Last edited 11 years ago by here (previous) (diff)

#29 @joostdevalk
11 years ago

Happy to work on this patch more if someone (@nacin?) is willing to bless the task and provide me with support on how to fix this. I'd love for us to finally dig it all out and fix all the related nonsense old code in core...

#31 @corvannoorloos
11 years ago

  • Cc cor@… added

#32 @ocean90
11 years ago

#14458 was marked as a duplicate.

#33 @jrf
10 years ago

Wondering what the status is of this ticket.

At the very least both rel_canonical() as well as get_adjacent_post_rel_link() need an esc_url() around the link as it could contain &'s.

If it would help, I'll gladly send in a separate patch to at least fix that.

#34 @joostdevalk
7 years ago

  • Keywords SEO added

Having a new go at this patch at the moment, extracting all the canonical code from Yoast SEO.

#35 @westonruter
7 years ago

  • Keywords 2nd-opinion removed
  • Milestone changed from Awaiting Review to Future Release

I have a concern about get_current_archive_link() which appears in canonical.6.patch, namely that it doesn't account for custom rewrite rules added by plugins. This is also a problem for wp_get_canonical_url() actually in regards to custom rewrite endpoints (anything other then page is stripped).

What about an algorithm like the following to determine the default canonical URL for a given request:

<?php
$added_query_vars = $wp->query_vars;
if ( ! $wp_rewrite->permalink_structure || empty( $wp->request ) ) {
        $url = home_url( '/' );
} else {
        $url = home_url( user_trailingslashit( $wp->request ) );
        parse_str( $wp->matched_query, $matched_query_vars );
        foreach ( $wp->query_vars as $key => $value ) {

                // Remove query vars that were matched in the rewrite rules for the request.
                if ( isset( $matched_query_vars[ $key ] ) ) {
                        unset( $added_query_vars[ $key ] );
                }
        }
}

This ensures that custom rewrite rules and endpoints are honored, as well as all public query vars.

Note: I'm looking into this for the sake of adding canonical support to the AMP plugin, as the AMP spec requires a rel=canonical link on every AMP response, even if it points to itself.

Last edited 7 years ago by westonruter (previous) (diff)

#36 @joostdevalk
7 years ago

I have serious doubs as to whether your simple algorithm would work. The code generating the canonical in Yoast SEO is significantly longer ;)

#37 @jonoaldersonwp
5 years ago

@joostdevalk Can you resurrect this?

Note: See TracTickets for help on using tickets.