WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 3 months ago

#18660 assigned enhancement

Enhance rel_canonical function, add filter

Reported by: nathanrice Owned by: joostdevalk
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 3.3
Component: Canonical Keywords: has-patch needs-testing 2nd-opinion
Focuses: Cc:

Description

I think it's a bit shortsighted to think that only singular pages need the canonical tag output in the <head>. Considering the fact that just about any page on your site can be accessed with a malformed URL, I think it's time to enhance this function.

The attached patch is just a first pass. But I think it gets us started in the right direction. There's also a filter before output, so themes and plugins can further enhance the output of this plugin (related #14458).

Patch is against [438126].

Attachments (11)

link-template.php.diff (2.2 KB) - added by nathanrice 3 years ago.
Enhance the rel_canonical() function
post-template.php.diff (516 bytes) - added by nathanrice 3 years ago.
link-template.php.2.diff (1.3 KB) - added by nathanrice 3 years ago.
canonical.patch (2.1 KB) - added by joostdevalk 3 years ago.
Canonical Patch
canonical.2.patch (2.1 KB) - added by joostdevalk 3 years ago.
Canonical Patch v2
canonical.3.patch (2.1 KB) - added by joostdevalk 3 years ago.
Canonical Patch v3
canonical.4.patch (2.3 KB) - added by joostdevalk 3 years ago.
Canonical Patch v4
canonical.5.patch (2.4 KB) - added by joostdevalk 3 years ago.
Canonical Patch v5
18660.diff (2.6 KB) - added by markjaquith 3 years ago.
trunk.diff (9.2 KB) - added by joostdevalk 3 years ago.
New approach, fixes several tickets at once.
canonical.6.patch (9.8 KB) - added by joostdevalk 3 years ago.
Canonical Patch v7

Download all attachments as: .zip

Change History (43)

nathanrice3 years ago

Enhance the rel_canonical() function

comment:1 follow-up: nacin3 years ago

Note that 438126 would be Akismet external's SVN revision number. Current trunk is [18667].

Also note something about canonical in http://yoast.com/wp-theme/genesis/, published earlier this week.

comment:2 in reply to: ↑ 1 nathanrice3 years ago

Replying to nacin:

Note that 438126 would be Akismet external's SVN revision number. Current trunk is [18667].

Yes, of course. My mistake.

Replying to nacin:

Also note something about canonical in http://yoast.com/wp-theme/genesis/, published earlier this week.

For the official rel_canonical() function, we definitely shouldn't canonicalize back to page 1 of an archive. That suggestion was to benefit our users at the recommendation of a very well respected SEO expert, Greg Boser. Yoast said he and Greg will have to fight about that the next time they see each other :-)

So yes, my patch as is won't work. But it's a start. When I get some time, I'll updated the canonical generation to account for paging.

comment:3 follow-up: scribu3 years ago

Why is this needed, when we also have canonical redirects?

comment:4 in reply to: ↑ 3 nathanrice3 years ago

Replying to scribu:

Why is this needed, when we also have canonical redirects?

Because canonical redirects don't correct all malformed URLs. Consider something like this:

http://example.com/?referrer=wordpress.org

No redirect. The canonical tag helps search engines know that when people link to the homepage with a malformed URL, you want pagerank to pass to http://example.com/ and not the other URL (seen by Google as two different pages).

comment:5 nathanrice3 years ago

Sorry about link-template.php.diff and post-template.php.diff. Different patches for different tickets. My bad.

comment:6 joostdevalk3 years ago

  • Owner set to joostdevalk
  • Status changed from new to assigned

+1 on fixing this.

I've attached a better patch, based on the code in my WordPress SEO plugin. The pagination canonical Genesis uses is at best agressive optimization, and not based on the canonical standards, so we should indeed not include that.

This patch makes sure WordPress outputs a canonical on:

  • singular posts, pages, etc.
  • the homepage & frontpage
  • taxonomy, category & tag archives
  • post type archives
  • dated archives

It also deals with pagination, both for paged posts and pages and all paginated archives.

joostdevalk3 years ago

Canonical Patch

comment:7 joostdevalk3 years ago

Slight bug in first version of that patch, replaced.

comment:8 follow-up: nathanrice3 years ago

@joostdevalk
Looks excellent to me.

Maybe this is just paranoia, but since $link can be filtered, I'd like to see it escaped before output.

Also, use home_url() in lieu of get_bloginfo('url').

The only thing I don't get is the distinction between front page and home (if show_on_front is page). Seems to me, both of those would be home_url().

Otherwise, patch looks good to me.

comment:9 nathanrice3 years ago

Nevermind my home/front confusion. I think you're trying to target the blog posts page, right?

Last edited 3 years ago by nathanrice (previous) (diff)

joostdevalk3 years ago

Canonical Patch v2

comment:10 in reply to: ↑ 8 ; follow-up: joostdevalk3 years ago

Replying to nathanrice:

Maybe this is just paranoia, but since $link can be filtered, I'd like to see it escaped before output.

Agreed, fixed.

Also, use home_url() in lieu of get_bloginfo('url').

In fact, home_url( '/' ); to make it even shorter :)

The only thing I don't get is the distinction between front page and home (if show_on_front is page). Seems to me, both of those would be home_url().

Your second comment was correct, it's the posts page. And the logic for that is annoying indeed, we should probably have a convenience function for that too.

comment:11 in reply to: ↑ 10 nathanrice3 years ago

Replying to joostdevalk:

Your second comment was correct, it's the posts page. And the logic for that is annoying indeed, we should probably have a convenience function for that too.

Indeed. is_home() has become quite useless lately.

comment:12 follow-up: nathanrice3 years ago

Should esc_url() be used instead of esc_attr()?

joostdevalk3 years ago

Canonical Patch v3

comment:13 in reply to: ↑ 12 joostdevalk3 years ago

Replying to nathanrice:

Should esc_url() be used instead of esc_attr()?

Yes indeed, changed.

Also fixed check for paginated posts and pages, that threw a notice in v2.

joostdevalk3 years ago

Canonical Patch v4

comment:14 joostdevalk3 years ago

Thanks to feedback from johnbillionh on IRC:

  • added canonical for author archives
  • fixed canonical for paged archives when pretty permalinks are NOT enabled

joostdevalk3 years ago

Canonical Patch v5

comment:15 joostdevalk3 years ago

Pagination in single links fixed too now.

markjaquith3 years ago

comment:16 markjaquith3 years ago

Pass in the context to user_trailingslashit(), use esc_url() in the output, clean up some whitespace issues.

comment:17 joostdevalk3 years ago

When I used esc_url it output &#38; instead of &amp;, any way to prevent that?

comment:18 sirzooro3 years ago

  • Cc sirzooro added

joostdevalk,
Please take a look on my plugin, Meta SEO Pack - in particular MetaSeoPack:::get_canonical() method. It generates canonical URL for all WP pages, including multi-paged content (this is a bit tricky area). I haven't updated if for a while so some updates will be needed there (in particular use get_search_link() for search results page and add support for custom post types if needed). It should give you few hints how to improve your patch.

joostdevalk3 years ago

New approach, fixes several tickets at once.

comment:19 joostdevalk3 years ago

This new patch not only fixes canonical, but also implements rel=prev and rel=next, ticket #18672.

joostdevalk3 years ago

Canonical Patch v7

comment:20 joostdevalk3 years ago

Another pass, fixes:

  • context passed to user_trailingslashit everywhere
  • removed use of setup_postdata for a simple count
  • fixed links for paginated pages on the frontpage (they need to include /page/)
  • passed author_name to get_author_posts_url to simplify the process of getting that URL

Naming of functions probably needs to be discussed....

comment:21 follow-up: sirzooro3 years ago

You can use global variable $numpages instead calculating page count again.

comment:22 in reply to: ↑ 21 ; follow-up: joostdevalk3 years ago

Replying to sirzooro:

You can use global variable $numpages instead calculating page count again.

Actually can't because that variable hasn't been set at the point yet.

comment:23 in reply to: ↑ 22 sirzooro3 years ago

Replying to joostdevalk:

Replying to sirzooro:

You can use global variable $numpages instead calculating page count again.

Actually can't because that variable hasn't been set at the point yet.

OK, I see. So your code may need extra change when #9911 will be commited.

comment:24 nathanrice3 years ago

What's the status on this ticket? It was submitted and patched prior to the feature freeze for 3.3, but the milestone hasn't been changed. Do we need a certain number of definitive "patch works for me" messages before it can be committed?

comment:25 joostdevalk2 years ago

  • Cc joost@… added

So, let's make sure this makes it into 3.4, what needs to be done to get that going?

comment:26 joostdevalk2 years ago

Can we get this one into 3.4, pretty please? :)

comment:27 nacin2 years ago

I've applied this patch, merged it with HEAD, and gone over each line of code. And I've done this no fewer than four times in the last six weeks. I can't bring myself to commit the whole thing at this point, and for two main reasons.

  • I think adding rel=canonical to non-singular pages need some very careful consideration. get_current_archive_url() sounds like a fantastic idea in theory, but it could be damaging to a complicated site pretty easily.

I am worried about how a complicated drill-down URL (say, with multiple taxonomies, or a query string, or something else) would end up with an improper canonical link. This is perhaps the greatest issue when it comes to archive pages, and that's likely the only reason why we stuck to singular the first time around.

  • I think the pagination aspects (both rel=canonical respecting pagination, and rel=next/prev handling pagination) is much more solid. I think it has the potential to cause problems for non-singular pages, but probably not as bad as a faulty rel=canonical could.

However, I am irked that the patch avoids using any API functions for creating pagination links, and instead calculates them on their own. I understand this is of no fault of the patch — rather, our APIs around pagination links (both singular "page" links and archive "paged" links) are all over the place.

There are also a number of filters and hooks scattered about these functions, which make me question things like pagination bases and other points of customization that could break. I would like to go through these functions, potentially in 3.5, and clean them up, and make them obvious about what is going on.

The one thing that seems most safe is a single, targeted patch that adds support for singularly paginated items to rel_canonical(). And it is getting too late in the cycle to make such changes, so I would rather try to do all of this in one effort for 3.5. Sorry, yoast.

comment:28 here10 months ago

See also #11694

Default core behavior in 3.5.1 causes multipage posts using <!-- nextpage --> to have subsequent post pages ignored / deindexed by google due to link rel=canonical pointing to first page of multipage post. Compounded by rel=next/prev pointing to next/prev chronological post not next/prev page of multipage post.

Google on Pagination
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663744

Google on Canonical
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394

Stackexchange discussion :
http://wordpress.stackexchange.com/questions/84647/multi-page-posts-do-not-get-indexed-by-google-due-to-canonical-urls

works-for-me 3.5.1 :

Add post page numbers greater than 1 to rel=canonical on multipage posts
wp-includes/link-template.php:

function rel_canonical() {
...
    if ( $page = get_query_var('page') )                                        
        $link = $link . "$page/";

Remove next/prev links from all pages in header (not ideal)
functions.php:

// remove rel=next/prev links in header                                         
remove_action('wp_head', 'adjacent_posts_rel_link_wp_head');                            

Agree all related canonical issues should be handled elegantly as proposed , including filter.

Last edited 10 months ago by here (previous) (diff)

comment:29 joostdevalk6 months ago

Happy to work on this patch more if someone (@nacin?) is willing to bless the task and provide me with support on how to fix this. I'd love for us to finally dig it all out and fix all the related nonsense old code in core...

comment:31 corvannoorloos4 months ago

  • Cc cor@… added

comment:32 ocean903 months ago

#14458 was marked as a duplicate.

Note: See TracTickets for help on using tickets.