Opened 13 years ago
Last modified 5 years ago
#18660 assigned enhancement
Enhance rel_canonical function, add filter
Reported by: | nathanrice | Owned by: | joostdevalk |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | 3.3 |
Component: | Canonical | Keywords: | has-patch needs-testing SEO |
Focuses: | Cc: |
Description
I think it's a bit shortsighted to think that only singular pages need the canonical tag output in the <head>
. Considering the fact that just about any page on your site can be accessed with a malformed URL, I think it's time to enhance this function.
The attached patch is just a first pass. But I think it gets us started in the right direction. There's also a filter before output, so themes and plugins can further enhance the output of this plugin (related #14458).
Patch is against [438126].
Attachments (11)
Change History (48)
#1
follow-up:
↓ 2
@
13 years ago
Note that 438126 would be Akismet external's SVN revision number. Current trunk is [18667].
Also note something about canonical in http://yoast.com/wp-theme/genesis/, published earlier this week.
#2
in reply to:
↑ 1
@
13 years ago
Replying to nacin:
Note that 438126 would be Akismet external's SVN revision number. Current trunk is [18667].
Yes, of course. My mistake.
Replying to nacin:
Also note something about canonical in http://yoast.com/wp-theme/genesis/, published earlier this week.
For the official rel_canonical() function, we definitely shouldn't canonicalize back to page 1 of an archive. That suggestion was to benefit our users at the recommendation of a very well respected SEO expert, Greg Boser. Yoast said he and Greg will have to fight about that the next time they see each other :-)
So yes, my patch as is won't work. But it's a start. When I get some time, I'll updated the canonical generation to account for paging.
#4
in reply to:
↑ 3
@
13 years ago
Replying to scribu:
Why is this needed, when we also have canonical redirects?
Because canonical redirects don't correct all malformed URLs. Consider something like this:
http://example.com/?referrer=wordpress.org
No redirect. The canonical tag helps search engines know that when people link to the homepage with a malformed URL, you want pagerank to pass to http://example.com/
and not the other URL (seen by Google as two different pages).
#5
@
13 years ago
Sorry about link-template.php.diff and post-template.php.diff. Different patches for different tickets. My bad.
#6
@
13 years ago
- Owner set to joostdevalk
- Status changed from new to assigned
+1 on fixing this.
I've attached a better patch, based on the code in my WordPress SEO plugin. The pagination canonical Genesis uses is at best agressive optimization, and not based on the canonical standards, so we should indeed not include that.
This patch makes sure WordPress outputs a canonical on:
- singular posts, pages, etc.
- the homepage & frontpage
- taxonomy, category & tag archives
- post type archives
- dated archives
It also deals with pagination, both for paged posts and pages and all paginated archives.
#8
follow-up:
↓ 10
@
13 years ago
@joostdevalk
Looks excellent to me.
Maybe this is just paranoia, but since $link
can be filtered, I'd like to see it escaped before output.
Also, use home_url()
in lieu of get_bloginfo('url')
.
The only thing I don't get is the distinction between front page and home (if show_on_front
is page). Seems to me, both of those would be home_url()
.
Otherwise, patch looks good to me.
#9
@
13 years ago
Nevermind my home/front confusion. I think you're trying to target the blog posts page, right?
#10
in reply to:
↑ 8
;
follow-up:
↓ 11
@
13 years ago
Replying to nathanrice:
Maybe this is just paranoia, but since
$link
can be filtered, I'd like to see it escaped before output.
Agreed, fixed.
Also, use
home_url()
in lieu ofget_bloginfo('url')
.
In fact, home_url( '/' );
to make it even shorter :)
The only thing I don't get is the distinction between front page and home (if
show_on_front
is page). Seems to me, both of those would behome_url()
.
Your second comment was correct, it's the posts page. And the logic for that is annoying indeed, we should probably have a convenience function for that too.
#11
in reply to:
↑ 10
@
13 years ago
Replying to joostdevalk:
Your second comment was correct, it's the posts page. And the logic for that is annoying indeed, we should probably have a convenience function for that too.
Indeed. is_home()
has become quite useless lately.
#13
in reply to:
↑ 12
@
13 years ago
Replying to nathanrice:
Should
esc_url()
be used instead ofesc_attr()
?
Yes indeed, changed.
Also fixed check for paginated posts and pages, that threw a notice in v2.
#14
@
13 years ago
Thanks to feedback from johnbillionh on IRC:
- added canonical for author archives
- fixed canonical for paged archives when pretty permalinks are NOT enabled
#16
@
13 years ago
Pass in the context to user_trailingslashit()
, use esc_url()
in the output, clean up some whitespace issues.
#18
@
13 years ago
- Cc sirzooro added
joostdevalk,
Please take a look on my plugin, Meta SEO Pack - in particular MetaSeoPack:::get_canonical()
method. It generates canonical URL for all WP pages, including multi-paged content (this is a bit tricky area). I haven't updated if for a while so some updates will be needed there (in particular use get_search_link()
for search results page and add support for custom post types if needed). It should give you few hints how to improve your patch.
#19
@
13 years ago
This new patch not only fixes canonical, but also implements rel=prev and rel=next, ticket #18672.
#20
@
13 years ago
Another pass, fixes:
- context passed to
user_trailingslashit
everywhere - removed use of
setup_postdata
for a simple count - fixed links for paginated pages on the frontpage (they need to include /page/)
- passed author_name to
get_author_posts_url
to simplify the process of getting that URL
Naming of functions probably needs to be discussed....
#21
follow-up:
↓ 22
@
13 years ago
You can use global variable $numpages
instead calculating page count again.
#22
in reply to:
↑ 21
;
follow-up:
↓ 23
@
13 years ago
Replying to sirzooro:
You can use global variable
$numpages
instead calculating page count again.
Actually can't because that variable hasn't been set at the point yet.
#23
in reply to:
↑ 22
@
13 years ago
Replying to joostdevalk:
Replying to sirzooro:
You can use global variable
$numpages
instead calculating page count again.
Actually can't because that variable hasn't been set at the point yet.
OK, I see. So your code may need extra change when #9911 will be commited.
#24
@
13 years ago
What's the status on this ticket? It was submitted and patched prior to the feature freeze for 3.3, but the milestone hasn't been changed. Do we need a certain number of definitive "patch works for me" messages before it can be committed?
#25
@
13 years ago
- Cc joost@… added
So, let's make sure this makes it into 3.4, what needs to be done to get that going?
#27
@
12 years ago
I've applied this patch, merged it with HEAD, and gone over each line of code. And I've done this no fewer than four times in the last six weeks. I can't bring myself to commit the whole thing at this point, and for two main reasons.
- I think adding rel=canonical to non-singular pages need some very careful consideration. get_current_archive_url() sounds like a fantastic idea in theory, but it could be damaging to a complicated site pretty easily.
I am worried about how a complicated drill-down URL (say, with multiple taxonomies, or a query string, or something else) would end up with an improper canonical link. This is perhaps the greatest issue when it comes to archive pages, and that's likely the only reason why we stuck to singular the first time around.
- I think the pagination aspects (both rel=canonical respecting pagination, and rel=next/prev handling pagination) is much more solid. I think it has the potential to cause problems for non-singular pages, but probably not as bad as a faulty rel=canonical could.
However, I am irked that the patch avoids using any API functions for creating pagination links, and instead calculates them on their own. I understand this is of no fault of the patch — rather, our APIs around pagination links (both singular "page" links and archive "paged" links) are all over the place.
There are also a number of filters and hooks scattered about these functions, which make me question things like pagination bases and other points of customization that could break. I would like to go through these functions, potentially in 3.5, and clean them up, and make them obvious about what is going on.
The one thing that seems most safe is a single, targeted patch that adds support for singularly paginated items to rel_canonical(). And it is getting too late in the cycle to make such changes, so I would rather try to do all of this in one effort for 3.5. Sorry, yoast.
#28
@
11 years ago
See also #11694
Default core behavior in 3.5.1 causes multipage posts using <!-- nextpage --> to have subsequent post pages ignored / deindexed by google due to link rel=canonical pointing to first page of multipage post. Compounded by rel=next/prev pointing to next/prev chronological post not next/prev page of multipage post.
Google on Pagination
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1663744
Google on Canonical
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Stackexchange discussion :
http://wordpress.stackexchange.com/questions/84647/multi-page-posts-do-not-get-indexed-by-google-due-to-canonical-urls
works-for-me 3.5.1 :
Add post page numbers greater than 1 to rel=canonical on multipage posts
wp-includes/link-template.php:
function rel_canonical() { ... if ( $page = get_query_var('page') ) $link = $link . "$page/";
Remove next/prev links from all pages in header (not ideal)
functions.php:
// remove rel=next/prev links in header remove_action('wp_head', 'start_post_rel_link'); remove_action('wp_head', 'adjacent_posts_rel_link');
Agree all related canonical issues should be handled elegantly as proposed , including filter.
#29
@
11 years ago
Happy to work on this patch more if someone (@nacin?) is willing to bless the task and provide me with support on how to fix this. I'd love for us to finally dig it all out and fix all the related nonsense old code in core...
#33
@
10 years ago
Wondering what the status is of this ticket.
At the very least both rel_canonical()
as well as get_adjacent_post_rel_link()
need an esc_url()
around the link as it could contain &
's.
If it would help, I'll gladly send in a separate patch to at least fix that.
#34
@
7 years ago
- Keywords SEO added
Having a new go at this patch at the moment, extracting all the canonical code from Yoast SEO.
#35
@
7 years ago
- Keywords 2nd-opinion removed
- Milestone changed from Awaiting Review to Future Release
I have a concern about get_current_archive_link()
which appears in canonical.6.patch, namely that it doesn't account for custom rewrite rules added by plugins. This is also a problem for wp_get_canonical_url()
actually in regards to custom rewrite endpoints (anything other then page
is stripped).
What about an algorithm like the following to determine the default canonical URL for a given request:
<?php $added_query_vars = $wp->query_vars; if ( ! $wp_rewrite->permalink_structure || empty( $wp->request ) ) { $url = home_url( '/' ); } else { $url = home_url( user_trailingslashit( $wp->request ) ); parse_str( $wp->matched_query, $matched_query_vars ); foreach ( $wp->query_vars as $key => $value ) { // Remove query vars that were matched in the rewrite rules for the request. if ( isset( $matched_query_vars[ $key ] ) ) { unset( $added_query_vars[ $key ] ); } } }
This ensures that custom rewrite rules and endpoints are honored, as well as all public query vars.
Note: I'm looking into this for the sake of adding canonical support to the AMP plugin, as the AMP spec requires a rel=canonical
link on every AMP response, even if it points to itself.
Enhance the rel_canonical() function