Make WordPress Core

Opened 5 years ago

Last modified 2 years ago

#49220 new defect (bug)

rel_canonical generates the wrong canonical structure for paginated pages.

Reported by: bradleyt
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 4.6
Component: Canonical
Keywords:
Focuses:
Cc:

Description

rel_canonical is responsible for generating the canonical URL of a page. For example, when visiting the WordPress.org homepage, the canonical URL is set to https://wordpress.org/.

rel_canonical has some built-in handling of paginated page structures, but it does not produce the correct URL for them.

A paginated page has the URL structure https://example.com/page/2. However, on this page the canonical tag is output as:

<link rel="canonical" href="https://example.com/2/" />
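A minimal Python sketch (not WordPress's PHP) of how this URL can come about. It assumes, as the later comments in this ticket suggest, that the canonical is built from the page's permalink with the `page` query variable appended as an extra path segment; `model_canonical` and the local `trailingslashit` are hypothetical stand-ins.

```python
def trailingslashit(url: str) -> str:
    """Return the URL with exactly one trailing slash."""
    return url.rstrip("/") + "/"


def model_canonical(permalink: str, page: int) -> str:
    """Hypothetical model: append `page` as a path segment when it is 2 or more."""
    if page >= 2:
        return trailingslashit(permalink) + f"{page}/"
    return permalink


# A static front page's permalink is the site root, so the number taken
# from /page/2/ is appended directly to it and the /page/ segment is lost.
print(model_canonical("https://example.com/", 2))  # https://example.com/2/
```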

As well as simply being wrong, this can conflict with WordPress's 404 redirect guessing (redirect_guess_404_permalink()). For example, https://central.wordcamp.org/page/5/ canonicalises to https://central.wordcamp.org/5/, which when visited redirects to the "5-reasons-you-should-attend-wordcamp-nyc" post (because its slug begins with "5"). I would expect this to seriously confuse some search engines.
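To illustrate why `/5/` can land on an unrelated post, here is a hypothetical Python model of a 404 fallback that picks the first post whose slug starts with the requested path segment. The slug list and the simple prefix rule are assumptions for illustration; the real lookup in WordPress is more involved.

```python
from typing import Optional, Sequence


def guess_by_prefix(path: str, slugs: Sequence[str]) -> Optional[str]:
    """Hypothetical 404 fallback: first slug that starts with the requested segment."""
    segment = path.strip("/")
    if not segment:
        return None
    for slug in slugs:
        if slug.startswith(segment):
            return slug
    return None


slugs = ["hello-world", "5-reasons-you-should-attend-wordcamp-nyc"]
# The bogus canonical /5/ resolves to an unrelated post.
print(guess_by_prefix("/5/", slugs))  # 5-reasons-you-should-attend-wordcamp-nyc
```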

This situation is made worse by the fact that only a small proportion of pages created through WordPress have legitimate pagination. For pages without legitimate pagination, the canonical URL should point to the first page, but I believe handling of non-paginated pages would be better considered in a separate ticket.

This has come to light as a result of the following meta tickets which are contributing to the current SEO issues on wordpress.org:
https://meta.trac.wordpress.org/ticket/4198
https://meta.trac.wordpress.org/ticket/4564

This bug only affects sites with pretty permalinks enabled.

Change History (7)

#1 @SergeyBiryukov
5 years ago

  • Keywords close added; needs-patch removed

Hi there, welcome back to WordPress Trac! Thanks for the report.

This appears to be an issue with WordCamp.org's implementation of the rel="canonical" tag, not WordPress core.

By default, rel_canonical() only outputs the tag for singular queries, not for archives or paginated views.

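Note that there are two formats of paginated views in core:

  • Paginated posts or pages (split with <!--nextpage-->): https://example.com/post-slug/2/
  • Paginated archives: https://example.com/page/2/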

It looks like WordCamp.org's implementation of the rel="canonical" tag confuses the two, but that's something to address on that specific site.
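The two URL shapes can be contrasted with a small Python sketch. It assumes pretty permalinks, models `<!--nextpage-->` pagination as `/{slug}/{n}/` (no `/page/` segment) and archive pagination as `/page/{n}/`; both function names and the example URLs are hypothetical.

```python
def nextpage_url(permalink: str, n: int) -> str:
    """Page n of a post split with <!--nextpage-->: /{slug}/{n}/."""
    base = permalink.rstrip("/") + "/"
    return base if n < 2 else f"{base}{n}/"


def archive_page_url(archive_url: str, n: int) -> str:
    """Page n of an archive such as the blog index: /page/{n}/."""
    base = archive_url.rstrip("/") + "/"
    return base if n < 2 else f"{base}page/{n}/"


print(nextpage_url("https://example.com/post-slug/", 2))  # https://example.com/post-slug/2/
print(archive_page_url("https://example.com/", 2))        # https://example.com/page/2/
```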

Note that @iandunn's suggestion for a core ticket in #meta4564 referred to performing a redirect for paginated archive views when the homepage is set up to display a page, not to the rel_canonical() behavior.

#2 follow-up: @bradleyt
5 years ago

@SergeyBiryukov, thanks for getting back to me.

First, thanks for clarifying that posts using <!--nextpage--> don't have the /page/ URL segment. I didn't know this, and it explains some of the reasoning behind the rel_canonical behaviour.

However, I do still believe that there is an issue here unrelated to the WordCamp site. Any page in a WordPress installation can be accessed at {page slug}/page/{number}, despite that pagination structure being intended for archives. In these cases the canonical url is wrong.

For example, look at the canonical meta tag on these non-wordcamp pages:
https://wordpress.org/page/2/
https://wordpress.org/showcase/page/2/

It's worth noting that whilst these pages generally show the same content as the true canonical version, I frequently see themes and plugins building custom pagination for pages using this URL structure. Therefore, to maintain backwards compatibility we must not remove these URL structures (either by returning a 404 or by redirecting).

It seems that rel_canonical needs some extra checks to differentiate between the two pagination types: when not in a <!--nextpage--> context, the canonical should be equal to page 1.
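The proposed check can be sketched in Python as a decision rule. How core would actually detect a `<!--nextpage-->` context is left open here; it is modelled as a hypothetical `is_nextpage` flag, and `proposed_canonical` is an illustrative name, not a WordPress function.

```python
def proposed_canonical(permalink: str, page: int, is_nextpage: bool) -> str:
    """Hypothetical fix: keep the page number only for <!--nextpage--> pagination."""
    base = permalink.rstrip("/") + "/"
    if is_nextpage and page >= 2:
        return f"{base}{page}/"
    # Archive-style /page/N/ on a singular page falls back to page 1.
    return base


print(proposed_canonical("https://example.com/", 2, is_nextpage=False))          # https://example.com/
print(proposed_canonical("https://example.com/post-slug/", 3, is_nextpage=True))  # https://example.com/post-slug/3/
```

Because this only changes the canonical tag, the /page/N/ URLs stay reachable, which fits the backwards-compatibility concern about not redirecting or 404'ing them.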

#3 in reply to: ↑ 2 @SergeyBiryukov
5 years ago

Replying to bradleyt:

Any page in a WordPress installation can be accessed at {page slug}/page/{number}, despite that pagination structure being intended for archives.

Yes, that issue is being tracked in #28081 / #45337.

In these cases the canonical url is wrong. For example, look at the canonical meta tag on these non-wordcamp pages:
https://wordpress.org/page/2/
https://wordpress.org/showcase/page/2/

That's still an issue with WordPress.org's implementation of the rel="canonical" tag.

By default, rel_canonical() does not output the canonical tag on these views, only on single posts or pages.

Last edited 5 years ago by SergeyBiryukov

#4 @bradleyt
5 years ago

Hi @SergeyBiryukov,

Sorry for the delay in following up on your reply; I have been checking that my thinking about this issue is correct. Firstly, this bug can be reproduced on a fresh install of WordPress. That said, there are a few conditions for it to be reproducible, which I didn't initially realise:

  • The page_on_front setting needs to be set to a static page, i.e. this issue does not seem to affect sites where the homepage is an archive.
  • Pretty permalinks must be enabled
  • The bug is only reproducible on the site homepage.

To give some examples of what that means in practice:

  • wordpress.org/page/2/ has the wrong canonical because it is the homepage for wordpress.org
  • wordpress.org/showcase/page/2/ has the wrong canonical because the showcase pages are actually set up as a standalone WordPress site (as part of a multisite installation)
  • wordpress.org/showcase/submit-a-wordpress-site/page/2/ is not affected, because it is not the homepage.
  • wordpress.org/news/ is not affected because although it is the homepage of a news site (like the showcase setup) the homepage is not set to show a static page.

*Note that I am using the term homepage to refer to the page served at a site's root URL, regardless of the setup of that site.
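The conditions and examples above can be condensed into a small Python predicate. The field and function names are hypothetical, the examples are encoded from the list above rather than tested against live sites, and the /news/ entry is modelled as its paginated view /news/page/2/, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Request:
    static_front_page: bool  # homepage set to a static page via page_on_front
    pretty_permalinks: bool  # a permalink structure is configured
    is_homepage: bool        # URL is the site's root URL
    paged: int               # N from a trailing /page/N/, or 0 if absent


def expect_wrong_canonical(req: Request) -> bool:
    """True when every reproduction condition listed in this comment holds."""
    return (
        req.static_front_page
        and req.pretty_permalinks
        and req.is_homepage
        and req.paged >= 2
    )


examples = {
    "wordpress.org/page/2/": Request(True, True, True, 2),
    "wordpress.org/showcase/page/2/": Request(True, True, True, 2),  # showcase is its own site
    "wordpress.org/showcase/submit-a-wordpress-site/page/2/": Request(True, True, False, 2),
    "wordpress.org/news/page/2/": Request(False, True, True, 2),  # homepage lists posts
}
for url, req in examples.items():
    print(url, expect_wrong_canonical(req))
```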

wp_get_canonical_url calls get_query_var( 'page', 0 ). Normally when visiting a URL of the style wordpress.local/another-page/page/2/ this returns 0. However, on an incorrectly paginated homepage such as wordpress.local/page/2/ this returns the page number - note that calling get_query_var( 'paged', 0 ) would also (correctly) return the same page number.
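A Python model of the lookup described above: a dict stands in for the query vars and a local `get_query_var` mimics reading a variable with a default of 0. The query-var shapes are assumptions based on this description (in particular, that /another-page/page/2/ sets `paged` rather than `page`), not captured output.

```python
def get_query_var(query_vars: dict, name: str, default: int = 0) -> int:
    """Hypothetical stand-in for WordPress's get_query_var()."""
    return query_vars.get(name, default)


# Assumed query vars for each URL.
another_page = {"pagename": "another-page", "paged": 2}  # /another-page/page/2/
front_page = {"page": 2, "paged": 2}                      # /page/2/ with a static front page

for label, qv in (("another-page", another_page), ("front page", front_page)):
    page = get_query_var(qv, "page")
    # Only `page` is consulted when building the canonical, so `paged` is ignored.
    print(f"{label}: page={page}, appended to canonical={page >= 2}")
```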

The incorrect page query variable seems to originate in WP_Query->parse_query, specifically in the section commented "Correct is_* for page_on_front and page_for_posts".
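A hypothetical Python simplification of that correction, not the core source. It assumes the correction re-targets a paginated home request at the static front page and carries the archive `paged` number over into `page`, leaving `paged` in place (consistent with both variables returning the same number above). The ID 42 is an arbitrary example.

```python
def correct_for_page_on_front(query_vars: dict, page_on_front: int) -> dict:
    """Hypothetical model of re-targeting a paginated home request at the front page."""
    qv = dict(query_vars)  # work on a copy; the caller's dict is left unchanged
    qv["page_id"] = page_on_front
    if qv.get("paged"):
        # Assumed step: the archive page number is reused as the <!--nextpage--> number,
        # which is what later gives the canonical its extra /2/ segment.
        qv["page"] = qv["paged"]
    return qv


print(correct_for_page_on_front({"paged": 2}, page_on_front=42))
# {'paged': 2, 'page_id': 42, 'page': 2}
```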

I have not yet been able to suggest a solution, but for now I hope that we can move away from the idea that this is specific to the wordpress.org setup and focus on finding a fix. I believe there are significant backwards-compatibility blockers to #28081 being resolved any time soon, so I am in favour of treating this wp_get_canonical_url bug as a standalone issue.

#5 @bradleyt
5 years ago

  • Keywords close removed

#6 @chesio
4 years ago

I think I already reported the same problem in #42835. However, this ticket has more information, so perhaps the other one should be closed.

#7 @JeffPaul
2 years ago

#42835 was marked as a duplicate.
