Make WordPress Core

Opened 3 weeks ago

Closed 2 weeks ago

Last modified 12 days ago

#62760 closed defect (bug) (fixed)

Noindex URLs when they render unapproved comments

Reported by: joostdevalk
Owned by: peterwilsoncc
Milestone: 6.8
Priority: normal
Severity: normal
Version:
Component: Comments
Keywords: has-patch 2nd-opinion
Focuses:
Cc:

Description

When you leave a comment on a site and that comment goes into moderation, by default, WordPress redirects you to the post permalink, with an unapproved and a moderation-hash parameter. This means that a comment that is not approved can be visible on the frontend of a site, and still be indexed, because those URLs are not noindexed by WordPress.

More details in my blog post, especially the aside.

To reproduce

  1. Make sure all comments go to moderation.
  2. Leave a comment on a post.
  3. See that you're redirected to a post with the unapproved parameter.
  4. See that the URL is not noindex'ed.

Expected results
The URL should be noindex'ed.

Change History (13)

#2 @peterwilsoncc
3 weeks ago

  • Milestone changed from Awaiting Review to 6.8

Moving to the 6.8 milestone for investigation.

Quoting the relevant part of the blog post to allow for ease of contribution:

Aside: the SEO impact of the WordPress comment system

The fact that WordPress redirects to a URL with an unapproved parameter is incredibly tricky in my eyes; I’m never a fan of URL parameters being added anywhere (as it usually busts the cache, and bloats the URL space), but here it also leads to potential SEO issues if people link to these parameterized URLs. I was able to find quite a few of these with a relatively simple Google search:

<snip sample urls>

Since WordPress does not noindex these URLs by default, this is a perfectly sensible way of injecting links into a page and getting (usually nofollow, but still) links from domains that have no idea they’re linking to you, thinking they’ve not approved your comment.

This ticket was mentioned in PR #8070 on WordPress/wordpress-develop by @peterwilsoncc.


3 weeks ago
#3

  • Keywords has-patch added

Adds a noindex directive to pages showing a preview of an unapproved comment prior to moderation.

Although these pages have a canonical URL pointing to the page without the comment preview, the URL can be indexed by search engines nonetheless. I presume this is because the content differs.

Trac ticket: Core-62760

@joostdevalk commented on PR #8070:


3 weeks ago
#4

LGTM, thanks @peterwilsoncc !

@peterwilsoncc commented on PR #8070:


3 weeks ago
#5

@felixarntz @jdevalk I realized overnight that the filter I used doesn't include the noarchive directive, should I use wp_robots_sensitive_page() instead?

@flixos90 commented on PR #8070:


3 weeks ago
#6

> @felixarntz @jdevalk I realized overnight that the filter I used doesn't include the noarchive directive, should I use wp_robots_sensitive_page() instead?

I'm not sure. The change here is not really about a "sensitive page" I would argue, and when looking at other usages of wp_robots_no_robots() and wp_robots_sensitive_page(), the first seems to be more in line with what's needed here. But to be fair, I'm also unsure whether or not the noarchive needs to be set here.

@joostdevalk commented on PR #8070:


2 weeks ago
#7

I’m ok with either

@peterwilsoncc commented on PR #8070:


2 weeks ago
#8

Thanks both, I'll go with the existing code to avoid complexity.

#9 @peterwilsoncc
2 weeks ago

  • Owner set to peterwilsoncc
  • Resolution set to fixed
  • Status changed from new to closed

In 59576:

Comments: Noindex pages containing unapproved comments.

Adds a noindex directive to pages displaying a preview of an unapproved comment, i.e. pages with both an unapproved and a moderation-hash parameter.

This prevents the pages from appearing in search results, which can happen if search engines ignore the canonical URL directive.

Props peterwilsoncc, flixos90, joostdevalk.
Fixes #62760.
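
For readers following along, the condition described in the commit message can be sketched roughly as below. This is an illustrative Python model, not the PHP code from the changeset: the function name `robots_directives_for_url` and the shape of the returned dictionary are assumptions made for this example only.

```python
from urllib.parse import parse_qs, urlsplit


def robots_directives_for_url(url: str) -> dict:
    """Return robots directives for a URL (illustrative model only).

    Hypothetical helper: a URL is treated as an unapproved-comment preview
    when its query string carries both `unapproved` and `moderation-hash`.
    """
    # keep_blank_values=True so parameters with empty values still count
    # as present in the query string.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    is_preview = "unapproved" in query and "moderation-hash" in query
    # Only the preview case gets a noindex directive in this sketch.
    return {"noindex": True} if is_preview else {}
```

Under these assumptions, a URL carrying only one of the two parameters is left untouched, which matches the "both ... and" wording of the commit message.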

#11 @zodiac1978
13 days ago

  • Keywords 2nd-opinion added

Is this remark on the other ticket no longer relevant, @jonoaldersonwp?
https://core.trac.wordpress.org/ticket/49956#comment:27

Just asking, because I saw the comment and now we are doing exactly this ...

#12 follow-up: @joostdevalk
13 days ago

It's still true, ideally we'd remove the canonical in this case. But noindex does usually override canonical.

#13 in reply to: ↑ 12 @peterwilsoncc
12 days ago

Replying to joostdevalk:

> It's still true, ideally we'd remove the canonical in this case. But noindex does usually override canonical.

It may be worth a follow-up ticket to consider whether rel_canonical() should check the wp_robots filter to see if a URL is noindexed.

But if you're not overly concerned, neither am I.
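
As a rough illustration of the follow-up idea, the canonical tag could be skipped whenever the robots directives already contain noindex. The sketch below is a hypothetical Python model; `canonical_link_tag` and its `robots` argument are invented for this example and do not describe how rel_canonical() behaves today.

```python
import html
from typing import Optional


def canonical_link_tag(canonical_url: str, robots: dict) -> Optional[str]:
    """Return a canonical <link> tag, or None when the page is noindexed.

    Hypothetical model of the proposal: consult the robots directives
    before emitting a canonical URL.
    """
    # A noindexed page would not advertise a canonical URL in this sketch.
    if robots.get("noindex"):
        return None
    # Escape the URL so it is safe inside an HTML attribute.
    return '<link rel="canonical" href="%s" />' % html.escape(canonical_url, quote=True)
```

Whether dropping the canonical is desirable is exactly the open question in this thread, so this only shows what such a check might look like.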
