WordPress.org

Make WordPress Core

Opened 2 months ago

Last modified 8 weeks ago

#52536 new enhancement

Add "X-Robots-Tag: noindex" to feeds by default

Reported by: pikamander2
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: major
Version:
Component: Feeds
Keywords:
Focuses:
Cc:

Description

We’ve noticed that spammy websites are linking to RSS search results on our site that include their domain in the search terms.

For example, if our site is wordpress.org and their site is example.com, they might link to wordpress.org/search/+example.com+best+pharmacy+pills+online/feed/rss2/

For normal WordPress searches, this isn’t a problem because the search results are set to “noindex”. However, these RSS2 pages are output as XML and don’t include any kind of “noindex” directive, so Google treats them as indexable pages.

Searching Google suggests that this black-hat SEO technique is fairly common and is most likely carried out in bulk by bots.

SEO plugins like Yoast and AIOSEOP appear to add a "noindex" tag to the search result pages, but neither of them seems to add an "X-Robots-Tag: noindex" response header to the feeds, which means that most WordPress sites are vulnerable to this tactic.

Since feeds and search pages are built into WordPress's core and most people wouldn't want their results to be indexed anyway, can we add noindex to those two types of pages by default?

Change History (4)

#1 @sabernhardt
2 months ago

  • Component changed from General to Feeds
  • Summary changed from Add "noindex" to search results by default and "X-Robots-Tag: noindex" to feeds by default to Add "X-Robots-Tag: noindex" to feeds by default

Ticket #52457 is already open for the search pages, and this ticket could address the feeds.

#2 @jonoaldersonwp
2 months ago

  • We can't blanket noindex all feed views, because they need to be indexable/indexed for various external services (Google Podcasts, some Facebook stuff, etc.).
  • However, the recent ticket to noindex search results (#52457) could, indeed, be extended to noindex all formats of search results, including their RSS feeds, via an HTTP header.
Last edited 2 months ago by jonoaldersonwp
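The narrower proposal in this comment (noindex search-result feeds, leave all other feeds indexable) can be sketched outside WordPress as a small piece of WSGI middleware. This is not WordPress code: inside WordPress the check would presumably combine `is_search()` and `is_feed()` on a hook such as `send_headers`. The URL patterns matched below (`/search/<terms>/feed/...` and `?s=<terms>&feed=...`) are assumptions taken from the example URL in this ticket, and `is_search_feed`, `noindex_search_feeds` and `response_headers` are hypothetical names.

```python
from urllib.parse import parse_qs


def is_search_feed(environ):
    """Return True for search-result feeds only (hypothetical URL patterns).

    Matches pretty permalinks such as /search/<terms>/feed/rss2/ and
    plain query strings such as /?s=<terms>&feed=rss2.
    """
    segments = environ.get("PATH_INFO", "").strip("/").split("/")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # segments[1] holds the search terms, so look for "feed" only after it.
    pretty = segments[0] == "search" and "feed" in segments[2:]
    plain = "s" in query and "feed" in query
    return pretty or plain


def noindex_search_feeds(app):
    """WSGI middleware: add X-Robots-Tag: noindex to search feeds, not other feeds."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            if is_search_feed(environ):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


# Usage: a stand-in feed app plus a tiny helper that records response headers.
def feed_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [b'<rss version="2.0"><channel/></rss>']


app = noindex_search_feeds(feed_app)


def response_headers(path, query=""):
    """Call the wrapped app with a minimal environ and return its headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return captured
```

Ordinary feeds such as `/feed/` pass through without the header, so they stay available to the external services that consume them.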

#3 @peterwilsoncc
8 weeks ago

To account for sites using full-page caching that doesn't preserve HTTP headers, could the Robots API be used to add a meta tag to RSS feeds via the rss2_head hook?

<channel>
  <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
  ...
</channel>
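To show the resulting markup, the sketch below post-processes an RSS 2.0 string with Python's standard `xml.etree.ElementTree` and inserts the `xhtml:meta` robots element as the first child of `<channel>`, matching the snippet above. In WordPress the element would presumably be echoed from a callback on the `rss2_head` action instead. `add_robots_meta` is a hypothetical helper, and the sketch says nothing about how crawlers actually treat the element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize the namespace with the "xhtml" prefix instead of "ns0".
ET.register_namespace("xhtml", XHTML_NS)


def add_robots_meta(rss_xml: str, content: str = "noindex") -> str:
    """Insert <xhtml:meta name="robots" content="..."/> as the first child of <channel>."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: no <channel> element")
    meta = ET.Element(f"{{{XHTML_NS}}}meta", {"name": "robots", "content": content})
    channel.insert(0, meta)
    return ET.tostring(root, encoding="unicode")


feed = '<rss version="2.0"><channel><title>Search results</title></channel></rss>'
out = add_robots_meta(feed)
print(out)
```

ElementTree declares `xmlns:xhtml` on the root `<rss>` element rather than on the `meta` element itself; both forms are equivalent XML.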

I'm unable to find the equivalent tag for Atom feeds, but hopefully such a thing exists.

#4 @jonoaldersonwp
8 weeks ago

Ooh, didn't know that existed. Yes.
