Opened 3 years ago
Last modified 2 years ago
#52536 new enhancement
Add "X-Robots-Tag: noindex" to feeds by default
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | major | Version: | |
| Component: | Feeds | Keywords: | |
| Focuses: | | Cc: | |
Description
We’ve noticed that spammy websites are linking to RSS search results on our site that include their domain in the search terms.
For example, if our site is wordpress.org and their site is example.com, they might link to wordpress.org/search/+example.com+best+pharmacy+pills+online/feed/rss2/
For normal WordPress searches, this isn’t a problem because the search results are set to “noindex”. However, these RSS2 pages are output as XML and don’t include any kind of “noindex” tag, so Google treats them as indexable pages.
Looking around Google, it seems like this type of blackhat SEO technique is fairly common and most likely done in bulk by bots.
SEO plugins like Yoast and AIOSEOP appear to add a "noindex" tag to the search result pages, but neither of them seems to add an equivalent "X-Robots-Tag: noindex" response header to the feeds, which means that most WordPress sites are vulnerable to this tactic.
Since feeds and search pages are built into WordPress's core and most people wouldn't want their results to be indexed anyway, can we add noindex to those two types of pages by default?
Change History (5)
#1
@
3 years ago
- Component changed from General to Feeds
- Summary changed from Add "noindex" to search results by default and "X-Robots-Tag: noindex" to feeds by default to Add "X-Robots-Tag: noindex" to feeds by default
#2
@
3 years ago
- We can't blanket noindex all feed views, because they need to be indexable/indexed for various external services (Google podcasts, some Facebook stuff, etc).
- However, the recent ticket to noindex search results (#52457) could indeed be extended to noindex all formats of search results, including their RSS feeds, via an HTTP header.
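To illustrate the distinction drawn above (noindex only the feeds of search results, leave every other feed indexable), here is a minimal Python sketch. It is not WordPress code: the function names `is_search_feed` and `extra_headers` are hypothetical, and detecting a search feed from the URL shape (`/search/<terms>/feed/...` or `?s=...&feed=...`) is an assumption made for the example.

```python
from urllib.parse import urlsplit, parse_qs


def is_search_feed(url: str) -> bool:
    """Heuristic (assumption): True when the URL looks like a feed of search results."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    query = parse_qs(parts.query)
    # Pretty permalinks: /search/<terms>/feed/ or /search/<terms>/feed/<type>/
    pretty = len(segments) >= 3 and segments[0] == "search" and segments[2] == "feed"
    # Plain query strings: ?s=<terms>&feed=<type>
    plain = "s" in query and "feed" in query
    return pretty or plain


def extra_headers(url: str) -> dict:
    """Return the extra response headers for this URL (hypothetical helper)."""
    # Only search feeds get noindex; ordinary feeds stay indexable for
    # podcast directories and similar external services.
    return {"X-Robots-Tag": "noindex"} if is_search_feed(url) else {}
```

With this policy, a URL such as `/search/+example.com+best+pharmacy/feed/rss2/` gets the header, while `/feed/` or `/category/podcast/feed/` does not.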
#3
@
3 years ago
To account for sites using full-page caching that doesn't include the HTTP headers, could the robots API be used to add meta tags to RSS on the rss2_head hook?
<channel> <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" /> ... </channel>
I'm unable to find the equivalent tag for Atom feeds, but hopefully such a thing exists.
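As a way to check whether a cached feed actually carries the element proposed above, the following Python sketch parses an RSS document and looks for an `xhtml:meta` robots element containing `noindex`. The helper name `feed_has_noindex_meta` and the sample feed are made up for illustration, and the sketch only confirms that the element is present; it says nothing about whether a given crawler honors it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def feed_has_noindex_meta(feed_xml: str) -> bool:
    """Return True if the feed contains an xhtml:meta robots element with noindex."""
    root = ET.fromstring(feed_xml)
    for meta in root.iter(f"{{{XHTML_NS}}}meta"):
        if meta.get("name") == "robots":
            # The content attribute may hold several comma-separated directives.
            directives = {d.strip().lower() for d in meta.get("content", "").split(",")}
            if "noindex" in directives:
                return True
    return False


# Hypothetical sample feed using the element from the comment above.
SAMPLE = """<rss version="2.0">
  <channel>
    <xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
    <title>Search results</title>
  </channel>
</rss>"""
```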
#5
@
2 years ago
Just wanted to add that we're seeing Google indexing some pages like this, and we're getting quite a bit of traffic too.
For others who stumble on this before the fix is ready, a workaround is to add a redirect rule to your site (I used the WPEngine dashboard) that redirects these search feed pages to your home page. The regex below might be useful.
^/search/[^/]*/feed/[^/]*/?$
Ticket #52457 is already open for the search pages, and this ticket could address the feeds.
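For anyone adapting the redirect workaround, this small Python sketch shows which request paths the regex from the comment above matches. The `redirect_target` helper and the sample paths are hypothetical, and a hosting dashboard's regex engine may differ in details from Python's `re`.

```python
import re

# The pattern from the workaround, copied verbatim.
SEARCH_FEED = re.compile(r"^/search/[^/]*/feed/[^/]*/?$")


def redirect_target(path: str, home: str = "/"):
    """Return the redirect destination for a search feed path, or None to serve normally."""
    return home if SEARCH_FEED.match(path) else None


if __name__ == "__main__":
    for path in (
        "/search/+example.com+best+pharmacy+pills+online/feed/rss2/",
        "/search/foo/feed/",
        "/search/foo/feed",
        "/feed/",
    ):
        print(path, "->", redirect_target(path))
```

Note that the pattern requires a slash after `feed`, so `/search/foo/feed` is not matched, and it only looks at the path, so a query-string request such as `?s=foo&feed=rss2` would need a separate rule.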