#52457 closed task (blessed) (fixed)
WordPress vulnerable to search-reflected webspam
Reported by: | abagtcs | Owned by: | peterwilsoncc |
---|---|---|---|
Milestone: | 5.7 | Priority: | normal |
Severity: | normal | Version: | |
Component: | General | Keywords: | has-patch has-unit-tests |
Focuses: | template | Cc: |
Description
WordPress echoes back searched-for terms on its search results page. For example, a search on an installation at www.foo.edu for "scholarship programs" would have a URL of:
https://www.foo.edu/?s=scholarship%20programs
The resulting page would include the text:
Search results for: scholarship programs
Web spammers have started to abuse the search features of such sites by passing in spam terms and hostnames in hopes of boosting the search rankings of the spammers' sites. For example, www.foo.edu might be abused by URLs that look like:
https://www.foo.edu/?s=Buy%20cheap%20viagra%20without%20prescrition%20-%20www.getcheapdrugs.com
and that produce a page that includes:
Search Results for: Buy cheap viagra without prescrition - www.getcheapdrugs.com
The spammers place these links in open wikis, blog comments, forums, and other link farms, relying upon search engines crawling their links, and then visiting and indexing the resulting search results pages and included spammy content.
This attack is surprisingly widespread, affecting many websites around the world. Though some CMSes and sites powered by custom-written code may be vulnerable to this technique, preliminary investigation suggests that, at least in the .edu space, the most heavily targeted web platform by far is WordPress. For example, to see many U.S. educational websites targeted by the attack, you can do a Google search for:
site:edu inurl:s "buy"
There are several possible ways to prevent a website from being abused by this method, but adding the appropriate header or meta tag (see https://developers.google.com/search/docs/advanced/crawling/block-indexing ) appears to be the most appropriate and effective, especially with respect to getting the spam URLs removed from search engine indexes.
I submitted this problem as a security concern for WordPress on HackerOne, but it was closed as not a vulnerability. So I'm suggesting that core be modified to add the appropriate meta tag to search result page headers, either always or by default (with the ability to disable it):
<meta name='robots' content='noindex,follow' />
This will indicate to crawlers that the content is not to be indexed and will prevent the site from being abused by web spammers.
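Until such a change lands in core, a site owner could approximate this behavior with a tiny plugin. A minimal sketch, assuming WordPress's standard `wp_head` hook and `is_search()` conditional (the function name is hypothetical, and this is not the committed core fix):

```php
<?php
/*
 * Plugin Name: Noindex Search Results (sketch)
 * Illustrative only: emit a noindex meta tag in the <head> of search
 * result pages so crawlers drop them from their indexes.
 */
function example_noindex_search_results() {
	if ( is_search() ) {
		echo "<meta name='robots' content='noindex,follow' />\n";
	}
}
add_action( 'wp_head', 'example_noindex_search_results' );
```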
Attachments (2)
Change History (26)
#2 (4 years ago)
This is indeed a problem. According to an older Google Q&A stream, if I recall correctly, crawling search results adds little to no value with regard to ranking and also obviously slows down crawling itself.
So there's nothing wrong with applying a noindex, nofollow to ?s= pages; it's arguably better to do so anyway to avoid the spam. Feel free to correct me if I'm wrong, though, as I'm not an expert on the matter :).
That being said, I agree that maybe we should "protect" the search results by default, with an extra option alongside the Search engine visibility setting.
Note that this is currently something that various SEO plugins do as well :).
#3 (4 years ago)
Correct; this is a huge problem when it comes to spam, security, crawl budget, and more.
All internal search results should output a meta robots tag with a noindex, follow value.
#4 (4 years ago)
I've seen this a lot recently too, and I've had a few complaints from customers who thought they had been hacked. I'd be all for core behavior not to index ?s= search result pages. Would the additional option under Settings > Reading be necessary? Maybe if you really wanted search pages to be indexed, that could be left up to plugins?
#5 (4 years ago)
Now that we have a rudimentary robots tag API in core, this should definitely be considered.
I agree on not needing a UI control, and on having this noindex by default.
There are very few valid use cases for enabling indexing of search results, outside of having a very non-standard theme. Even Yoast SEO doesn't allow users to disable our noindexing of search results via the UI (though we do have filters).
#6 (4 years ago)
- Focuses template added
- Keywords has-patch needs-testing needs-unit-tests added
It's a pretty clever way to spam. The esc_html() of course helps with XSS, but the text itself is part of the page.
I'm suggesting a patch that adds add_filter( 'wp_robots', 'wp_robots_no_robots' ); when the search query is fetched. I don't think it would be possible to track all use cases unless we grep-and-fix $_REQUEST['s'] for user-search-related pages, but I hope the patch is a start.
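The suggested registration can be sketched as follows, assuming the robots API that shipped in WordPress 5.7 (`wp_robots` and `wp_robots_no_robots()` are real core names; the exact placement in the patch is simplified here):

```php
<?php
// When the current request is a search, route the robots meta output
// through wp_robots_no_robots(), which adds 'noindex' (plus 'follow'
// on public sites) to the directives printed by wp_robots().
if ( is_search() ) {
	add_filter( 'wp_robots', 'wp_robots_no_robots' );
}
```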
#7 (4 years ago)
- Keywords needs-refresh added
Thanks for the patch @ayeshrajans.
While the add_action() as you've got it is a good approach, adding it inside get_search_query() will cause problems for themes or plugins wishing to handle robots tags in their own way. Each time get_search_query() is called, they'd need to reinitialize their own handling.
There are two possible approaches I can think of (others may have another suggestion or two):
- In the noindex function, call wp_no_robots() if the blog is not private, or if is_search() is true.
- As the templates are loaded, add the code from your original patch once the search template is chosen.
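The second approach could be sketched like this, assuming the real `template_include` filter, which fires once the template has been chosen (the callback name is hypothetical):

```php
<?php
// Register the noindex behavior only after template selection, so that
// calling get_search_query() elsewhere never has side effects.
function example_noindex_on_search_template( $template ) {
	if ( is_search() ) {
		add_filter( 'wp_robots', 'wp_robots_no_robots' );
	}
	return $template;
}
add_filter( 'template_include', 'example_noindex_on_search_template' );
```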
This ticket was mentioned in PR #996 on WordPress/wordpress-develop by maniu (4 years ago).
#8
- Keywords needs-refresh removed
Trac ticket: https://core.trac.wordpress.org/ticket/52457
#9 (4 years ago)
I tested this by applying PR 996, performing a search, and viewing the source of the search result page.
<meta name='robots' content='noindex, follow, max-image-preview:large' /> is output in the head.
I then followed the testing instructions from 51511#comment:20. Removing the filter(s) works correctly. I did not test on multisite.
#10 (4 years ago)
- Keywords has-unit-tests added; needs-testing needs-unit-tests removed
In 52457.diff:
- no change from pull request
- unit test to ensure noindex displays on search
- unit test to ensure noindex does not display on other pages
Revised approach looks good to me.
@jonoaldersonwp I noticed that sensitive pages include a noindex, noarchive directive. Is the latter required for search too?
If it isn't, I've also tested this and think it's good for commit.
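The kind of unit tests described above might look roughly like this; a hedged sketch, not the committed tests (class and method names are illustrative, and it assumes the core test suite's `WP_UnitTestCase`, `go_to()`, and `get_echo()` helpers):

```php
<?php
// Sketch: verify that wp_robots() output contains 'noindex' on a
// search results request but not on the front page (assuming search
// engine visibility is enabled, the default).
class Tests_Robots_Search_Sketch extends WP_UnitTestCase {
	public function test_search_results_are_noindexed() {
		$this->go_to( home_url( '/?s=test' ) );
		$output = get_echo( 'wp_robots' );
		$this->assertStringContainsString( 'noindex', $output );
	}

	public function test_front_page_is_not_noindexed() {
		$this->go_to( home_url( '/' ) );
		$output = get_echo( 'wp_robots' );
		$this->assertStringNotContainsString( 'noindex', $output );
	}
}
```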
This ticket was mentioned in Slack in #core by hellofromtonya (4 years ago). View the logs.
#15 (4 years ago)
- Owner set to peterwilsoncc
- Resolution set to fixed
- Status changed from new to closed
In 50370:
This ticket was mentioned in Slack in #core by sergey (4 years ago). View the logs.
hellofromtonya commented on PR #996 (4 years ago):
#18
Closed as part of changeset https://core.trac.wordpress.org/changeset/50370
This ticket was mentioned in Slack in #forums by vladytimy (4 years ago). View the logs.
This ticket was mentioned in Slack in #core by poena (4 years ago). View the logs.
This ticket was mentioned in Slack in #core by sergey (4 years ago). View the logs.
#22 (4 years ago), follow-up: ↓ 23
Should we noindex on all search results, or just ones that return zero posts?
#23 (4 years ago), in reply to: ↑ 22
Replying to matt:
Should we noindex on all search results, or just ones that return zero posts?
All. Search results are almost always low-value/low-utility from an SEO (and user) perspective, and they often heavily duplicate existing taxonomy pages.
Hi, and welcome to WordPress Trac, @abagtcs.
Milestoning this issue for WordPress 5.7.