#33156 closed enhancement (fixed)
Allow admin-ajax crawling
Reported by: | joostdevalk | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | 4.4 | Priority: | normal |
Severity: | normal | Version: | |
Component: | General | Keywords: | 2nd-opinion has-patch |
Focuses: | | Cc: | |
Description
As plugins are using admin-ajax.php on the front end, we should add
Allow: /wp-admin/admin-ajax.php
to the default robots.txt to prevent Google from sending out millions of warning emails; see this article: https://www.seroundtable.com/google-warning-googlebot-css-js-20665.html
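For illustration, the default file with the proposed line could be modeled as below. This is a sketch, not WordPress code: buildDefaultRobotsTxt and its isPublic and sitePath options are hypothetical names, loosely following the $public check in WordPress's do_robots(). sitePath stands for the path part of the site URL (empty for a site at the domain root).

```javascript
// Sketch of the default robots.txt proposed in this ticket.
// Hypothetical helper, not WordPress code; it loosely follows the
// "$public" check discussed in this thread.
function buildDefaultRobotsTxt({ isPublic, sitePath = "" }) {
  let output = "User-agent: *\n";
  if (!isPublic) {
    // Site set to discourage search engines: block everything.
    output += "Disallow: /\n";
  } else {
    // Block the admin area, but carve out the AJAX endpoint that
    // front-end code from themes and plugins may call.
    output += `Disallow: ${sitePath}/wp-admin/\n`;
    output += `Allow: ${sitePath}/wp-admin/admin-ajax.php\n`;
  }
  return output;
}

console.log(buildDefaultRobotsTxt({ isPublic: true }));
console.log(buildDefaultRobotsTxt({ isPublic: true, sitePath: "/blog" }));
```

A site installed under /blog would get Disallow: /blog/wp-admin/ and Allow: /blog/wp-admin/admin-ajax.php.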
Attachments (2)
Change History (28)
This ticket was mentioned in Slack in #core by ocean90 (9 years ago).
#2 by dmchale (9 years ago)
Joost, is there value in removing the Disallow of /wp-admin entirely? I know you've recommended that in the past, but do you think that would be preferable behavior for Core? Or should we just leave it like this, with an exception in place for admin-ajax?
#3 by peterwilsoncc, in reply to #2 (9 years ago)
For what it's worth, I'd rather allow everything in robots.txt and use noindex,nofollow meta tags for private sites & preventing indexing of wp-admin. Google recommends it as a more effective method for preventing indexing.
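The reasoning behind preferring noindex is that a crawler can only obey a noindex signal on a page it is allowed to fetch; Google's documentation on the noindex rule makes the same point. The sketch below models that interaction with a hypothetical indexingOutcome function and a deliberately simplified crawler.

```javascript
// Simplified model of why robots.txt is a weak tool for keeping pages out of
// search results. Assumptions: the crawler obeys robots.txt and reads the
// <meta name="robots"> tag only on pages it actually fetches.
function indexingOutcome({ blockedByRobotsTxt, metaRobots = "" }) {
  if (blockedByRobotsTxt) {
    // The page is never fetched, so a noindex tag on it is never seen; the
    // bare URL can still be listed if other pages link to it.
    return "url-may-be-listed";
  }
  const directives = metaRobots
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());
  // "none" is shorthand for "noindex, nofollow".
  const noindex = directives.includes("noindex") || directives.includes("none");
  return noindex ? "not-indexed" : "indexable";
}

console.log(indexingOutcome({ blockedByRobotsTxt: true, metaRobots: "noindex,nofollow" }));
// prints: url-may-be-listed
console.log(indexingOutcome({ blockedByRobotsTxt: false, metaRobots: "noindex,nofollow" }));
// prints: not-indexed
```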
Replying to dmchale:
The WordPress coding standards are to use tabs, not spaces, for indentation. Would you mind refreshing the patch?
#4 by dmchale, in reply to #3 (9 years ago)
Replying to peterwilsoncc:
For what it's worth, I'd rather allow everything in robots.txt and use noindex,nofollow meta tags for private sites & preventing indexing of wp-admin. Google recommends it as a more effective method for preventing indexing.
That was the solution I was alluding to in my comment to Joost above. Back in February, he recommended getting rid of the /wp-admin block entirely. But I didn't want to create that patch without a conversation happening first, since that wasn't his suggestion as the OP on this ticket. It would be very easy, though: we'd just have to remove everything in the "else" branch of the $public check. A default file would still be returned, albeit nearly blank, and we would still be able to write "Disallow: /" if the site isn't in public mode.
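A sketch of that alternative: buildMinimalRobotsTxt is a hypothetical name, and this models the proposal as described here, not any committed code.

```javascript
// Proposed alternative: drop the "else" branch of the $public check.
// Public sites get a nearly blank file; non-public sites still block everything.
function buildMinimalRobotsTxt(isPublic) {
  let output = "User-agent: *\n";
  if (!isPublic) {
    output += "Disallow: /\n";
  }
  return output;
}

console.log(JSON.stringify(buildMinimalRobotsTxt(true)));  // prints: "User-agent: *\n"
console.log(JSON.stringify(buildMinimalRobotsTxt(false))); // prints: "User-agent: *\nDisallow: /\n"
```

Under this variant, keeping /wp-admin/ out of search results would rely on noindex signals rather than robots.txt.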
Replying to peterwilsoncc:
The WordPress coding standards are to use tabs, not spaces, for indentation. Would you mind refreshing the patch?
Thanks for the heads up. New install of PhpStorm on this PC, and I forgot to turn my whitespace highlighting on. Fixed now; it shouldn't happen again. :) Since Mark already submitted one with tabs, I won't clutter things up with another copy.
#6, in reply to #5 (9 years ago)
Replying to pavelevap:
admin-ajax.php is a PHP file, and Google notified about CSS and JS?
Google has a problem with it when theme or plugin authors do something like this... :)
<link rel='stylesheet' id='style-css' href='http://mydomain.com/wp-admin/admin-ajax.php?action=style' type='text/css' media='all' />
I'm sure there are other use cases where it's causing problems as well, but this one in particular has hit a number of my client sites that use purchased themes.
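To see why that markup triggers the warning, apply the existing Disallow: /wp-admin/ rule to the stylesheet URL with a plain prefix match. isBlockedByDisallow is a hypothetical helper that ignores Allow lines and wildcards; it only illustrates the conflict.

```javascript
// Minimal prefix check: a URL is blocked if its path starts with any
// non-empty Disallow value. Assumptions: one "User-agent: *" group,
// no Allow lines, no wildcards.
function isBlockedByDisallow(url, disallowPrefixes) {
  const path = new URL(url).pathname;
  return disallowPrefixes.some((prefix) => prefix !== "" && path.startsWith(prefix));
}

// Rules without the Allow line proposed in this ticket.
const before = ["/wp-admin/"];
const stylesheet = "http://mydomain.com/wp-admin/admin-ajax.php?action=style";
console.log(isBlockedByDisallow(stylesheet, before)); // prints: true
```

Googlebot therefore cannot fetch the stylesheet needed to render the page, which is the kind of blocked CSS/JS resource the warning emails are about.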
#7 by knutsp (9 years ago)
-1
I don't think this should be in core. Themes should not depend on, or access, /wp-admin. If they do, they should fix the "crawlability" of it through hooks. Core may offer an AJAX endpoint outside /wp-admin, if necessary.
One day, for some, it should be possible to delete /wp-admin and install or use an alternative admin through the WP REST API. In the meantime, find another solution to this problem.
#8, in reply to #7 (9 years ago)
Replying to knutsp:
-1
I don't think this should be in core. Themes should not depend on, or access, /wp-admin. If they do, they should fix the "crawlability" of it through hooks. Core may offer an AJAX endpoint outside /wp-admin, if necessary.
One day, for some, it should be possible to delete /wp-admin and install or use an alternative admin through the WP REST API. In the meantime, find another solution to this problem.
Right now Core only offers ajax functionality through /wp-admin. Your proposal to CHANGE that fact is a much different discussion, IMO. https://codex.wordpress.org/AJAX_in_Plugins "Note 2: Both front-end and back-end Ajax requests use admin-ajax.php [...]"
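Because of that, every front-end AJAX call targets a URL under /wp-admin/. The sketch below builds such a URL and names the hook admin-ajax.php fires for it, following the documented wp_ajax_{action} (logged-in) and wp_ajax_nopriv_{action} (logged-out) convention. buildAjaxUrl, ajaxHookName and the action name my_action are hypothetical, and siteUrl is assumed to end with a slash.

```javascript
// Hypothetical helpers illustrating where front-end AJAX requests go.
// siteUrl must end with "/" so the relative path resolves beneath it.
function buildAjaxUrl(siteUrl, action, params = {}) {
  const url = new URL("wp-admin/admin-ajax.php", siteUrl);
  url.searchParams.set("action", action);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hook name WordPress fires for the request, depending on login state.
function ajaxHookName(action, isLoggedIn) {
  return isLoggedIn ? `wp_ajax_${action}` : `wp_ajax_nopriv_${action}`;
}

console.log(buildAjaxUrl("https://example.com/", "my_action"));
// prints: https://example.com/wp-admin/admin-ajax.php?action=my_action
console.log(ajaxHookName("my_action", false));
// prints: wp_ajax_nopriv_my_action
```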
#9 by johnbillion (9 years ago)
- Keywords 2nd-opinion has-patch added; needs-patch removed
AJAX needs to go via wp-admin for authenticated requests. A front-end AJAX handler was attempted in #12400 but pulled out.
What might be the downside of allowing admin-ajax.php to be crawled? Any chance of unwanted content appearing in SERPs?
#10, in reply to #9 (9 years ago)
Replying to johnbillion:
Any chance of unwanted content appearing in SERPs?
admin-ajax.php already calls @header( 'X-Robots-Tag: noindex' ), so no content found there should appear in any SERPs.
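In other words, a crawler that fetches admin-ajax.php is told not to index the response. A simplified check for that header, assuming headers arrive as a plain object with lower-case names; hasNoindexHeader is a hypothetical helper, and crawler-scoped values such as "googlebot: noindex" are deliberately not handled.

```javascript
// Does this response opt out of indexing via X-Robots-Tag?
// Comma-separated directives are compared case-insensitively;
// "none" is treated as implying noindex.
function hasNoindexHeader(headers) {
  const value = headers["x-robots-tag"];
  if (!value) return false;
  return value
    .split(",")
    .map((d) => d.trim().toLowerCase())
    .some((d) => d === "noindex" || d === "none");
}

console.log(hasNoindexHeader({ "x-robots-tag": "noindex" })); // prints: true
```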
This ticket was mentioned in Slack in #core by dmchale (9 years ago).
This ticket was mentioned in Slack in #core by sergey (9 years ago).
This ticket was mentioned in Slack in #core by sergey (9 years ago).
#17 by Hube2 (9 years ago)
Didn't know if I should start a new ticket or not, and couldn't find one that covers it. The order in which WP outputs the Allow/Disallow rules
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
may not be compatible with all crawlers. To be compatible with all crawlers, the order of these rules needs to be reversed.
See: https://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive.
I can't find any information that contradicts what is presented in the wiki article.
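The difference can be made concrete. Some crawlers read the rules in order and stop at the first matching line, whereas Google uses the most specific (longest) matching path, with Allow winning ties; that longest-match rule was later written into RFC 9309. The sketch below implements both readings for plain path prefixes. parseRules, allowedFirstMatch and allowedLongestMatch are hypothetical helpers, and wildcards, user-agent groups and comments are ignored.

```javascript
// Parse "Allow:"/"Disallow:" lines into { allow, path } rules.
// Empty values (e.g. "Disallow:") mean "no restriction" and are dropped.
function parseRules(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .map((line) => /^(allow|disallow):\s*(\S*)$/i.exec(line))
    .filter((m) => m !== null && m[2] !== "")
    .map((m) => ({ allow: m[1].toLowerCase() === "allow", path: m[2] }));
}

// First-match reading: the first matching line, in file order, decides.
function allowedFirstMatch(rules, path) {
  const hit = rules.find((r) => path.startsWith(r.path));
  return hit ? hit.allow : true;
}

// Longest-match reading (Google, RFC 9309): the most specific matching
// rule decides; if Allow and Disallow paths are equally long, Allow wins.
function allowedLongestMatch(rules, path) {
  let best = null;
  for (const r of rules) {
    if (!path.startsWith(r.path)) continue;
    if (
      best === null ||
      r.path.length > best.path.length ||
      (r.path.length === best.path.length && r.allow)
    ) {
      best = r;
    }
  }
  return best ? best.allow : true;
}

const wpOrder = parseRules("Disallow: /wp-admin/\nAllow: /wp-admin/admin-ajax.php");
const reversed = parseRules("Allow: /wp-admin/admin-ajax.php\nDisallow: /wp-admin/");
const target = "/wp-admin/admin-ajax.php";

console.log(allowedFirstMatch(wpOrder, target));   // prints: false
console.log(allowedFirstMatch(reversed, target));  // prints: true
console.log(allowedLongestMatch(wpOrder, target)); // prints: true
```

With WordPress's order, a first-match crawler blocks admin-ajax.php while a longest-match crawler allows it; putting the Allow line first gives the same answer under both readings, which is the reversal suggested above.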
#18, in reply to #17 (9 years ago)
Replying to Hube2:
Didn't know if I should start a new ticket or not, and couldn't find one that covers it. The order in which WP outputs the Allow/Disallow rules
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
may not be compatible with all crawlers. To be compatible with all crawlers, the order of these rules needs to be reversed.
See: https://en.wikipedia.org/wiki/Robots_exclusion_standard#Allow_directive.
I can't find any information that contradicts what is presented in the wiki article.
Opened a new ticket about order.
#19 (7 years ago)
Google is getting a 400 error when it crawls /wp-admin/admin-ajax.php, because the URL is defined in JavaScript in the Avada theme:
jQuery( document ).ready( function() {
    var ajaxurl = 'https://example.com/wp-admin/admin-ajax.php';
    if ( 0 < jQuery( '.fusion-login-nonce' ).length ) {
        jQuery.get( ajaxurl, { 'action': 'fusion_login_nonce' }, function( response ) {
            jQuery( '.fusion-login-nonce' ).html( response );
        });
    }
});
So, maybe this was not a good idea?
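One plausible explanation (an assumption, not confirmed in this thread): a crawler that picks the bare URL out of the script requests admin-ajax.php without an action parameter, and recent versions of admin-ajax.php answer a missing or unregistered action with wp_die( '0', 400 ). The model below reduces that to the status code; ajaxStatus and the registered hook set are hypothetical.

```javascript
// Simplified model of admin-ajax.php's response status.
// registeredHooks stands in for the actions themes and plugins have hooked.
function ajaxStatus(queryString, isLoggedIn, registeredHooks) {
  const action = new URLSearchParams(queryString).get("action");
  if (!action) {
    return 400; // missing action
  }
  const hook = isLoggedIn ? `wp_ajax_${action}` : `wp_ajax_nopriv_${action}`;
  // Unregistered action: 400. Otherwise assume the handler responds normally.
  return registeredHooks.has(hook) ? 200 : 400;
}

const hooks = new Set(["wp_ajax_nopriv_fusion_login_nonce"]);
console.log(ajaxStatus("", false, hooks));                          // prints: 400
console.log(ajaxStatus("action=fusion_login_nonce", false, hooks)); // prints: 200
```

The first call corresponds to a crawler fetching the bare URL; the second to the theme's own request, which carries the action.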
#20 (6 years ago)
I just received a 400 error in my Google Search Console for the /wp-admin/admin-ajax.php URL.
Have things changed since the start of this thread?
This ticket was mentioned in Slack in #core by kevin940726 (4 years ago).
#22 by KnowingArt_com (3 years ago)
It seems like all the comments of concern were ignored.
WordPress does not need permission from robots.txt to access itself. Take a step back and ask yourself why robots are "allowed" access to wp-admin/admin-ajax.php. It makes no sense, as if people forgot the role of robots.txt.
Robots.txt does not exist to manage all conceivable non-human interactions. It was originally created to save bandwidth, because one gig of transfer could cost over $10. If you simply crawled somebody's website back in 1999, they might threaten to sue you. Ask me how I know. Frankly, bandwidth and CPU are cheap enough now that robots.txt should be obsolete.
After some decades and bazillions of pageviews, I have never used a robots.txt, because I don't want to discourage robots from enjoying my content. I don't know the exact version of WordPress in which this changed, but it seems WordPress decided to make robots.txt mandatory. In my expert opinion, that was a foolish decision.
Respectful bots will avoid /wp-admin/ without being told. Disrespectful bots will do whatever they want. This auto-generated robots.txt is unnecessary, it just creates confusion and solves nothing.
I'm told the WordPress philosophy is to make decisions instead of offering options. You don't make certain decisions without asking my permission. I'm drawing the line at robots.txt, I see this as a violation where WordPress is claiming ownership of something that does not belong to WordPress. (My server, my choice.) If I "allow" a robot onto my server, that is my decision to make as a system admin. Just because the average WordPress user is not very sophisticated with technology, that doesn't mean you can just take control of whatever you want, just because I gave you permission to auto-install upgrades.
What's next? Are you going to try creeping into php.ini? Seriously, this should be a concern as larger companies are allowed to submit code to WordPress. If you're going to draw a line somewhere, you might as well draw it at robots.txt.
As for admin-ajax.php, whoever added that line should at least include a robots.readme to explain why robots.txt is mandatory, why it makes no sense, the relationship to wp-sitemap.xml, and include a link back to this URL, because it took me two days to follow the breadcrumbs back to this ticket. To say the least, this is not how I wanted to spend my week. And after digging into pages and pages of explanations, I'm still wondering why anyone would add admin-ajax.php to robots.txt.
"Since it's often used on front-end."
Like I said, the front-end has nothing to do with robots.txt
"What might be the downside of allowing admin-ajax.php to be crawled? Any chance of unwanted content appearing in SERPs?"
BINGO! I'm having a problem with DuckDuckGo right now, it's listing /wp-admin/ as the #1 search result for my domain.
#23, in reply to #22 (3 years ago)
Replying to KnowingArt_com:
It seems like all the comments of concern were ignored.
I would tend to agree, considering that AJAX itself is a problem.
There is a known SQL injection exploit in an AJAX plugin (Ajax Load More); see https://www.exploit-db.com/exploits/48475
# Exploit Title: WordPress Plugin Ajax Load More 5.3.1 - '#1' Authenticated SQL Injection
# Exploit Author: SunCSR (Sun* Cyber Security Research) - Nguyen Khang
# Google Dork: N/A
# Date: 2020-05-18
# Vendor Homepage: https://connekthq.com/plugins/ajax-load-more/
# Software Link: https://vi.wordpress.org/plugins/ajax-load-more/
# Version: <= 5.3.1
# Tested on: Ubuntu 18.04
Description:
A blind SQL injection vulnerability is present in Ajax Load More.
$wpdb->get_var("SELECT repeaterDefault FROM " . $table_name . " WHERE name = '$n'");
#24 by dmchale, in reply to #22 (3 years ago)
Replying to KnowingArt_com:
I'm told the WordPress philosophy is to make decisions instead of offering options. You don't make certain decisions without asking my permission.
In fairness, that's exactly what it means to make a decision and not offer an option.
That said, this ticket was discussing the default behavior of the robots.txt file. You can short-circuit what WordPress does out of the box in one of two ways.
- Create a physical robots.txt file on your server. If WordPress detects a physical file at the web root, it will not add to / remove from / modify that file in any way (this includes any plugins that dynamically modify the robots.txt file as well).
- Use the robots_txt filter to modify the contents of the WordPress defaults: https://developer.wordpress.org/reference/hooks/robots_txt/
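The two options can be modeled as follows, assuming typical rewrite rules under which the web server serves an existing robots.txt file directly, so WordPress never runs. Otherwise the generated default passes through each robots_txt callback in turn, and each callback receives the current output and the site's public setting, mirroring the filter's two arguments. serveRobotsTxt and dropAjaxAllow are hypothetical names.

```javascript
// Model of the two override paths. Not WordPress code.
function serveRobotsTxt({ physicalFile, defaultOutput, isPublic, filters }) {
  if (physicalFile !== null) {
    return physicalFile; // served as-is; WordPress is never involved
  }
  // Each callback receives (output, isPublic) and returns the new output.
  return filters.reduce((output, fn) => fn(output, isPublic), defaultOutput);
}

const defaults = "User-agent: *\nDisallow: /wp-admin/\nAllow: /wp-admin/admin-ajax.php\n";

// Example callback: remove the Allow line added by this ticket.
const dropAjaxAllow = (output) =>
  output
    .split("\n")
    .filter((line) => !line.startsWith("Allow: /wp-admin/admin-ajax.php"))
    .join("\n");

console.log(
  serveRobotsTxt({ physicalFile: null, defaultOutput: defaults, isPublic: true, filters: [dropAjaxAllow] })
);
```

In WordPress itself the callback would be a PHP function attached with add_filter( 'robots_txt', ... ); the JavaScript here only models the order of operations.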
#25, in reply to #22 (3 years ago)
Replying to KnowingArt_com:
It seems like all the comments of concern were ignored.
I'm starting to get a better understanding of the issue(s) that triggered this desire to sort of whitelist this ajax script for Googlebot. However, we should not pollute robots.txt to fix poorly conceived AJAX themes that lazy-load content without non-AJAX placeholder content, do not fail gracefully, or whatever happened to these themes.
Likewise, robots.txt is not the best place for Googlebot-specific problems either. I think WordPress is big enough to work that out with Google directly. And for really specific problems, just use your web server config.
I'm also thinking: if WordPress decides to back out of robots.txt, will that break stuff? I doubt it, but I don't know for sure. I think the alternative is worse. Because now you have wp-sitemap.xml in there, how deep will this rabbit hole go before all the custom robots.txt files out there need to be rewritten to "catch up" with the auto-generated WordPress robots.txt?
Also, I created this last night...
#26, in reply to #24 (3 years ago)
Replying to dmchale:
- Create a physical robots.txt file on your server. If WordPress detects a physical file at the web root, it will not add to / remove from / modify that file in any way (this includes any plugins that dynamically modify the robots.txt file as well).
- Use the robots_txt filter to modify the contents of the WordPress defaults: https://developer.wordpress.org/reference/hooks/robots_txt/
1. Nice, but without comments in robots.txt, how will the casual user know? My first impression was to 'touch robots.txt', but...
1a. How will anyone know this won't break wp-sitemap.xml? That's how I ended up here. Upon further investigation, it seems *Google* contributed the code that adds wp-sitemap.xml to robots.txt. If a Google employee adds some code to the WordPress robots.txt, that tells me Google wants that code to be there, and I am going to think twice about removing it.
1b. Also, casual user: if I remove this weird ajax thing, am I going to break something? Maybe I should investigate further. And down the rabbit hole we go into ajax themes with broken Googlebot renderings :-(
2. That was my first attempt, but a) I crashed my site by trying some random Stack Exchange solution, b) I have many blogs to manage on several servers, and what if the blog changes themes? Is there a "functions.php" that affects all themes? c) I don't really want a filter; I want to completely disable the creation of robots.txt, which is probably harder than it sounds.
Add "Allow" for /wp-admin/admin-ajax.php to the end of the default generated robots.txt file