Make WordPress Core

Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#51511 closed feature request (fixed)

Introduce Robots API and Media Search Engine Visibility setting

Reported by: flixos90 Owned by: flixos90
Milestone: 5.7 Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch has-unit-tests commit has-dev-note
Focuses: Cc:

Description

As proposed in the "Enhancing image preview: core proposal" announcement post, this ticket aims at introducing the following:

  • A simple filter-based Robots API to centrally manage content of the robots meta tag injected into the page.
  • A setting to toggle whether search engines are allowed to display large media from the site.
  • A max-image-preview:large robots directive which will be injected into the robots meta tag based on the new setting.

There are a couple of extra requirements:

  • The new Robots API should by default not include any directives (i.e. no robots meta tag would be printed). All WP core directives should be injected via their own filter callback functions.
  • The default behavior of which directives core injects should mirror core's behavior of today (with the only exception being the new conditional max-image-preview:large directive). More technically speaking, today's wp_head action callbacks to render robots meta tags should become wp_robots filter callbacks instead.
  • The setting that toggles the new directive should be exposed as a checkbox in Settings > Reading, together with the existing checkbox to control search engine visibility.
  • The setting should be enabled by default. However, in addition to relying on the setting, the max-image-preview:large directive should only be injected if the site is also allowing search engine indexing. More technically speaking, blog_public takes precedence over the new setting.
  • An admin pointer should inform users about the new setting, its default and what this means for WordPress behavior.

Side note: The filter-based Robots API this ticket aims to introduce should furthermore address #20037, which also requests robots customization, just a bit less comprehensively.
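
As an illustration of the filter-based approach, a plugin could contribute a directive roughly like this (a sketch assuming the array-of-directives shape the eventual implementation uses; the noarchive directive is just an example, and only the wp_robots filter name comes from the proposal):

{{{php
// Hypothetical example: contribute a directive to the proposed wp_robots filter.
add_filter(
    'wp_robots',
    function( $robots ) {
        // A truthy value renders the bare directive name in the meta tag.
        $robots['noarchive'] = true;
        return $robots;
    }
);
}}}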

Attachments (2)

screenshot-setting-checkbox.png (100.4 KB) - added by flixos90 4 years ago.
The new checkbox in Settings > Reading
screenshot-admin-pointer.png (224.1 KB) - added by flixos90 4 years ago.
Admin pointer for the new feature


Change History (25)

This ticket was mentioned in PR #595 on WordPress/wordpress-develop by felixarntz.


4 years ago
#1

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
  • Introduces basic Robots API controlled via wp_robots filter.
  • Adds wp_robots filter callback functions mirroring existing core behavior.
  • Replaces wp_head action hook callbacks related to "robots" meta tag with wp_robots filter hook callbacks.
  • Deprecates wp_head action hook callback functions related to "robots" meta tag, referencing the respective filter hook callback functions as replacement.
  • Introduces an additional filter function to unconditionally inject the max-image-preview:large robots directive, and another filter function to inject it based on whether both the blog_public and media_search_engine_visibility settings are truthy (a rough sketch appears at the end of this comment).
  • Adds infrastructure (defaults, upgrade, sanitization) for new media_search_engine_visibility setting, with the setting having a default value of 1 (enabled).
  • Displays a checkbox to control the new setting under _Settings > Reading_. Similarly to the related blog_public checkbox, it displays the inverse value, with the checkbox label being phrased as "Discourage search engines ...".
  • Displays an admin pointer to the admin dashboard which informs users about the new setting and that it can be tweaked under _Settings > Reading_.

Trac ticket: https://core.trac.wordpress.org/ticket/51511
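
A rough sketch of the conditional callback described in the bullets above; the option names come from the PR description, while the function name and body here are assumptions rather than the committed code:

{{{php
// Sketch only: inject max-image-preview:large when both settings allow it.
function example_robots_max_image_preview_large( $robots ) {
    // blog_public takes precedence over the new media setting.
    if ( get_option( 'blog_public' ) && get_option( 'media_search_engine_visibility' ) ) {
        $robots['max-image-preview'] = 'large';
    }
    return $robots;
}
add_filter( 'wp_robots', 'example_robots_max_image_preview_large' );
}}}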

@flixos90
4 years ago

The new checkbox in Settings > Reading

@flixos90
4 years ago

Admin pointer for the new feature

#2 @flixos90
4 years ago

  • Keywords needs-copy-review added

Above are two screenshots of the relevant UI this feature exposes. It would be great to get feedback on the copy.

adamsilverstein commented on PR #595:


4 years ago
#3

@felixarntz - Overall looks really good! The approach makes sense and a filter-based API fits in nicely with how other things work in core. I do have one concern about the UI and what happens when both checkboxes are checked (below)...

### Testing:
When I tested this code, I noticed the new robots tag by default:

https://i0.wp.com/user-images.githubusercontent.com/2676022/96302276-c2679e80-0fb5-11eb-9df4-d3e36966ed67.png

Then I tried editing the site visibility settings under Settings->Reading:

https://i0.wp.com/user-images.githubusercontent.com/2676022/96303255-56863580-0fb7-11eb-82b5-b062bf07632e.png

When I checked the first checkbox, "Discourage search engines from indexing this site", the tag changed as expected.

https://i0.wp.com/user-images.githubusercontent.com/2676022/96303333-73bb0400-0fb7-11eb-90e9-89df625d8012.png

When I checked only the second box, "Discourage search engines from displaying large previews of this site’s media", no robots tag was added, meaning the default rule applies and larger media sizes are not shared.

When I checked both options, I still got the <meta name='robots' content='noindex, nofollow' /> tag (same as with only the first box checked). I assume this is expected?

If the first box overwrites the second, maybe the second box should be disabled if the first box is checked?

https://i0.wp.com/user-images.githubusercontent.com/2676022/96303640-f80d8700-0fb7-11eb-9a2e-fc9644ed3f74.png

--

I tested using the wp_robots filter to adjust the max preview size and that worked as expected:
{{{php
add_filter(
    'wp_robots',
    function( $args ) {
        // Limit image previews in search results to medium size.
        $args['max-image-preview'] = 'medium';
        return $args;
    }
);
}}}

felixarntz commented on PR #595:


4 years ago
#4

@adamsilverstein

When I checked both options, I still got the <meta name='robots' content='noindex, nofollow' /> tag (same as with only the first box checked). I assume this is expected?

Yes indeed, my thinking is that if search engines are discouraged, it doesn't make sense to tell search engines at the same time that they can use large images.

If the first box overwrites the second, maybe the second box should be disabled if the first box is checked?

That makes sense. Do you think we can just add some simple inline JS to toggle visibility? How is that handled in other cases? We'd probably also need to account for JS being disabled, so I was thinking this might add too much complexity?

felixarntz commented on PR #595:


4 years ago
#5

@adamsilverstein In https://github.com/WordPress/wordpress-develop/pull/595/commits/8cdc18dac06b7919e3c245b934a41819a15c515f, I added simple JS logic to disable the checkbox when it's not applicable. This is similar to how the dropdowns for "Homepage" and "Posts page" are handled on the same admin screen - and I agree, conditionally disabling instead of hiding is less visually impactful and avoids layout shifting.

#6 @Clorith
4 years ago

I'm all for a core approach to controlling robots meta tags. I do have some concerns about the introduction of a secondary UI checkbox for an individual search engine aspect, though, as it is a bit vague about what this box truly does, and it feels a bit counter to the core philosophy. That may just be my interpretation, so I'm mentioning it for the sake of completeness.

From the perspective of a slightly technical individual, two very likely user scenarios come to mind, although there are likely others:

  • I've allowed search engines with one checkbox being unticked, then I tick this other one because I don't want anyone taking my full size images.
  • I've ticked this new box, because I only want thumbnails to show in search engine image searches.

How can we make this less vague, without overburdening users, if we are to retain this new UI element? I must admit I've got no good answer right now, but I think this is a discussion that should be had, to avoid any confusion or misunderstandings.

#7 @helen
4 years ago

I don't have a fully formed opinion about the robots API side of things, but as for media use in search engines, I am not convinced of the utility of turning this on for the majority of sites. It's making a lot of assumptions about how people use media, and, technical implementation/spec aside, it seems odd to say that this has core importance but not Open Graph tags for social media cards. I can see that it is not the same thing from an output and standards perspective, but with the added UI for all users this exposes something that is fairly insider, and a significant portion of our user base sees those as similar items, if not the same.

Another question I have is - why would a user ever want to discourage large media usage for search engines? What would the problem be with large media that they would want to turn that off?

#8 @flixos90
4 years ago

  • Keywords needs-dev-note added
  • Milestone changed from Awaiting Review to 5.7

@helen @Clorith I'm happy to revisit whether we need this checkbox and (if we do) how to make its purpose clearer. I agree that this is a control for something quite specific, which we generally try to avoid in core. That said, there was a strong push to expose a UI control for this as a follow-up on the original announcement post (which didn't initially include that part).

Another question I have is - why would a user ever want to discourage large media usage for search engines? What would the problem be with large media that they would want to turn that off?

I'd say that 99% of users would not want to discourage large media usage. But then there's a fraction (likely mostly larger publishers) that for legal reasons would discourage it (copyright on those images). The announcement post and this post linked from there provide some more context.

If we knew that it's indeed primarily larger publishers that would benefit from the option to discourage, I think it would be easy to argue that a filter alone is sufficient (assuming they have active development resources). But for smaller, individually managed sites, having the checkbox might be useful. Alternatively, we could avoid the checkbox and instead point to a one-liner plugin for those sites that prefer to discourage.

This ticket was mentioned in PR #702 on WordPress/wordpress-develop by felixarntz.


4 years ago
#9

  • Based on https://github.com/WordPress/wordpress-develop/pull/595, but without support for the max-image-preview:large directive and without the related media_search_engine_visibility setting.
  • This reduces the PR to only focus on introducing the Robots API. The max-image-preview:large piece could then be added separately as a follow-up.

Trac ticket: https://core.trac.wordpress.org/ticket/51511

#10 @flixos90
4 years ago

I've opened a new PR (see above) based on the other one, which only focuses on the Robots API, so that we can review this part first and get it ready, allowing for a more focused conversation.

Both PRs have been refreshed against latest trunk.

#11 @flixos90
4 years ago

I've refreshed the PR for the Robots API to apply cleanly against latest trunk. With the reviews from the previous PR I think this should be good to go soon. Would be great to get some additional eyes on this though!

Once the Robots API PR is committed, we can continue discussing introduction of the max-image-preview:large directive, including a smaller PR to review for that.

Last edited 4 years ago by flixos90 (previous) (diff)

#12 @flixos90
4 years ago

  • Keywords commit added; needs-copy-review removed

@helen @francina Following up on the previous comments, it looks like it would be most beneficial to WordPress users to not include any UI to opt out of allowing search engines to use large image previews, for the following reasons:

  • As long as a site should be surfaced in search results (via the existing checkbox), allowing for large image previews results in a better user experience.
  • The only reason to opt out of large image previews by search engines would be for sites with special copyright requirements, e.g. sites that sell these images.
  • In other words, any UI introduced for this would only benefit a fraction of WordPress sites, which is certainly lower (likely much lower) than 20%.
  • For those sites that would like to opt out of large image previews in search engines, a filter is available to do so (see below).
  • The Yoast SEO plugin, which is used by millions of sites, follows a similar approach, opting in to large image previews by default and allowing sites to opt out with a filter.

A one-line plugin could be used for sites that would like to opt out of large image previews for search engines:

{{{php
<?php
remove_filter( 'wp_robots', 'wp_robots_max_image_preview_large' );
}}}

I've updated the PR accordingly to remove the option and UI around it.

I've also refreshed the Robots API-only version of it, which has been reviewed multiple times and is good to go. I'll commit that one later today so that we can focus solely on the max-image-preview:large part after that.

#13 @flixos90
4 years ago

In 49992:

Robots: Introduce Robots API.

This changeset introduces a filter-based Robots API, providing central control over the robots meta tag.

  • Introduces wp_robots() function which should be called anywhere a robots meta tag should be included.
  • Introduces wp_robots filter which allows adding or modifying directives for the robots meta tag. The wp_robots() function is entirely filter-based, i.e. if no filter is added to wp_robots, no directives will be present, and therefore the entire robots meta tag will be omitted.
  • Introduces the following wp_robots filter functions which replace similar existing functions that were manually rendering a robots meta tag:
    • wp_robots_noindex() replaces noindex(), which has been deprecated.
    • wp_robots_no_robots() replaces wp_no_robots(), which has been deprecated.
    • wp_robots_sensitive_page() replaces wp_sensitive_page_meta(), which has been deprecated. Its rendering of the referrer meta tag has been moved to another new function wp_strict_cross_origin_referrer().

Migration to the new functions is straightforward. For example, a call to add_action( 'wp_head', 'wp_no_robots' ) should be replaced with add_filter( 'wp_robots', 'wp_robots_no_robots' ).

Plugins and themes that render their own robots meta tags are encouraged to switch to rely on the wp_robots filter in order to use the central management layer now provided by WordPress core.

Props adamsilverstein, flixos90, timothyblynjacobs, westonruter.
See #51511.
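
To make the migration concrete, a plugin or theme that previously printed its own robots meta tag on wp_head could switch to the filter roughly as follows (a sketch; the myplugin_* names are hypothetical):

{{{php
// Before: a hypothetical callback that printed a robots meta tag itself.
remove_action( 'wp_head', 'myplugin_render_robots_meta' );

// After: contribute directives to the central wp_robots filter instead.
function myplugin_robots_directives( $robots ) {
    $robots['noarchive']   = true;  // Rendered as a bare directive.
    $robots['max-snippet'] = '50';  // Rendered as max-snippet:50.
    return $robots;
}
add_filter( 'wp_robots', 'myplugin_robots_directives' );
}}}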

felixarntz commented on PR #595:


4 years ago
#14

After https://github.com/WordPress/wordpress-develop/commit/176a1f53f04cde92e7297b5214c03beb9e2ba5c8, this PR is now refreshed against latest trunk so that it only includes the bits related to introducing the max-image-preview:large directive.

#15 @joostdevalk
4 years ago

When someone is writing the dev note for this, I'm happy to review.

#16 @flixos90
4 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 50078:

Robots: Add max-image-preview:large directive by default.

This changeset introduces a wp_robots_max_image_preview_large() function which is hooked into the wp_robots filter to include the max-image-preview:large directive for all sites which are configured to be indexed by search engines. The directive allows search engines to display large image previews for the site in search results.

Props adamsilverstein, Clorith, flixos90, helen, joostdevalk, tweetythierry, westonruter.
Fixes #51511.
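
For reference, the behavior described here is roughly equivalent to the following simplified sketch (not the verbatim core source):

{{{php
// Simplified sketch: core hooks this callback into wp_robots by default and
// only adds the directive when the site allows search engine indexing.
function wp_robots_max_image_preview_large( array $robots ) {
    if ( get_option( 'blog_public' ) ) {
        $robots['max-image-preview'] = 'large';
    }
    return $robots;
}
add_filter( 'wp_robots', 'wp_robots_max_image_preview_large' );
}}}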

#18 @flixos90
4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for dev note.

#19 @flixos90
4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Incorrectly reopened this because of needs-dev-note - that keyword is sufficient though.

#20 @flixos90
4 years ago

@hellofromtonya I added some Robots API QA testing points below:

New default Robots behavior

  • Ensure that, for a site where search engines are not discouraged, a Robots meta tag with the max-image-preview:large directive is present in the frontend.

New API usage

Test the following three cases individually (a minimal plugin wrapper for such one-liners is sketched at the end of this comment):

  • Activate a custom one-line plugin with remove_all_filters( 'wp_robots' );. Ensure that the frontend now does not display any Robots meta tag.
  • Activate a custom one-line plugin with remove_filter( 'wp_robots', 'wp_robots_max_image_preview_large' );. Ensure that the frontend now does not display any Robots meta tag (since that is the only directive added by default).
  • Activate a custom one-line plugin with add_filter( 'wp_robots', function( $robots ) { $robots['follow'] = true; return $robots; } );. Ensure that the frontend now not only includes the default max-image-preview:large Robots directive, but also includes follow within the same Robots meta tag.

Prevent breakage

  • Ensure that, when the checkbox to discourage the site from being indexed by search engines is enabled, the frontend includes a noindex,nofollow directive in the Robots meta tag like before.
  • Ensure that, within the Customizer preview, the site includes a noindex directive in the Robots meta tag like before.
  • Ensure that the WordPress login page (wp-login.php) includes a noindex,noarchive directive in the Robots meta tag, as well as a <meta name='referrer' content='strict-origin-when-cross-origin' /> tag, like before.
  • Multisite: Ensure that the site activation page (wp-activate.php), where a newly registered user can confirm their newly created site, includes a noindex,noarchive directive in the Robots meta tag, as well as a <meta name='referrer' content='strict-origin-when-cross-origin' /> tag, like before.
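
Any of the one-liners in the "New API usage" list above can be dropped into a minimal must-use plugin for testing; a sketch (file and plugin names are arbitrary):

{{{php
<?php
/**
 * Plugin Name: Robots API Test Snippet (example)
 * Description: Place in wp-content/mu-plugins/ and swap in the one-liner under test.
 */

// Example: the third test case, adding a "follow" directive alongside the defaults.
add_filter(
    'wp_robots',
    function( $robots ) {
        $robots['follow'] = true;
        return $robots;
    }
);
}}}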

#21 @SergeyBiryukov
4 years ago

In 50371:

Robots: Rename wp_embed_no_robots() to wp_robots_noindex_embeds().

This brings the naming in line with wp_robots_noindex_search().

Follow-up to [49992], [50370].

See #51511, #52457.

#23 @flixos90
4 years ago

#20037 was marked as a duplicate.
