WordPress.org

Make WordPress Core

Opened 3 months ago

Last modified 2 days ago

#51511 assigned feature request

Introduce Robots API and Media Search Engine Visibility setting

Reported by: flixos90 Owned by: flixos90
Milestone: 5.7 Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch has-unit-tests needs-dev-note commit
Focuses: Cc:

Description

As proposed in the "Enhancing image preview: core proposal" announcement post, this ticket aims at introducing the following:

  • A simple filter-based Robots API to centrally manage content of the robots meta tag injected into the page.
  • A setting to toggle whether search engines are allowed to display large media from the site.
  • A max-image-preview:large robots directive which will be injected into the robots meta tag based on the new setting.
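The following is a minimal standalone PHP sketch of the intended flow, not WordPress core code. `sketch_apply_filters()` and `sketch_render_robots()` are hypothetical stand-ins for the `wp_robots` filter and for the function that prints the tag. The sketch assumes directives live in an associative array:

- `true` yields a bare directive (e.g. `noindex`).
- A string yields `directive:value` (e.g. `max-image-preview:large`).
- An empty result prints no tag at all.

```php
<?php
/**
 * Hypothetical stand-in for WordPress's filter machinery: each callback
 * receives the directives array and returns a (possibly modified) copy.
 */
function sketch_apply_filters( array $callbacks, array $robots = array() ): array {
	foreach ( $callbacks as $callback ) {
		$robots = $callback( $robots );
	}
	return $robots;
}

/**
 * Hypothetical renderer: turns a directives array into a robots meta tag,
 * or returns '' when no directive is active.
 */
function sketch_render_robots( array $robots ): string {
	$parts = array();
	foreach ( $robots as $directive => $value ) {
		if ( is_string( $value ) ) {
			// String values become "directive:value", e.g. "max-image-preview:large".
			$parts[] = "{$directive}:{$value}";
		} elseif ( $value ) {
			// Other truthy values become a bare directive, e.g. "noindex".
			$parts[] = $directive;
		}
	}
	if ( empty( $parts ) ) {
		// No directives at all: omit the meta tag entirely.
		return '';
	}
	return "<meta name='robots' content='" . htmlspecialchars( implode( ', ', $parts ), ENT_QUOTES ) . "' />";
}

// No callbacks registered: nothing is printed.
echo sketch_render_robots( sketch_apply_filters( array() ) ), "\n";

// A single callback adding the proposed directive.
$callbacks = array(
	function ( array $robots ): array {
		$robots['max-image-preview'] = 'large';
		return $robots;
	},
);
echo sketch_render_robots( sketch_apply_filters( $callbacks ) ), "\n";
```

With no callbacks, the first call prints an empty line. With one callback, the output is `<meta name='robots' content='max-image-preview:large' />`. Keeping callbacks as pure array transforms lets one central function own the markup.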

There are a couple of extra requirements:

  • The new Robots API should not include any directives by default (i.e. no robots meta tag would be printed). All WP core directives should be injected via their own filter callback functions.
  • The default set of directives that core injects should mirror core's current behavior (the only exception being the new conditional max-image-preview:large directive). Technically, the current wp_head action callbacks that render robots meta tags should become wp_robots filter callbacks instead.
  • The setting that toggles the new directive should be exposed as a checkbox in Settings > Reading, together with the existing checkbox to control search engine visibility.
  • The setting should be enabled by default. However, beyond relying on the setting, the max-image-preview:large directive should only be injected if the site also allows search engine indexing. In other words, blog_public takes precedence over the new setting.
  • An admin pointer should inform users about the new setting, its default and what this means for WordPress behavior.
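The precedence rule fits in a pure function. In this hypothetical sketch, `sketch_maybe_add_large_image_preview()` receives the two option values as booleans. In WordPress they would presumably come from `get_option( 'blog_public' )` and `get_option( 'media_search_engine_visibility' )`. The function adds the directive only when both values are truthy.

```php
<?php
/**
 * Hypothetical sketch of the proposed rule: add max-image-preview:large
 * only if the site allows indexing AND the media setting is enabled.
 */
function sketch_maybe_add_large_image_preview( array $robots, bool $blog_public, bool $media_visibility ): array {
	// blog_public takes precedence: a site hidden from search engines
	// never advertises large image previews, whatever the media setting says.
	if ( $blog_public && $media_visibility ) {
		$robots['max-image-preview'] = 'large';
	}
	return $robots;
}

// Print the full truth table of the two settings.
$cases = array( array( true, true ), array( true, false ), array( false, true ), array( false, false ) );
foreach ( $cases as list( $public, $media ) ) {
	$result = sketch_maybe_add_large_image_preview( array(), $public, $media );
	printf( "blog_public=%d media=%d => %s\n", $public, $media, isset( $result['max-image-preview'] ) ? 'large' : '(none)' );
}
```

Only the first combination, with both settings enabled, yields the directive.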

Side note: The filter-based Robots API this ticket aims to introduce should also address #20037, which requests robots customization in a less comprehensive way.

Attachments (2)

screenshot-setting-checkbox.png (100.4 KB) - added by flixos90 3 months ago.
The new checkbox in Settings > Reading
screenshot-admin-pointer.png (224.1 KB) - added by flixos90 3 months ago.
Admin pointer for the new feature


Change History (16)

This ticket was mentioned in PR #595 on WordPress/wordpress-develop by felixarntz.


3 months ago

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
  • Introduces basic Robots API controlled via wp_robots filter.
  • Adds wp_robots filter callback functions mirroring existing core behavior.
  • Replaces wp_head action hook callbacks related to "robots" meta tag with wp_robots filter hook callbacks.
  • Deprecates wp_head action hook callback functions related to "robots" meta tag, referencing the respective filter hook callback functions as replacement.
  • Introduces an additional filter function to unconditionally inject the max-image-preview:large robots directive, and another filter function to inject it only when both the blog_public and media_search_engine_visibility settings are truthy.
  • Adds infrastructure (defaults, upgrade, sanitization) for new media_search_engine_visibility setting, with the setting having a default value of 1 (enabled).
  • Displays a checkbox to control the new setting under Settings > Reading. Like the related blog_public checkbox, it displays the inverse value, with the checkbox label phrased as "Discourage search engines ...".
  • Displays an admin pointer on the admin dashboard which informs users about the new setting and that it can be tweaked under Settings > Reading.

Trac ticket: https://core.trac.wordpress.org/ticket/51511

@flixos90
3 months ago

The new checkbox in Settings > Reading

@flixos90
3 months ago

Admin pointer for the new feature

#2 @flixos90
3 months ago

  • Keywords needs-copy-review added

The two screenshots above show the relevant UI this feature exposes. It would be great to get feedback on the copy.

#3 @prbot
3 months ago

adamsilverstein commented on PR #595:

@felixarntz Overall looks really good! The approach makes sense, and a filter-based API fits in nicely with how other things work in core. I do have one concern about the UI and what happens when both checkboxes are checked (below)...

Testing:
When I tested this code, I saw the new robots tag output by default:

https://i0.wp.com/user-images.githubusercontent.com/2676022/96302276-c2679e80-0fb5-11eb-9df4-d3e36966ed67.png

Then I tried editing the site visibility settings under Settings > Reading:

https://i0.wp.com/user-images.githubusercontent.com/2676022/96303255-56863580-0fb7-11eb-82b5-b062bf07632e.png

When I checked the first checkbox, "Discourage search engines from indexing this site", the tag changed as expected.

https://i0.wp.com/user-images.githubusercontent.com/2676022/96303333-73bb0400-0fb7-11eb-90e9-89df625d8012.png

When I checked only the second box, "Discourage search engines from displaying large previews of this site’s media", no robots tag was added, meaning the default rule applies and larger media sizes are not shared.

When I checked both options, I still got the <meta name='robots' content='noindex, nofollow' /> tag (same as with only the first box checked). I assume this is expected?

If the first box overwrites the second, maybe the second box should be disabled if the first box is checked?

https://i0.wp.com/user-images.githubusercontent.com/2676022/96303640-f80d8700-0fb7-11eb-9a2e-fc9644ed3f74.png

--

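I tested using the wp_robots filter to adjust the max preview size, and that worked as expected:
{{{php
add_filter(
	'wp_robots',
	function ( $args ) {
		// Override the value of the max-image-preview directive.
		// Google documents none, standard and large as valid values.
		$args['max-image-preview'] = 'standard';
		return $args;
	}
);
}}}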

#4 @prbot
3 months ago

felixarntz commented on PR #595:

@adamsilverstein

> When I checked both options, I still got the <meta name='robots' content='noindex, nofollow' /> tag (same as with only the first box checked). I assume this is expected?

Yes. My thinking is that if search engines are discouraged, it doesn't make sense to simultaneously tell them that they can use large images.

> If the first box overwrites the second, maybe the second box should be disabled if the first box is checked?

That makes sense. Do you think we can just add some simple inline JS to toggle visibility? How is that handled in other cases? We'd probably also need to account for JS being disabled, so I wonder whether this adds too much complexity.

#5 @prbot
3 months ago

felixarntz commented on PR #595:

@adamsilverstein In https://github.com/WordPress/wordpress-develop/pull/595/commits/8cdc18dac06b7919e3c245b934a41819a15c515f, I added simple JS logic to disable the checkbox when it's not applicable. This is similar to how the dropdowns for "Homepage" and "Posts page" are handled on the same admin screen. I agree that conditionally disabling instead of hiding is less visually impactful and avoids layout shifts.

#6 @Clorith
3 months ago

I'm all for a core approach to controlling robots meta tags. I do have some concerns about introducing a second UI checkbox for an individual search engine aspect, though: it is a bit vague about what the box actually does, and it feels somewhat counter to the core philosophy. That may just be my interpretation, so I'm mentioning it for the sake of completeness.

From the perspective of a slightly technical individual, two very likely user scenarios come to mind, although there are likely others:

  • I've allowed search engines by leaving one checkbox unticked, then I tick this other one because I don't want anyone taking my full-size images.
  • I've ticked this new box, because I only want thumbnails to show in search engine image searches.

How can we make this less vague without overburdening users, if we are to retain this new UI element? I must admit I've got no good answer right now, but I think this discussion should be had to avoid any confusion or misunderstandings.

#7 @helen
3 months ago

I don't have a fully formed opinion about the robots API side of things, but as for media use in search engines, I am not convinced of the utility of turning this on for the majority of sites. It makes a lot of assumptions about how people use media, and, technical implementation and spec aside, it seems odd to say that this has core importance while Open Graph tags for social media cards do not. I can see that they are not the same thing from an output and standards perspective, but with the added UI for all users, this exposes something that is fairly insider, and a significant portion of our user base sees those as similar items, if not the same.

Another question I have is - why would a user ever want to discourage large media usage for search engines? What would the problem be with large media that they would want to turn that off?

#8 @flixos90
3 months ago

  • Keywords needs-dev-note added
  • Milestone changed from Awaiting Review to 5.7

@helen @Clorith I'm happy to revisit whether we need this checkbox and (if we do) how to make its purpose clearer. I agree that this is a control for something quite specific, which we generally try to avoid in core. That said, there was a strong push to expose a UI control for this as a follow-up to the original announcement post (which didn't initially include that part).

> Another question I have is - why would a user ever want to discourage large media usage for search engines? What would the problem be with large media that they would want to turn that off?

I'd say that 99% of users would not want to discourage large media usage. But there is a fraction (likely mostly larger publishers) that would want to discourage it for legal reasons (copyright on those images). The announcement post and this post linked from there provide some more context.

If we knew that it's indeed primarily larger publishers that would benefit from the option to discourage, I think it would be easy to argue that a filter alone is sufficient (assuming they have active development resources). But for smaller, individually managed sites, having the checkbox might be useful. Alternatively, we could avoid the checkbox and instead point sites that prefer to discourage to a one-line plugin.

This ticket was mentioned in PR #702 on WordPress/wordpress-develop by felixarntz.


3 months ago

  • Based on https://github.com/WordPress/wordpress-develop/pull/595, but without support for the max-image-preview:large directive and without the related media_search_engine_visibility setting.
  • This reduces the PR to only focus on introducing the Robots API. The max-image-preview:large piece could then be added separately as a follow-up.

Trac ticket: https://core.trac.wordpress.org/ticket/51511

#10 @flixos90
3 months ago

I've opened a new PR (see above) based on the other one, which only focuses on the Robots API, so that we can review this part first and get it ready, allowing for a more focused conversation.

Both PRs have been refreshed against latest trunk.

#11 @flixos90
6 weeks ago

I've refreshed the PR for the Robots API to apply cleanly against latest trunk. With the reviews from the previous PR I think this should be good to go soon. Would be great to get some additional eyes on this though!

Once the Robots API PR is committed, we can continue discussing introduction of the max-image-preview:large directive, including a smaller PR to review for that.

Last edited 6 weeks ago by flixos90

#12 @flixos90
2 days ago

  • Keywords commit added; needs-copy-review removed

@helen @francina Following up on the previous comments, it looks like it would be most beneficial to WordPress users not to include any UI to opt out of allowing search engines to use large image previews, for the following reasons:

  • As long as a site should be surfaced in search results (via the existing checkbox), allowing for large image previews results in a better user experience.
  • The only reason to opt out of large image previews by search engines would be for sites with special copyright requirements, e.g. sites that sell these images.
  • In other words, any UI introduced for this would only benefit a fraction of WordPress sites, certainly less (likely much less) than 20%.
  • For those sites that would like to opt out of large image previews in search engines, a filter is available to do so (see below).
  • The Yoast SEO plugin, which is used by millions of sites, follows a similar approach, opting in to large image previews by default and allowing sites to opt out with a filter.

A one-line plugin could be used by sites that would like to opt out of large image previews for search engines:

{{{php
<?php
remove_filter( 'wp_robots', 'wp_robots_max_image_preview_large' );
}}}
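Removing the callback drops the directive entirely. A site that only wants to limit preview size could instead override the value, as in the filter test earlier in this ticket. The standalone sketch below assumes core's callback has already added `max-image-preview`. `sketch_limit_image_preview()` is hypothetical. In a plugin, it would be registered with something like `add_filter( 'wp_robots', 'sketch_limit_image_preview', 20 )` so that it runs after a default-priority core callback. Google documents `none`, `standard` and `large` as the accepted values.

```php
<?php
/**
 * Hypothetical wp_robots-style callback that downgrades large image
 * previews to "standard" instead of removing the directive.
 */
function sketch_limit_image_preview( array $robots ): array {
	// Only touch the directive if an earlier callback added it.
	if ( isset( $robots['max-image-preview'] ) ) {
		$robots['max-image-preview'] = 'standard';
	}
	return $robots;
}

// Simulate core having added the directive first.
$robots = sketch_limit_image_preview( array( 'max-image-preview' => 'large' ) );
echo $robots['max-image-preview'], "\n";
```

Unlike `remove_filter()`, this keeps an explicit directive in the tag. It also leaves sites that never received the directive (e.g. non-public ones) untouched.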

I've updated the PR accordingly to remove the option and UI around it.

I've also refreshed the Robots API-only version of it, which has been reviewed multiple times and is good to go. I'll commit that one later today so that we can solely focus on the max-image-preview:large part separately after that.

#13 @flixos90
2 days ago

In 49992:

Robots: Introduce Robots API.

This changeset introduces a filter-based Robots API, providing central control over the robots meta tag.

  • Introduces wp_robots() function which should be called anywhere a robots meta tag should be included.
  • Introduces wp_robots filter which allows adding or modifying directives for the robots meta tag. The wp_robots() function is entirely filter-based, i.e. if no filter is added to wp_robots, no directives will be present, and therefore the entire robots meta tag will be omitted.
  • Introduces the following wp_robots filter functions which replace similar existing functions that were manually rendering a robots meta tag:
    • wp_robots_noindex() replaces noindex(), which has been deprecated.
    • wp_robots_no_robots() replaces wp_no_robots(), which has been deprecated.
    • wp_robots_sensitive_page() replaces wp_sensitive_page_meta(), which has been deprecated. Its rendering of the referrer meta tag has been moved to another new function wp_strict_cross_origin_referrer().

Migration to the new functions is straightforward. For example, a call to add_action( 'wp_head', 'wp_no_robots' ) should be replaced with add_filter( 'wp_robots', 'wp_robots_no_robots' ).
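The migration changes what a callback is responsible for. A wp_head action callback echoes its own meta tag. A wp_robots filter callback only returns an updated directives array, so several callbacks end up in a single tag.

The standalone sketch below illustrates this with two hypothetical callbacks loosely modeled on the replaced behavior. It assumes "no robots" means noindex plus nofollow, and a sensitive page means noindex plus noarchive. These are not the core functions, and they skip any option lookups.

```php
<?php
// Hypothetical filter-style callbacks: each one only adjusts the shared
// directives array and leaves rendering to a single central function.
function sketch_no_robots( array $robots ): array {
	$robots['noindex']  = true;
	$robots['nofollow'] = true;
	return $robots;
}

function sketch_sensitive_page( array $robots ): array {
	$robots['noindex']   = true;
	$robots['noarchive'] = true;
	return $robots;
}

// Run both callbacks in order, as a filter would.
$robots = array();
foreach ( array( 'sketch_no_robots', 'sketch_sensitive_page' ) as $callback ) {
	$robots = $callback( $robots );
}

// Collect the active directives for a single robots meta tag.
echo implode( ', ', array_keys( array_filter( $robots ) ) ), "\n";
```

Running it prints `noindex, nofollow, noarchive`. The duplicate `noindex` collapses into one array key rather than producing a second robots meta tag.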

Plugins and themes that render their own robots meta tags are encouraged to switch to rely on the wp_robots filter in order to use the central management layer now provided by WordPress core.

Props adamsilverstein, flixos90, timothyblynjacobs, westonruter.
See #51511.

#14 @prbot
2 days ago

felixarntz commented on PR #595:

After https://github.com/WordPress/wordpress-develop/commit/176a1f53f04cde92e7297b5214c03beb9e2ba5c8, this PR has been refreshed against the latest trunk so that it only includes the bits related to introducing the max-image-preview:large directive.
