Make WordPress Core

Opened 4 years ago

Last modified 14 months ago

#51211 new enhancement

Implement a consistent function to obtain the sitemap location

Reported by: GreatBlakes Owned by:
Milestone: Awaiting Review Priority: normal
Severity: trivial Version: 5.5.1
Component: Sitemaps Keywords: has-patch
Focuses: Cc:

Description

Would it be possible to implement a function that returns the active XML sitemap location? With so many major plugins choosing to disable the WP core sitemap to continue using their own, it is difficult for others to easily know which is actually being used, especially if those plugins have the option of disabling their own sitemap functionality (which means we can't simply check whether the plugin is active).

Notable sitemap locations:

  • WordPress Core: wp-sitemap.xml
  • Yoast SEO: sitemap_index.xml
  • The SEO Framework: sitemap.xml

Perhaps the function could be as simple as applying a filter that plugins can hook into to modify the location? Something like the following:

function get_sitemap_uri() {
	// Let plugins replace the default core sitemap index location.
	return (string) apply_filters( 'sitemap_uri', home_url( '/' ) . 'wp-sitemap.xml' );
}
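
As an illustration of how a plugin might hook into such a filter, here is a minimal sketch. The `sitemap_uri` filter name and the function body come from the proposal above; the `example_` prefix, the stand-in definitions of add_filter(), apply_filters() and home_url(), and the https://example.com domain are assumptions added only so the snippet runs outside WordPress. It is not how any existing plugin behaves.

```php
<?php
// Stand-ins so the sketch runs on its own. Inside WordPress these
// definitions are skipped and core's real functions are used instead.
if ( ! function_exists( 'add_filter' ) ) {
	function add_filter( $hook, $callback ) {
		$GLOBALS['example_filters'][ $hook ][] = $callback;
		return true;
	}

	function apply_filters( $hook, $value ) {
		foreach ( $GLOBALS['example_filters'][ $hook ] ?? array() as $callback ) {
			$value = $callback( $value );
		}
		return $value;
	}
}

if ( ! function_exists( 'home_url' ) ) {
	function home_url( $path = '' ) {
		return 'https://example.com' . $path; // Assumed site address.
	}
}

// The function proposed in the ticket, renamed with a hypothetical prefix.
function example_get_sitemap_uri() {
	return (string) apply_filters( 'sitemap_uri', home_url( '/' ) . 'wp-sitemap.xml' );
}

// Hypothetical plugin code: report the plugin's own sitemap index instead.
add_filter(
	'sitemap_uri',
	function ( $uri ) {
		return home_url( '/' ) . 'sitemap_index.xml';
	}
);
```

With the callback registered, example_get_sitemap_uri() returns the plugin's location; a plugin whose sitemap feature is switched off would simply not register the callback, and the core default would be returned.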

Change History (12)

#1 follow-up: @peterwilsoncc
4 years ago

The function get_sitemap_url() was introduced in WordPress 5.5.1 (refer to the file /wp-includes/sitemaps.php). It's a little more complex than your example as it allows for different types and pagination of sitemaps.

It doesn't have any filters that would let plugins override or modify the value, but this is worth considering.
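
For reference, a minimal sketch of calling the function for the index and for a paginated sitemap, assuming the core signature get_sitemap_url( $name, $subtype_name = '', $page = 1 ). The wrapper name example_core_sitemap_urls() and the array keys are illustrative only, and the function_exists() guard is there so the snippet also runs outside WordPress.

```php
<?php
/**
 * Collect a few core sitemap URLs.
 *
 * Returns an empty array when get_sitemap_url() is unavailable,
 * for example outside WordPress or before 5.5.1.
 */
function example_core_sitemap_urls(): array {
	if ( ! function_exists( 'get_sitemap_url' ) ) {
		return array();
	}

	$urls = array(
		// The sitemap index.
		'index'      => get_sitemap_url( 'index' ),
		// Second page of the sitemap for the "page" post type.
		'posts_page' => get_sitemap_url( 'posts', 'page', 2 ),
	);

	// get_sitemap_url() can return false (for example when core
	// sitemaps are disabled), so drop any failed lookups.
	return array_filter( $urls );
}
```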

#2 in reply to: ↑ 1 @GreatBlakes
4 years ago

Replying to peterwilsoncc:

Thank you for the clarification. I'm glad to see that addition in 5.5.1! I must have missed it when searching before opening this ticket.

While I do not have experience contributing directly to WordPress core, I'd be happy to provide a GitHub pull request (linked to this ticket) with a simple filter addition to that function to initiate a discussion about its value. Is that the typical next step for this?

This ticket was mentioned in PR #514 on WordPress/wordpress-develop by chrisblakley.


4 years ago
#3

  • Keywords has-patch added

Added a filter to the returned string within the get_sitemap_url() function so that the sitemap URL can easily be retrieved even when third-party plugins use their own sitemap/location.

Trac ticket: https://core.trac.wordpress.org/ticket/51211

chrisblakley commented on PR #514:


4 years ago
#4

Thanks for the feedback. I implemented your recommendation (and had a few unit test learning experiences 😅), with some minor alterations based on comparisons with other core functions that use short-circuit filters (particularly the condition itself and the variable name). The variable name can easily be changed based on your and others' knowledge of naming conventions.

This ticket was mentioned in Slack in #core-sitemaps by peterwilsoncc.


4 years ago

#6 @pbiron
4 years ago

Hi @GreatBlakes.

I'm trying to understand the use case for this better. You say:

it is difficult for others to easily know which is actually being used

How would adding this filter make that easier? The URL of the sitemap isn't displayed anywhere within the Dashboard.

In most cases, the sitemap index URL is viewable at https://example.com/robots.txt. At least one SEO/Sitemap plugin (Yoast) does not add a reference to the sitemap index to robots.txt but the vast majority do (as does core).

#7 @GreatBlakes
4 years ago

I have a theme that I use for all of my clients' websites and it does a few things with the sitemap file (disregarding the fact that it is probably better suited to plugin functionality):

First, it adds a metabox to the Dashboard that includes, among other convenient links, the URL of the active sitemap XML file so it can easily be copied into tools like Google Search Console. Second, it runs automatic checks to make sure that a sitemap is active. This makes it obvious to me or other admins if something was inadvertently changed or deactivated, without anyone needing to actively look for it. I'm sure there are other tools for that (or it may even be unnecessary), but it's one of those peace-of-mind features to me.

Before, I'd just install Yoast on all the sites and I could have both of these features check Yoast's known sitemap file/location specifically, but I'd rather be agnostic to any sitemap plugin or core that my clients want to use.

I suppose I could make a server-side request to the robots.txt file, parse it, then check the response of the sitemap XML file, but my thought here was that if we had a filter, any plugin that is overriding the WordPress core sitemap file could utilize that filter to confirm not only that their sitemap is active, but also their specific filename/location (since there is a wide variety of filenames).
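
The robots.txt approach mentioned above could be sketched roughly as follows. The function name example_extract_sitemap_urls() is hypothetical, and the sketch covers only the parsing step: inside WordPress the body could be fetched with wp_remote_get() and wp_remote_retrieve_body(), after which each returned URL would still need its own request to confirm the sitemap responds. It also only works for plugins that add a Sitemap line to robots.txt.

```php
<?php
/**
 * Extract the URLs from the "Sitemap:" lines of a robots.txt body.
 *
 * @param string $robots_txt Raw contents of robots.txt.
 * @return string[] Unique sitemap URLs in the order they appear.
 */
function example_extract_sitemap_urls( string $robots_txt ): array {
	$urls = array();

	foreach ( preg_split( '/\R/', $robots_txt ) as $line ) {
		// Drop trailing "# comment" text and surrounding whitespace.
		$line = trim( preg_replace( '/#.*$/', '', $line ) );

		// The directive name is matched case-insensitively.
		if ( preg_match( '/^sitemap\s*:\s*(\S+)/i', $line, $matches ) ) {
			$urls[] = $matches[1];
		}
	}

	return array_values( array_unique( $urls ) );
}
```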

I hope that helps clarify my use case.

From an entirely nit-picky and semantic point of view, it could be misleading if a plugin is overriding the WordPress core sitemap XML and a developer calling get_sitemap_url('index') gets a return value that is technically not correct. However, I can't say how often (or if) that hypothetical situation would ever happen.

I do want to thank you all for your time and indulging me in this feature request regardless of the outcome.

#9 @swissspidy
14 months ago

#60160 was marked as a duplicate.

#10 @letraceursnork
14 months ago

As the author of #60160, I ought to ask: is this going to be approved and merged into core anytime soon? 3 years have passed, it's a 5-minute PR, and it would be very convenient functionality.
@swissspidy

#11 @swissspidy
14 months ago

This function was intended to return URLs for the various sitemaps generated by core, not to be a universal function for getting the ones added by plugins.

If we want to change its purpose and make it the function for getting any sitemap URL, we should also consider whether this should actually be a new function just for the sitemap index file.

We should reach out to plugin authors to see whether any such addition even makes sense for them. What good is this filter if nobody uses it?

#12 @letraceursnork
14 months ago

@swissspidy excuse my English level in advance; the text below was translated by ChatGPT (though the situation is real and I'm struggling with it right now):

Okay, here's a real-life example:

I have WordPress 6.4.2, Dockerized, deployed to instances as a Docker image and orchestrated in Kubernetes.

Our company has an SEO department that wants the robots.txt file to be different from the default. They're fine with directly editing robots.txt using any plugin that allows this (specifically, we use Yoast SEO). However, since this file is actually created in the file system and is absent from the repository, all edits are overwritten after redeployment, especially after a new release.

The solution we came up with is: there's a RobotsTxtController that 'constructs' robots.txt from partials. It fetches User-Agent, Allow, and Disallow directives from a specific file (depending on the instance - local, staging, production). Then, at the end, it appends Host (using the get_site_url() function) and Sitemap (using the get_sitemap_url() function) directives. The problem arises precisely because get_sitemap_url() is the native and correct way to get the sitemap link. However, since it's not filtered and its output cannot be overridden, one of two problems occurs:

  1. The plugin generates its own sitemap, which WordPress isn't aware of. The plugin wants to add the sitemap's URL to robots.txt as a separate directive, but it can't because we want to control robots.txt ourselves. At the point of control, we can't determine if the sitemap has been overwritten/regenerated and, if so, what the correct path is.
  2. The plugin does the above but sets a redirect from /wp-sitemap.xml to its own URL. In this case, search engine bots might (theoretically) say: "We don't want to follow your 301 redirects; the Sitemap directive is incorrect. Bye!"
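
The assembly step described above might look roughly like the following sketch. The function name example_build_robots_txt() and its parameters are hypothetical and are not the actual RobotsTxtController; inside WordPress the last two arguments would come from get_site_url() and get_sitemap_url( 'index' ), which is exactly the value a plugin currently has no way to override.

```php
<?php
/**
 * Assemble robots.txt from per-environment directives plus Host and Sitemap.
 *
 * @param string[]     $directives  Lines such as "User-agent: *" or "Disallow: /wp-admin/".
 * @param string       $host        Value for the Host directive.
 * @param string|false $sitemap_url Sitemap index URL, or false when none is available.
 * @return string The robots.txt body.
 */
function example_build_robots_txt( array $directives, string $host, $sitemap_url ): string {
	$lines   = $directives;
	$lines[] = 'Host: ' . $host;

	// Skip the Sitemap directive when no usable URL was provided.
	if ( is_string( $sitemap_url ) && '' !== $sitemap_url ) {
		$lines[] = 'Sitemap: ' . $sitemap_url;
	}

	return implode( "\n", $lines ) . "\n";
}
```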

The solution to both these problems is to add a hook filter for the get_sitemap_url() function. Then, each plugin can independently decide if it wants to use this native engine functionality or not (but generally, I think they would want to).

P.S. Currently, I'm using a makeshift solution - individually checking if a plugin with a specific name is connected. If yes, I provide certain hardcoded URLs in the controller method, which fundamentally isn't correct.
