Make WordPress Core

Opened 3 years ago

Closed 6 months ago

Last modified 6 months ago

#52099 closed enhancement (fixed)

Sitemaps "Last Modified" (lastmod) tag

Reported by: junaidbhura
Owned by: swissspidy
Milestone: 6.5
Priority: normal
Severity: normal
Version: 5.5
Component: Sitemaps
Keywords: has-patch has-unit-tests add-to-field-guide
Focuses:
Cc:

Description

Sitemaps currently only support the "Location" tag (loc). This ticket adds support for the "Last Modified" tag.
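For illustration, a sitemap URL entry with and without the optional lastmod tag might look like the output of the sketch below. It is written in Python purely to show the XML shape, not as WordPress code; the `url_entry` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def url_entry(loc: str, lastmod: Optional[str] = None) -> str:
    """Build one <url> entry; <lastmod> is optional and omitted when unknown."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if lastmod is not None:
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(url, encoding="unicode")


# Entry with only the required loc tag.
print(url_entry("https://example.com/about/"))
# Entry that also carries a last-modified date.
print(url_entry("https://example.com/hello-world/", "2020-12-16T09:30:00+00:00"))
```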

This is how it works:

1. Posts

This is probably the easiest - it just takes the post_modified_gmt value of the post and creates a lastmod tag.
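As a rough sketch of that conversion (in Python, not the patch's actual code): assuming post_modified_gmt is stored as a 'YYYY-MM-DD HH:MM:SS' string in GMT, it can be turned into a W3C datetime suitable for lastmod. The function name `lastmod_from_gmt` is hypothetical.

```python
from datetime import datetime, timezone


def lastmod_from_gmt(post_modified_gmt: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' GMT string into a W3C datetime string."""
    parsed = datetime.strptime(post_modified_gmt, "%Y-%m-%d %H:%M:%S")
    # The stored value is assumed to be GMT, so attach an explicit UTC offset.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(lastmod_from_gmt("2020-12-16 09:30:00"))  # 2020-12-16T09:30:00+00:00
```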

2. Taxonomies

It gets the latest modified post in the taxonomy and creates a lastmod tag based on its last modified value.

3. Users

It gets the latest modified post by a user and creates a lastmod tag based on its last modified value.

4. Indices

Sitemap indices / indexes work in this way:

  1. If it's a post index - get the last modified date of the last updated post in the post type
  2. If it's a taxonomy index - get the last modified date of the last updated post which is associated with any term in the taxonomy
  3. If it's a user index - get the last modified date of the post type "post" - since all posts are associated with users
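The rules proposed above for taxonomies, users and indices could be sketched roughly as follows. This is an in-memory Python illustration of the logic only, not the attached patch: the `Post` class, the `(taxonomy, term)` tuples and all function names are hypothetical, and a real implementation would query the database instead of filtering a list.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional, Tuple

Term = Tuple[str, str]  # (taxonomy, term slug), a hypothetical representation


@dataclass(frozen=True)
class Post:
    post_type: str
    author: str
    modified_gmt: str  # 'YYYY-MM-DD HH:MM:SS'; this format sorts chronologically as text
    terms: FrozenSet[Term] = field(default_factory=frozenset)


def latest(posts: Iterable[Post]) -> Optional[str]:
    """Newest modified_gmt among the given posts, or None if there are none."""
    return max((p.modified_gmt for p in posts), default=None)


def term_lastmod(posts: List[Post], term: Term) -> Optional[str]:
    return latest(p for p in posts if term in p.terms)


def user_lastmod(posts: List[Post], author: str) -> Optional[str]:
    return latest(p for p in posts if p.author == author)


def post_index_lastmod(posts: List[Post], post_type: str) -> Optional[str]:
    return latest(p for p in posts if p.post_type == post_type)


def taxonomy_index_lastmod(posts: List[Post], taxonomy: str) -> Optional[str]:
    # Any post attached to any term of the taxonomy counts.
    return latest(p for p in posts if any(t[0] == taxonomy for t in p.terms))


def user_index_lastmod(posts: List[Post]) -> Optional[str]:
    # Per the description, the user index follows the "post" post type.
    return post_index_lastmod(posts, "post")
```

Every one of these is a maximum over a potentially large set of posts, so the cost grows with the amount of content on the site.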

Attachments (2)

52099.diff (12.1 KB) - added by junaidbhura 3 years ago.
52099.2.diff (12.1 KB) - added by junaidbhura 3 years ago.


Change History (35)

@junaidbhura
3 years ago

This ticket was mentioned in PR #822 on WordPress/wordpress-develop by junaidbhura.


3 years ago
#1

  • Keywords has-unit-tests added

github-actions[bot] commented on PR #822:


3 years ago
#2

Hi @junaidbhura! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in this blog post.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

#3 @junaidbhura
3 years ago

Hey @peterwilsoncc do you think you could take a look at this and let me know what you think?

https://github.com/WordPress/wordpress-develop/pull/822

@junaidbhura
3 years ago

#4 @MadtownLems
3 years ago

I believe that this was intentionally left in plugin territory, at least partially because "last modified" for a post can be much more complicated than this when you consider dynamic content.

If a page's primary function is to embed a YouTube playlist, Twitch channel, etc., the page gets new content whenever those things update.

The same is true for pages that pull content from other sources, such as a third party calendaring system, RSS feeds, or even just using the Latest Posts block.

Furthermore, most search engines don't actually consume these other tags.

Here's an excerpt from the blog post announcing the new functionality (https://make.wordpress.org/core/2020/07/22/new-xml-sitemaps-functionality-in-wordpress-5-5/):

"The sitemaps protocol specifies a certain set of supported attributes for sitemap entries. Of those, only the URL (loc) tag is required. All others (e.g. changefreq and priority) are optional tags in the sitemaps protocol and not typically consumed by search engines, which is why WordPress only lists the URL itself. Developers can still add those tags if they really want to."

#5 @pbiron
3 years ago

  • Version changed from 5.6 to 5.5

Version 0.2.0 of the core sitemaps feature plugin was the last one to include support for lastmod.

All support for lastmod was removed in Remove all traces of lastmod, PR 145.

You can check out that PR, which contains links to the issues and slack conversations around that.

In addition to what @MadtownLems mentions, in large sites lastmod can be expensive to compute.

#6 @junaidbhura
3 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

Didn't realise that it was purposely left out - thanks for pointing this out. I'll close this ticket out!

#7 @desrosj
3 years ago

  • Milestone Awaiting Review deleted

#8 @ocean90
3 years ago

#53740 was marked as a duplicate.

#9 @swissspidy
3 years ago

  • Focuses performance added
  • Keywords needs-patch added; has-patch has-unit-tests removed
  • Milestone set to Future Release
  • Resolution wontfix deleted
  • Status changed from closed to reopened

Reopening after talking to @garyillyes regarding #53740.

While Google does not currently use <lastmod>, other search engines consume it to schedule crawls more effectively, saving resources and decreasing load on sites.

It's true that we removed lastmod originally to keep things simple and performant, but perhaps there is some middle ground where we can add it without much overhead.

For example, adding lastmod for posts is probably trivial, but for other entries and especially the homepage it might not be, and we could consider those cases plugin territory.

Expensive queries could be cached via wp_cache_add or similar to ensure there's no impact on larger sites.
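To illustrate the caching idea, here is a minimal Python sketch of an "add only if absent" cache wrapped around an expensive lookup. It is an in-memory stand-in rather than the WordPress object cache; `SimpleCache`, `cached_lastmod`, the key and the one-hour TTL are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SimpleCache:
    """Tiny in-memory cache with expiry, standing in for an object cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # Expired entries behave like misses.
            return None
        return value

    def add(self, key: str, value: str, ttl: float) -> bool:
        """Store the value only if the key is not already cached."""
        if self.get(key) is not None:
            return False
        self._store[key] = (self._clock() + ttl, value)
        return True


def cached_lastmod(cache: SimpleCache, key: str,
                   compute: Callable[[], str], ttl: float = 3600.0) -> str:
    """Return the cached lastmod, running the expensive query only on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.add(key, value, ttl)
    return value
```

The trade-off is staleness: with a TTL, a lastmod value can lag behind the real modification time by up to the TTL unless the entry is invalidated when content changes.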

#10 @mukesh27
22 months ago

Hi @junaidbhura, do you want to create a new pull request?

#11 @flixos90
22 months ago

  • Focuses performance removed

Removing the performance focus here since this isn't a performance enhancement but rather a new sitemaps feature.

#12 @junaidbhura
21 months ago

Hey @mukesh27 I've just merged the master branch into the branch. Could you please take a look and share your initial thoughts, and we'll take it from there?

https://github.com/WordPress/wordpress-develop/pull/822

#13 @joostdevalk
15 months ago

Could we please have another look at this patch? Adding lastmod would really make a difference in crawl efficiency for several search engines.

In fact, as I understand from hearing @garyilyes speak about it on "Search off the record", Google doesn't use it because it's "unreliable". What if we made it reliable by actually doing it right? :)

For now, the pull above seems like a very good first step.

#14 @SergeyBiryukov
15 months ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 6.3

#15 @aristath
15 months ago

Bing is soon rolling out a change which will make more effective use of lastmod (more info: https://blogs.bing.com/webmaster/february-2023/The-Importance-of-Setting-the-lastmod-Tag-in-Your-Sitemap)
With that in mind, adding this info to our sitemaps will reduce server loads and improve the project's sustainability.
I'm sure other search engines will follow Bing - but even if they don't, Google is not the only search engine out there... There are plenty of other crawlers and they do use it.

#16 @fabricecanel
15 months ago

Fabrice from Bing: thanks for considering this sitemap change. This change will really help on a massive scale to optimize the crawling of all WordPress sites, leading to cost savings and an improvement in the freshness and completeness of indexed content. Through upcoming testimonials and sharing of data, they aim to demonstrate the benefits and encourage other search engines to leverage lastmod too. It's important to keep in mind that sitemaps are a daily task; therefore, the adoption of IndexNow by WordPress (https://core.trac.wordpress.org/ticket/52900) is highly recommended to further improve crawl efficiency and ensure content freshness. I have updated the IndexNow feature request as the code is already prepared and this is truly a game changer.

#17 @oglekler
10 months ago

  • Keywords needs-refresh added

The patch needs a refresh; at the very least, the unit test files have been renamed.
We are 9 days away from Beta 1, so there is still a little time to get this enhancement into 6.3.

#18 @azaozz
10 months ago

  • Milestone changed from 6.3 to Future Release

The PR doesn't pass the unit tests and beta1 is tomorrow. Also ideally this should be reviewed by the team that implemented the site maps code, the patch is rather large and somewhat complex.

Last edited 10 months ago by azaozz

#19 @zodiac1978
10 months ago

While Google does not currently use <lastmod>

This no longer seems to be true:
https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping

#20 follow-up: @junaidbhura
10 months ago

I'm happy to update the tests, but the last time I did that, nobody reviewed it or added any comments.

Who should this be assigned to for review once it's updated?

#21 in reply to: ↑ 20 @zodiac1978
10 months ago

Replying to junaidbhura:

I'm happy to update the tests, but the last time I did that, nobody reviewed it or added any comments.

Who should this be assigned to for review once it's updated?

The two component maintainers are already in this ticket: @pbiron and @swissspidy

According to @azaozz this should be ideally reviewed by one of them:

Also ideally this should be reviewed by the team that implemented the site maps code, the patch is rather large and somewhat complex.

#22 @swissspidy
10 months ago

  • Keywords needs-unit-tests added

I have missed this PR, but I am happy to review anything that's flagged to me.

Some key takeaways from https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping (emphasis mine):

For instance, some site software may not be able to easily tell the last modification date of the homepage or a category page because it just aggregates the other pages on the site. In these cases it's fine to leave out lastmod for those pages.

And when we say "last modification", we actually mean "last significant modification". If your CMS changed an insignificant piece of text in the sidebar or footer, you don't have to update the lastmod value for that page. However if you changed the primary text, added or changed structured data, or updated some links, do update the lastmod value.

This aligns well with my previous comment about only adding lastmod where it makes sense and where we can easily get this data. Running WP_Query in loops is rather expensive. It's one of the reasons we originally removed lastmod support, see https://github.com/GoogleChromeLabs/wp-sitemaps/pull/145

In short: keep it simple for now.

Let's start by adding lastmod for individual posts. That should resolve this.

#23 @joostdevalk
10 months ago

Might I suggest being slightly more ambitious? Getting last mod data for taxonomies is relatively hard if you want to query it without storing anything anywhere, but there are better ways around that (like, storing the lastmod date for a taxonomy term when something is added to that term).
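A rough Python sketch of that write-time approach, to make the idea concrete: whenever a post is saved, the stored lastmod of each term it belongs to is bumped, so the sitemap can read it back without querying posts. The `TermLastmodStore` class, its method names and the in-memory dict are hypothetical; where and how such a value would actually be stored is left open here.

```python
from typing import Dict, Iterable, Optional


class TermLastmodStore:
    """Keeps one lastmod value per term, updated when posts are saved."""

    def __init__(self) -> None:
        self._lastmod: Dict[str, str] = {}

    def on_post_saved(self, term_ids: Iterable[str], modified_gmt: str) -> None:
        """Record the post's modified time on every term it is assigned to."""
        for term_id in term_ids:
            current = self._lastmod.get(term_id)
            # 'YYYY-MM-DD HH:MM:SS' strings compare chronologically as text.
            if current is None or modified_gmt > current:
                self._lastmod[term_id] = modified_gmt

    def get(self, term_id: str) -> Optional[str]:
        """Constant-time read at sitemap render time; None if never recorded."""
        return self._lastmod.get(term_id)
```

This moves the cost from sitemap rendering to post saving, at the price of having to keep the stored value in sync when posts are removed from a term or deleted.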

From an SEO perspective, getting lastmod dates on everything is truly getting more important with Google's latest announcements and Bing's improvements. It really does make web crawling a lot more efficient.

#24 @swissspidy
9 months ago

Getting last mod data for taxonomies is relatively hard if you want to query it without storing anything anywhere, but there are better ways around that (like, storing the lastmod date for a taxonomy term when something is added to that term).

That's why I suggest starting with the straightforward parts while figuring out how to best store lastmod for terms and individual sitemap pages.

This ticket was mentioned in PR #4901 on WordPress/wordpress-develop by @swissspidy.


9 months ago
#25

  • Keywords has-unit-tests added; needs-refresh needs-unit-tests removed

This resolves the most critical aspect of that Trac ticket for "Last modified".

What it does:

  • Adds lastmod to all individual post objects (of any post type) in the sitemap.
    • This is trivial to add because this information is directly available.
  • Adds lastmod to the homepage sitemap entry if the homepage is set to display the latest posts.
    • This is an approximation that works by fetching the most recent posts (based on their _publish date_) and then sorting them in PHP by their _modified date_.

As per my comment, the latter is mostly nice to have but not critical. If we think this approximation is not good enough, we can omit it for now.
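The homepage approximation described above can be sketched like this, in Python rather than the PR's PHP. The dict keys, the `homepage_lastmod` name and the sample size are assumptions for illustration; taking the maximum is equivalent to sorting by modified date and picking the first item.

```python
from typing import Dict, List, Optional


def homepage_lastmod(posts: List[Dict[str, str]], sample_size: int = 10) -> Optional[str]:
    """Approximate the homepage lastmod from the most recently published posts."""
    # Step 1: take the newest posts by publish date.
    recent = sorted(posts, key=lambda p: p["post_date_gmt"], reverse=True)[:sample_size]
    if not recent:
        return None
    # Step 2: among those, pick the latest modified date.
    return max(p["post_modified_gmt"] for p in recent)


posts = [
    {"post_date_gmt": "2023-01-01 00:00:00", "post_modified_gmt": "2023-06-01 00:00:00"},
    {"post_date_gmt": "2023-02-01 00:00:00", "post_modified_gmt": "2023-02-01 00:00:00"},
    {"post_date_gmt": "2023-03-01 00:00:00", "post_modified_gmt": "2023-03-05 00:00:00"},
]

# With a sample of 2, the old post edited in June falls outside the sample.
print(homepage_lastmod(posts, sample_size=2))  # 2023-03-05 00:00:00
print(homepage_lastmod(posts, sample_size=3))  # 2023-06-01 00:00:00
```

The first call shows why this is only an approximation: an older post that was edited recently is missed if it is not among the most recently published ones.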

What it does not do:

  • No lastmod is added for the individual sitemap pages in the sitemap index
  • No lastmod is added for term archives or user archives

Those are difficult to add without any proper mechanism to determine and cache last modified date. As per my comment, this should not be an issue.

Trac ticket: https://core.trac.wordpress.org/ticket/52099

@swissspidy commented on PR #4901:


9 months ago
#26

@pbiron Would love to get your review on this one

#27 @swissspidy
7 months ago

  • Milestone changed from Future Release to 6.5

@swissspidy commented on PR #4901:


6 months ago
#28

@joemcgill Since you were involved with sitemaps originally, I would appreciate your eyes on this PR.

tl;dr is that lastmod is useful for search engines, so when we can easily expose it (such as for posts and the homepage), we should, but it's also no big deal if it's missing (e.g. for term archives)

#29 @swissspidy
6 months ago

  • Owner set to swissspidy
  • Status changed from reopened to assigned

@swissspidy commented on PR #4901:


6 months ago
#30

Thanks @joemcgill! That's how I remember it too. Now the guidance has slightly changed, and providing lastmod where easily possible is now preferable.

This PR will close the existing ticket, but a new one can be opened for lastmod elsewhere and whatever is required for that.

#31 @swissspidy
6 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 56985:

Sitemaps: add lastmod for individual posts and the homepage.

When the XML sitemaps feature was originally introduced, the lastmod field was omitted because guidance at the time indicated it was less important for search engines, plus for some entities it was computationally expensive to add. Now that the guidance has slightly changed, we are revisiting this and adding lastmod where easily possible.

  • Adds lastmod to all individual post objects (of any post type) in the sitemap
  • Adds lastmod to the homepage sitemap entry if the homepage is set to display the latest posts.

No lastmod is added for the individual sitemap pages in the sitemap index, nor for term archives or user archives. Those enhancements require additional changes, such as storing the modified date for a taxonomy term when something is added to that term. They can be revisited in separate follow-up tickets.

Props swissspidy, joemcgill.
Fixes #52099

#32 @swissspidy
6 months ago

  • Keywords add-to-field-guide added