#52099 closed enhancement (fixed)
Sitemaps "Last Modified" (lastmod) tag
Reported by: | junaidbhura | Owned by: | swissspidy |
---|---|---|---|
Milestone: | 6.5 | Priority: | normal |
Severity: | normal | Version: | 5.5 |
Component: | Sitemaps | Keywords: | has-patch has-unit-tests add-to-field-guide |
Focuses: | Cc: |
Description
Sitemaps currently only support the "Location" tag (loc). This ticket adds support for the "Last Modified" tag.
This is how it works:
1. Posts
This is probably the easiest - it just takes the post_modified_gmt value of the post and creates a lastmod tag.
2. Taxonomies
It gets the most recently modified post in the taxonomy and creates a lastmod tag based on its last modified value.
3. Users
It gets the most recently modified post by a user and creates a lastmod tag based on its last modified value.
4. Indices
Sitemap indices / indexes work in this way:
- If it's a post index - get the last modified date of the most recently updated post in the post type
- If it's a taxonomy index - get the last modified date of the most recently updated post that is associated with any term in the taxonomy
- If it's a user index - get the last modified date of the most recently updated post of the post type "post", since all posts are associated with users
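The rules above can be sketched in a language-neutral way. The following Python sketch is illustrative only and is not the patch's PHP code: the `Post` record, its field names, and the helper functions are all hypothetical. It assumes modification times are stored as UTC strings in `YYYY-MM-DD HH:MM:SS` form, as in post_modified_gmt, and that lastmod is emitted as a W3C datetime.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class Post:
    # Hypothetical stand-in for a post row; not a WordPress API.
    post_type: str
    author: str
    modified_gmt: str  # e.g. "2020-12-16 09:30:00", assumed to be UTC
    terms: set = field(default_factory=set)


def to_lastmod(modified_gmt: str) -> str:
    """Format a UTC 'YYYY-MM-DD HH:MM:SS' string as a W3C datetime."""
    parsed = datetime.strptime(modified_gmt, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


def latest_lastmod(posts: Iterable[Post]) -> Optional[str]:
    """Return the lastmod of the most recently modified post, or None."""
    # Fixed-width date strings sort chronologically when compared as text.
    newest = max((p.modified_gmt for p in posts), default=None)
    return to_lastmod(newest) if newest else None


def term_lastmod(posts: Iterable[Post], term: str) -> Optional[str]:
    """lastmod for a term: newest modification among posts in that term."""
    return latest_lastmod(p for p in posts if term in p.terms)


def user_lastmod(posts: Iterable[Post], author: str) -> Optional[str]:
    """lastmod for a user: newest modification among that user's posts."""
    return latest_lastmod(p for p in posts if p.author == author)
```

Returning `None` when nothing matches lets a caller omit the lastmod tag entirely, which the sitemaps protocol allows because only loc is required.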
Attachments (2)
Change History (35)
This ticket was mentioned in PR #822 on WordPress/wordpress-develop by junaidbhura.
4 years ago
#1
- Keywords has-unit-tests added
github-actions[bot] commented on PR #822:
4 years ago
#2
Hi @junaidbhura! 👋
Thank you for your contribution to WordPress! 💖
It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!
No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.
Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.
More information about how GitHub pull requests can be used to contribute to WordPress can be found in this blog post.
Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.
If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.
The Developer Hub also documents the various coding standards that are followed:
- PHP Coding Standards
- CSS Coding Standards
- HTML Coding Standards
- JavaScript Coding Standards
- Accessibility Coding Standards
- Inline Documentation Standards
Thank you,
The WordPress Project
#3
4 years ago
Hey @peterwilsoncc do you think you could take a look at this and let me know what you think?
#4
4 years ago
I believe that this was intentionally left in plugin territory, at least partially because "last modified" for a post can be much more complicated than this when you consider dynamic content.
If a page's primary function is to embed a YouTube playlist, twitch channel, etc - the page gets new content whenever those things update.
The same is true for pages that pull content from other sources, such as a third party calendaring system, RSS feeds, or even just using the Latest Posts block.
Furthermore, most search engines don't actually consume these other tags.
Here's an excerpt from the blog post announcing the new functionality (https://make.wordpress.org/core/2020/07/22/new-xml-sitemaps-functionality-in-wordpress-5-5/):
"The sitemaps protocol specifies a certain set of supported attributes for sitemap entries. Of those, only the URL (loc) tag is required. All others (e.g. changefreq and priority) are optional tags in the sitemaps protocol and not typically consumed by search engines, which is why WordPress only lists the URL itself. Developers can still add those tags if they really want to."
#5
4 years ago
- Version changed from 5.6 to 5.5
Version 0.2.0 of the core sitemaps feature plugin was the last one to include support for lastmod.
All support for lastmod was removed in "Remove all traces of lastmod", PR 145.
You can check out that PR, which contains links to the issues and Slack conversations around that.
In addition to what @MadtownLems mentions, on large sites lastmod can be expensive to compute.
#6
4 years ago
- Resolution set to wontfix
- Status changed from new to closed
Didn't realise that it was purposely left out - thanks for pointing this out. I'll close this ticket out!
#9
3 years ago
- Focuses performance added
- Keywords needs-patch added; has-patch has-unit-tests removed
- Milestone set to Future Release
- Resolution wontfix deleted
- Status changed from closed to reopened
Reopening after talking to @garyillyes regarding #53740.
While Google does not currently use <lastmod>, other search engines consume it to schedule crawls more effectively, saving resources and decreasing load on sites.
It's true that we removed lastmod originally to keep things simple and performant, but perhaps there is some middle ground where we can add it without much overhead.
For example, adding lastmod for posts is probably trivial, but for other entries and especially the homepage it might not be, and we could consider those cases plugin territory.
Expensive queries could be cached via wp_cache_add or similar to ensure there's no impact on larger sites.
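As a rough illustration of the caching idea, here is a minimal Python sketch of "compute once, then reuse". It is not WordPress code: `cache_add` only mimics the add-if-absent behaviour of wp_cache_add, and `cached_lastmod`, the in-memory dictionary, and the key names are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical in-memory store standing in for an object cache.
_cache: Dict[str, str] = {}


def cache_add(key: str, value: str) -> bool:
    """Store value only if key is not cached yet; report whether it was stored."""
    if key in _cache:
        return False
    _cache[key] = value
    return True


def cached_lastmod(key: str, compute: Callable[[], str]) -> str:
    """Return the cached lastmod for key, running the expensive query at most once."""
    if key not in _cache:
        cache_add(key, compute())
    return _cache[key]
```

A real implementation would also need to invalidate or refresh the entry when a relevant post changes; otherwise the cached lastmod goes stale.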
#11
2 years ago
- Focuses performance removed
Removing the performance focus here since this isn't a performance enhancement but rather a new sitemaps feature.
#12
2 years ago
Hey @mukesh27 I've just merged the master branch into the branch. Could you please take a look and share your initial thoughts, and we'll take it from there?
#13
22 months ago
Could we please have another look at this patch? Adding lastmod would really make a difference in crawl efficiency for several search engines.
In fact, as I understand from hearing @garyilyes speak about it on "Search off the record", Google doesn't use it because it's "unreliable". What if we made it reliable by actually doing it right? :)
For now, the pull above seems like a very good first step.
#14
22 months ago
- Keywords has-patch added; needs-patch removed
- Milestone changed from Future Release to 6.3
#15
22 months ago
Bing is soon rolling out a change which will make more effective use of lastmod (more info: https://blogs.bing.com/webmaster/february-2023/The-Importance-of-Setting-the-lastmod-Tag-in-Your-Sitemap).
With that in mind, adding this info to our sitemaps will reduce server loads and improve the project's sustainability.
I'm sure other search engines will follow Bing - but even if they don't, Google is not the only search engine out there... There are plenty of other crawlers and they do use it.
#16
22 months ago
Fabrice from Bing: thanks for considering this sitemap change. This change will really help on a massive scale to optimize the crawling of all WordPress sites leading to cost savings and an improvement in the freshness and completeness of indexed content. Through upcoming testimonials and sharing of data, they aim to demonstrate the benefits and encourage other search engines to leverage lastmod too. It's important to keep in mind that sitemaps are a daily task, therefore, the adoption of IndexNow by WordPress (https://core.trac.wordpress.org/ticket/52900) is highly recommended to further improve crawl efficiency and ensure content freshness. I update the IndexNow feature request as the code is already prepared and this is truly a game changer.
#17
18 months ago
- Keywords needs-refresh added
The patch needs a refresh; at the very least, the unit test files have changed names.
We are 9 days from Beta 1, so there is still a little time to get this enhancement into 6.3.
#18
18 months ago
- Milestone changed from 6.3 to Future Release
The PR doesn't pass the unit tests. Also ideally this should be reviewed by the team that implemented the site maps code, the patch is rather large and somewhat complex.
#19
17 months ago
While Google does not currently use <lastmod>
This no longer seems to be true:
https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping
#20
follow-up: ↓ 21
17 months ago
I'm happy to update the tests, but the last time I did that, nobody reviewed it or added any comments.
Who should this be assigned to for review once it's updated?
#21
in reply to: ↑ 20
17 months ago
Replying to junaidbhura:
I'm happy to update the tests, but the last time I did that, nobody reviewed it or added any comments.
Who should this be assigned to for review once it's updated?
The two component maintainers are already in this ticket: @pbiron and @swissspidy
According to @azaozz this should ideally be reviewed by one of them:
Also ideally this should be reviewed by the team that implemented the site maps code, the patch is rather large and somewhat complex.
#22
17 months ago
- Keywords needs-unit-tests added
I have missed this PR, but I am happy to review anything that's flagged to me.
Some key takeaways from https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping (emphasis mine):
For instance, some site software may not be able to easily tell the last modification date of the homepage or a category page because it just aggregates the other pages on the site. In these cases it's fine to leave out lastmod for those pages.
And when we say "last modification", we actually mean "last significant modification". If your CMS changed an insignificant piece of text in the sidebar or footer, you don't have to update the lastmod value for that page. However if you changed the primary text, added or changed structured data, or updated some links, do update the lastmod value.
This aligns well with my previous comment about only adding lastmod where it makes sense and where we can easily get this data. Running WP_Query in loops is rather expensive. It's one of the reasons we originally removed lastmod support, see https://github.com/GoogleChromeLabs/wp-sitemaps/pull/145
In short: keep it simple for now.
Let's start by adding lastmod for individual posts. That should resolve this.
#23
17 months ago
Might I suggest being slightly more ambitious? Getting last mod data for taxonomies is relatively hard if you want to query it without storing anything anywhere, but there are better ways around that (like, storing the lastmod date for a taxonomy term when something is added to that term).
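The "store it on write" idea suggested here could look roughly like the following Python sketch. It is purely illustrative and not part of any patch on this ticket: the `term_lastmod_store` dictionary and the `record_post_change` hook are hypothetical names, and timestamps are assumed to be UTC strings in `YYYY-MM-DD HH:MM:SS` form.

```python
from typing import Dict, Iterable

# Hypothetical persistent store mapping a term ID to its newest modification time.
term_lastmod_store: Dict[str, str] = {}


def record_post_change(term_ids: Iterable[str], modified_gmt: str) -> None:
    """Update each affected term's stored lastmod when a post is saved.

    Called on write, so reading a term's lastmod later needs no post query.
    """
    for term_id in term_ids:
        current = term_lastmod_store.get(term_id)
        # Fixed-width date strings compare chronologically as text.
        if current is None or modified_gmt > current:
            term_lastmod_store[term_id] = modified_gmt
```

The trade-off is a small extra write on every post save in exchange for a constant-time lookup when the sitemap is rendered.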
From an SEO perspective, getting lastmod dates on everything is truly getting more important with Google's latest announcements and Bing's improvements. It really does make web crawling a lot more efficient.
#24
17 months ago
Getting last mod data for taxonomies is relatively hard if you want to query it without storing anything anywhere, but there are better ways around that (like, storing the lastmod date for a taxonomy term when something is added to that term).
That's why I suggest starting with the straightforward parts while figuring out how to best store lastmod for terms and individual sitemap pages.
This ticket was mentioned in PR #4901 on WordPress/wordpress-develop by @swissspidy.
17 months ago
#25
- Keywords has-unit-tests added; needs-refresh needs-unit-tests removed
This resolves the most critical aspect of that Trac ticket for "Last modified".
What it does:
- Adds lastmod to all individual post objects (of any post type) in the sitemap.
  - This is trivial to add because this information is directly available.
- Adds lastmod to the homepage sitemap entry if the homepage is set to display the latest posts.
  - This is an approximation that works by fetching the most recent posts (based on their _publish date_) and then sorting them in PHP by their _modified date_.
As per my comment, the latter is mostly nice to have but not critical. If we think this approximation is not good enough, we can omit it for now.
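To make the homepage approximation concrete, here is a small Python sketch of the idea described above: take the most recently published posts, then pick the newest modification time among them. It is not the PR's PHP code; the `homepage_lastmod` function, the dictionary keys, and the `limit` default are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def homepage_lastmod(posts: List[Dict[str, str]], limit: int = 10) -> Optional[str]:
    """Approximate the homepage lastmod from the most recently published posts.

    Each post is a dict with hypothetical 'published' and 'modified' keys holding
    UTC 'YYYY-MM-DD HH:MM:SS' strings, which sort chronologically as text.
    """
    # Step 1: fetch the newest posts by publish date (what the homepage lists).
    recent = sorted(posts, key=lambda p: p["published"], reverse=True)[:limit]
    if not recent:
        return None
    # Step 2: among those, take the latest modified date.
    return max(p["modified"] for p in recent)
```

This is only an approximation because an older post that falls outside the window is ignored even if it was edited more recently than anything inside it.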
What it does not:
- No lastmod is added for the individual sitemap pages in the sitemap index
- No lastmod is added for term archives or user archives
Those are difficult to add without any proper mechanism to determine and cache last modified date. As per my comment, this should not be an issue.
Trac ticket: https://core.trac.wordpress.org/ticket/52099
@swissspidy commented on PR #4901:
16 months ago
#26
@pbiron Would love to get your review on this one
@swissspidy commented on PR #4901:
14 months ago
#28
@joemcgill Since you were involved with sitemaps originally, I would appreciate your eyes on this PR.
tl;dr is that lastmod is useful for search engines, so when we can easily expose it (such as for posts and the homepage), we should, but it's also no big deal if it's missing (e.g. for term archives)
@swissspidy commented on PR #4901:
14 months ago
#30
Thanks @joemcgill! That's how I remember it too. Now the guidance has slightly changed, and providing lastmod where easily possible is now preferable.
This PR will close the existing ticket, but a new one can be opened for lastmod elsewhere and whatever is required for that.
@swissspidy commented on PR #4901:
14 months ago
#33
Committed in https://core.trac.wordpress.org/changeset/56985
Trac ticket: https://core.trac.wordpress.org/ticket/52099