Opened 3 years ago
Last modified 3 months ago
#51860 new enhancement
wp-sitemap includes empty post entries returned by filters
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 5.5 |
Component: | Sitemaps | Keywords: | close |
Focuses: | Cc: |
Description
The ./wp-includes/sitemaps/providers/class-wp-sitemaps-posts.php class applies the 'wp_sitemaps_posts_show_on_front_entry' and 'wp_sitemaps_posts_entry' filter, which may enrich the $sitemap_entry, or return false, or return an empty array (as another way to exclude a post from the sitemap, for example). This is preferable if the post needs to be excluded based on some post meta, etc. Currently, the empty $sitemap_entry is added to the wp-sitemap.xml, which is incorrect. The empty $sitemap_entry should be skipped.
Attachments (2)
Change History (8)
#1
follow-up:
↓ 3
@
3 years ago
- Type changed from defect (bug) to enhancement
return an empty array (as another way to exclude a post from the sitemap, for example)
Just for reference, the filter was actually deliberately implemented so that this is _not_ possible as it would mess with pagination. So it was never recommended to return an empty array. Instead, the WP_Query args should be filtered to exclude posts.
#2
@
3 years ago
To filter posts by complex meta, you would have to run a query to get all posts, loop through each and figure out which ones need to be excluded, and then add the excluded post IDs to the 'wp_sitemaps_posts_query_args' query. Running double queries does not sound optimal to me.
#3
in reply to:
↑ 1
@
3 years ago
Replying to swissspidy:
return an empty array (as another way to exclude a post from the sitemap, for example)
Just for reference, the filter was actually deliberately implemented so that this is _not_ possible as it would mess with pagination. So it was never recommended to return an empty array. Instead, the WP_Query args should be filtered to exclude posts.
The 'wp_sitemaps_posts_query_args' filter is applied when counting the maximum number of pages, and for each page, so unless a filter hook is coded properly, it has the potential to affect performance (having to be applied for the counter and each page).
Using a static cache, I was able to create a filter that, I believe, will avoid affecting performance too much. I'll leave this example here, in case anyone else needs to exclude many posts, terms, or users from the query. Note that the excluded post IDs are merged from the cache, which allows other filters to exclude additional post IDs before/after this filter.
public function wp_sitemaps_posts_query_args( $args, $post_type ) { static $local_cache = array(); if ( ! isset( $local_cache[ $post_type ] ) ) { $local_cache[ $post_type ] = array(); $query = new WP_Query( array_merge( $args, array( 'fields' => 'ids', // Return an array of post ids. 'no_found_rows' => true, // Skip counting total rows found. ) ) ); foreach ( $query->posts as $post_id ) { if ( is_post_excluded_example_function( $post_id ) ) { $local_cache[ $post_type ][] = $post_id; } } } if ( ! empty( $local_cache[ $post_type ] ) ) { $args[ 'post__not_in' ] = empty( $args[ 'post__not_in' ] ) ? $local_cache[ $post_type ] : array_merge( $args[ 'post__not_in' ], $local_cache[ $post_type ] ); } return $args; }
#4
follow-up:
↓ 5
@
3 years ago
I faced the same issue; discussed at Slack https://wordpress.slack.com/archives/CTKTGNJJW/p1604995479019700.
A proper workaround to this would be to maintain an active cache with page/term/author IDs you wish to exclude. Then, inject those into the query as exclusions. However, this is not an easy feat since WordPress is quite dynamic.
I believe Yoast SEO provides such a system with "Indexables," something that's challenging to incorporate and maintain.
#5
in reply to:
↑ 4
@
3 years ago
Replying to Cybr:
I faced the same issue; discussed at Slack https://wordpress.slack.com/archives/CTKTGNJJW/p1604995479019700.
A proper workaround to this would be to maintain an active cache with page/term/author IDs you wish to exclude. Then, inject those into the query as exclusions. However, this is not an easy feat since WordPress is quite dynamic.
I believe Yoast SEO provides such a system with "Indexables," something that's challenging to incorporate and maintain.
I see you faced the same issue I did. I believe I found a working solution by using a static cache variable to create the exclusion list once per page load. Since XML sitemaps are not retrieved as often as a PHP webpage, this is probably fine, but for maximum performance, I would consider storing the exclusion list in transient cache and hooking a 'clear cache' action to clear the exclusion list transient when posts, terms, or users are updated.
https://github.com/SurniaUlula/wpsso/blob/master/lib/wp-sitemaps.php
js.
Diff to test $sitemap_entry returned by filters and skip empty values (false and empty array).