Make WordPress Core

Opened 4 years ago

Last modified 15 months ago

#51912 reopened defect (bug)

Sitemap pages 404 with more than one page

Reported by: loranrendel
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 5.5
Component: Sitemaps
Keywords: has-patch
Focuses:
Cc:

Description

When a sitemap has more than one page, sitemap pages beyond the first may be served with the correct content but a 404 status code.

For example, on a fresh test WordPress installation with 9 posts, I decreased the maximum number of URLs per sitemap page from the default of 2000 to 2:

<?php
// Lower the per-sitemap URL limit (default 2000) to 2, so 9 posts span 5 sitemap pages.
add_filter('wp_sitemaps_max_urls', function () {
    return 2;
});

https://testwp.xpor.org/wp-sitemap-posts-post-1.xml 200
https://testwp.xpor.org/wp-sitemap-posts-post-2.xml 404

WP_Query dump for page 2:

WP_Query Object
(
    [query] => Array
        (
            [paged] => 2
            [sitemap] => posts
            [sitemap-subtype] => post
        )

    [query_vars] => Array
        (
            [paged] => 2
            [sitemap] => posts
            [sitemap-subtype] => post
            [error] => 
            [m] => 
            [p] => 0
            [post_parent] => 
            [subpost] => 
            [subpost_id] => 
            [attachment] => 
            [attachment_id] => 0
            [name] => 
            [pagename] => 
            [page_id] => 0
            [second] => 
            [minute] => 
            [hour] => 
            [day] => 0
            [monthnum] => 0
            [year] => 0
            [w] => 0
            [category_name] => 
            [tag] => 
            [cat] => 
            [tag_id] => 
            [author] => 
            [author_name] => 
            [feed] => 
            [tb] => 
            [meta_key] => 
            [meta_value] => 
            [preview] => 
            [s] => 
            [sentence] => 
            [title] => 
            [fields] => 
            [menu_order] => 
            [embed] => 
            [category__in] => Array
                (
                )

            [category__not_in] => Array
                (
                )

            [category__and] => Array
                (
                )

            [post__in] => Array
                (
                )

            [post__not_in] => Array
                (
                )

            [post_name__in] => Array
                (
                )

            [tag__in] => Array
                (
                )

            [tag__not_in] => Array
                (
                )

            [tag__and] => Array
                (
                )

            [tag_slug__in] => Array
                (
                )

            [tag_slug__and] => Array
                (
                )

            [post_parent__in] => Array
                (
                )

            [post_parent__not_in] => Array
                (
                )

            [author__in] => Array
                (
                )

            [author__not_in] => Array
                (
                )

            [ignore_sticky_posts] => 
            [suppress_filters] => 
            [cache_results] => 1
            [update_post_term_cache] => 1
            [lazy_load_term_meta] => 1
            [update_post_meta_cache] => 1
            [post_type] => 
            [posts_per_page] => 10
            [nopaging] => 
            [comments_per_page] => 50
            [no_found_rows] => 
            [order] => DESC
        )

    [tax_query] => WP_Tax_Query Object
        (
            [queries] => Array
                (
                )

            [relation] => AND
            [table_aliases:protected] => Array
                (
                )

            [queried_terms] => Array
                (
                )

            [primary_table] => wp_posts
            [primary_id_column] => ID
        )

    [meta_query] => WP_Meta_Query Object
        (
            [queries] => Array
                (
                )

            [relation] => 
            [meta_table] => 
            [meta_id_column] => 
            [primary_table] => 
            [primary_id_column] => 
            [table_aliases:protected] => Array
                (
                )

            [clauses:protected] => Array
                (
                )

            [has_or_relation:protected] => 
        )

    [date_query] => 
    [request] => SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')  ORDER BY wp_posts.post_date DESC LIMIT 10, 10
    [posts] => Array
        (
        )

    [post_count] => 0
    [current_post] => -1
    [in_the_loop] => 
    [comment_count] => 0
    [current_comment] => -1
    [found_posts] => 0
    [max_num_pages] => 0
    [max_num_comment_pages] => 0
    [is_single] => 
    [is_preview] => 
    [is_page] => 
    [is_archive] => 
    [is_date] => 
    [is_year] => 
    [is_month] => 
    [is_day] => 
    [is_time] => 
    [is_author] => 
    [is_category] => 
    [is_tag] => 
    [is_tax] => 
    [is_search] => 
    [is_feed] => 
    [is_comment_feed] => 
    [is_trackback] => 
    [is_home] => 
    [is_privacy_policy] => 
    [is_404] => 1
    [is_embed] => 
    [is_paged] => 
    [is_admin] => 
    [is_attachment] => 
    [is_singular] => 
    [is_robots] => 
    [is_favicon] => 
    [is_posts_page] => 
    [is_post_type_archive] => 
    [query_vars_hash:WP_Query:private] => bcf5fd65d0a7962d637cd5cb9d865508
    [query_vars_changed:WP_Query:private] => 
    [thumbnails_cached] => 
    [stopwords:WP_Query:private] => 
    [compat_fields:WP_Query:private] => Array
        (
            [0] => query_vars_hash
            [1] => query_vars_changed
        )

    [compat_methods:WP_Query:private] => Array
        (
            [0] => init_query_flags
            [1] => parse_tax_query
        )

)
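The telling line in the dump is the main query's request: it still selects post_type = 'post' with the default 10 posts per page, so paged=2 becomes LIMIT 10, 10. With only 9 posts that offset is past the end, the page comes back empty (post_count 0) and is_404 is set. A minimal PHP sketch of that arithmetic, assuming 9 published posts and posts_per_page = 10 as in the dump; main_query_page() is a hypothetical helper, not a WordPress function, and its 404 rule is a simplification:

```php
<?php
/**
 * Hypothetical helper (not a WordPress function): reproduces the LIMIT clause
 * WP_Query builds for a paged main query and how many rows that page can return.
 */
function main_query_page(int $paged, int $postsPerPage, int $totalPosts): array
{
    // Offset for a paged query: (paged - 1) * posts_per_page.
    $offset = ($paged - 1) * $postsPerPage;

    // Rows actually available at that offset (never negative).
    $rows = max(0, min($postsPerPage, $totalPosts - $offset));

    return [
        'limit'  => sprintf('LIMIT %d, %d', $offset, $postsPerPage),
        'rows'   => $rows,
        // Simplification: an empty page beyond page 1 is treated as "not found".
        'is_404' => $paged > 1 && $rows === 0,
    ];
}

// Sitemap page 2 with 9 posts and the default 10 posts per page.
print_r(main_query_page(2, 10, 9));
// limit: LIMIT 10, 10 / rows: 0 / is_404: true
```

Page 1 of the same query returns all 9 rows, which is why only the later sitemap pages are affected.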

Change History (13)

#1 @SergeyBiryukov
4 years ago

  • Component changed from General to Sitemaps
  • Summary changed from Sitemap pages 404 to Sitemap pages 404 with more than one page

#2 @peterwilsoncc
4 years ago

  • Version changed from 5.5.3 to 5.5

Thank you for your report.

I am able to reproduce this as far back as version 5.5. I'll bring this to the attention of the sitemap maintainers for their review.

This ticket was mentioned in Slack in #core-sitemaps by peterwilsoncc.


4 years ago

This ticket was mentioned in Slack in #core-sitemaps by peterwilsoncc.


3 years ago

#5 @peterwilsoncc
3 years ago

I think the root cause of this is #51117.

The report in #53095 suggests this is a more serious problem for sites with a custom post type that contains more objects than there are native WordPress posts.

Because sitemap requests still run the main (is_home()) query, that query determines the page's status. On sites whose sitemap spans multiple pages, the paged parameter is passed to the is_home() query, causing a 404 (Not Found) error whenever the front end would not have the same number of pages.

Consider the following site:

  • 9 post objects
  • 3500 custom post type objects

Page two of the CPT sitemap will 404, as there is only a single page of posts according to the posts-per-page setting in the dashboard.

The same applies to the user and taxonomy sitemaps if there are more authors or terms than posts.
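The mismatch can be checked with simple page-count arithmetic. The sketch below assumes the WordPress defaults of 2000 URLs per sitemap page (the wp_sitemaps_max_urls default) and 10 posts per page for the main query. The function names are hypothetical, and the 404 rule is a simplified model: any sitemap page numbered beyond the last page of the main query is treated as not found.

```php
<?php
// Hypothetical helpers illustrating the page-count mismatch; not WordPress APIs.

/** Number of pages needed to list $items at $perPage items per page (at least 1). */
function page_count(int $items, int $perPage): int
{
    return max(1, (int) ceil($items / $perPage));
}

/**
 * Simplified model: a sitemap page 404s when its page number is beyond the
 * last page of the main (is_home) query, which only counts regular posts.
 */
function sitemap_page_404s(int $page, int $homePosts, int $postsPerPage = 10): bool
{
    return $page > page_count($homePosts, $postsPerPage);
}

$cptObjects = 3500; // custom post type objects
$posts      = 9;    // regular posts seen by the main query
$maxUrls    = 2000; // default wp_sitemaps_max_urls

$sitemapPages = page_count($cptObjects, $maxUrls); // ceil(3500 / 2000) = 2
for ($page = 1; $page <= $sitemapPages; $page++) {
    printf("CPT sitemap page %d: %s\n", $page, sitemap_page_404s($page, $posts) ? '404' : '200');
}
```

With the reporter's setup (9 posts, wp_sitemaps_max_urls set to 2), page_count(9, 2) gives 5 sitemap pages, of which pages 2 to 5 would be flagged by this model.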

#6 @peterwilsoncc
3 years ago

#53095 was marked as a duplicate.

#7 @tigerfinch
3 years ago

  • Keywords has-patch added
  • Resolution set to invalid
  • Status changed from new to closed

(Apologies if I'm tagging this wrong... I've tagged has-patch as I've got a potential resolution)

The solution is to force the main query to use the sitemap's post type whenever a sitemap is being generated.

As an interim solution for users, I found this worked in theme/plugin code:

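  add_action('pre_get_posts', function ($query) {
    // Only touch the main query of a post-type sitemap request.
    if (!$query->is_main_query() || $query->get('sitemap') !== 'posts')
      return;
    // Make the main query paginate over the sitemap's post type instead of 'post'.
    $query->set('post_type', $query->get('sitemap-subtype'));
  });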

As a core fix, we could alter the register_rewrites() method in class-wp-sitemaps.php to pass the requested custom post type to the main query:

// Register routes for providers.
add_rewrite_rule(
  '^wp-sitemap-([a-z]+?)-([a-z\d_-]+?)-(\d+?)\.xml$',
  'index.php?sitemap=$matches[1]&sitemap-subtype=$matches[2]&paged=$matches[3]',
  'top'
);

to

// Register routes for providers.
add_rewrite_rule(
  '^wp-sitemap-([a-z]+?)-([a-z\d_-]+?)-(\d+?)\.xml$',
  'index.php?sitemap=$matches[1]&sitemap-subtype=$matches[2]&post_type=$matches[2]&paged=$matches[3]',
  'top'
);
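To see what the proposed rule would pass to the main query, the rewrite pattern can be exercised directly with preg_match(). This is a minimal stand-alone sketch: it reuses the pattern and replacement from the rule above but builds the query string with http_build_query() instead of the WordPress rewrite machinery, and sitemap_query_vars() is a hypothetical helper. Note that the same pattern also matches taxonomy sitemap URLs, where the second capture group is a taxonomy name (for example category) rather than a post type, so the patched rule would pass that name as post_type as well; that is worth checking before adopting the rule as-is.

```php
<?php
// Hypothetical helper: applies the proposed sitemap rewrite rule to a request path
// and returns the query string it would route to (or null if the path does not match).
function sitemap_query_vars(string $path): ?string
{
    // Same pattern as the add_rewrite_rule() call, with PCRE delimiters added.
    $pattern = '#^wp-sitemap-([a-z]+?)-([a-z\d_-]+?)-(\d+?)\.xml$#';

    if (!preg_match($pattern, $path, $matches)) {
        return null;
    }

    // Explicit '&' separator so the result does not depend on php.ini settings.
    return 'index.php?' . http_build_query([
        'sitemap'         => $matches[1],
        'sitemap-subtype' => $matches[2],
        'post_type'       => $matches[2], // the proposed addition
        'paged'           => $matches[3],
    ], '', '&');
}

echo sitemap_query_vars('wp-sitemap-posts-post-2.xml'), "\n";
// index.php?sitemap=posts&sitemap-subtype=post&post_type=post&paged=2
echo sitemap_query_vars('wp-sitemap-taxonomies-category-2.xml'), "\n";
// index.php?sitemap=taxonomies&sitemap-subtype=category&post_type=category&paged=2
```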

#8 @tigerfinch
3 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

Whoops, didn't mean to close the issue, apologies

#9 @Tkama
3 years ago

I described the problem and a temporary workaround here: https://wp-kama.com/handbook/sitemap/bag-404-pagination

The proper core fix is to move sitemap initialization from the template_redirect hook to the parse_request hook, as is already done for the REST API.
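Why the hook matters can be shown with a toy model of the request lifecycle. This is a deliberately simplified, hypothetical sketch, not WordPress code: it assumes the relevant order is parse_request, then the main query, then 404 handling, then template_redirect, and that rendering the sitemap ends the request (as the REST API does when it serves a request from parse_request).

```php
<?php
// Toy model (hypothetical, not WordPress code) of the order in which a front-end
// request runs. Each name stands in for a real step; the ordering is the assumption.
const STAGES = ['parse_request', 'main_query', 'handle_404', 'template_redirect'];

/**
 * Returns the HTTP status a sitemap page would be served with if it is rendered
 * (and the request ends) at $renderStage. $mainQueryHasPosts says whether the
 * home query found anything for the requested page number.
 */
function sitemap_status(string $renderStage, bool $mainQueryHasPosts): int
{
    $status = 200;
    foreach (STAGES as $stage) {
        if ($stage === $renderStage) {
            // The sitemap is output here; later stages never run.
            return $status;
        }
        if ($stage === 'handle_404' && !$mainQueryHasPosts) {
            // The empty main query has already marked the response as 404.
            $status = 404;
        }
    }
    return $status;
}

echo sitemap_status('template_redirect', false), "\n"; // 404 (current behavior)
echo sitemap_status('parse_request', false), "\n";     // 200 (proposed)
```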

#10 @RavanH
2 years ago

This plugin (https://wordpress.org/plugins/xml-sitemaps-manager/) applies the solution proposed by @Tkama and should resolve the issue. Please let me know if there are still problems after applying the fix.

#11 follow-up: @mystery8
2 years ago

Any update on this? Why does this bug get such low priority? It will cause big sites (more than 2000 pages) to lose many pages in the SERPs.


#12 in reply to: ↑ 11 @RavanH
2 years ago

Replying to mystery8:

Any update on this? Why does this bug get such low priority? It will cause big sites (more than 2000 pages) to lose many pages in the SERPs.

Try this plugin (https://wordpress.org/plugins/xml-sitemaps-manager/): it will not change your sitemap; it only fixes the bug and gives you some options (which you can ignore). You can simply remove the plugin once the bug is fixed in core.

#13 @peterwilsoncc
15 months ago

#57961 was marked as a duplicate.
