#50910 closed defect (bug) (fixed)
5.5 Sitemap URLs are incorrectly paginated
Reported by: | jonathanstegall | Owned by: | peterwilsoncc |
---|---|---|---|
Milestone: | 5.5.1 | Priority: | normal |
Severity: | normal | Version: | 5.5 |
Component: | Sitemaps | Keywords: | has-patch has-unit-tests commit |
Focuses: | Cc: |
Description
I've created a sitemap with enough posts that it needs to be paginated. The sitemap index file - /wp-sitemap.xml - has three post URLs.
In the index file, they display as:
- wp-sitemap-posts-post-1.xml
- wp-sitemap-posts-post-2.xml
- wp-sitemap-posts-post-3.xml
When I visit those URLs, they are:
- wp-sitemap-posts-post-1.xml
- wp-sitemap-posts-post-2.xml/page/2
- wp-sitemap-posts-post-3.xml/page/3
The first URL works correctly, but the last two are 404s.
This happens with no plugins activated and using the twentytwenty theme. My setup does have "Day and name" permalinks enabled. I'm using PHP 7.3.19 and Nginx 1.19.0 in a local development environment.
Attachments (3)
Change History (34)
#2
in reply to:
↑ 1
@
4 years ago
Replying to pbiron:
If I understand what you are describing, then:
- wp-sitemap-posts-post-2.xml IS the 2nd page
- wp-sitemap-posts-post-3.xml IS the 3rd page
I would expect
/page/2
, etc to result in 404's.
Yes, that's correct. But the /page/2 or /page/3 is incorrectly being added to the URLs that should be the 2nd and 3rd pages.
#3
follow-up:
↓ 4
@
4 years ago
Oh, you mean that when you click on wp-sitemap-posts-post-2.xml
you actually get redirected to wp-sitemap-posts-post-2.xml/page/2
?
#4
in reply to:
↑ 3
@
4 years ago
Replying to pbiron:
Oh, you mean that when you click on
wp-sitemap-posts-post-2.xml
you actually get redirected towp-sitemap-posts-post-2.xml/page/2
?
Yes, exactly. And this happens with no active plugins, using the default theme, and the other setup notes.
#5
follow-up:
↓ 6
@
4 years ago
Now I understand.
I am not able to reproduce that behavior with Apache and I don't have Nginx to test with.
Can you enter https://YOURDOMAIN/?sitemap=posts&sitemap-subtype=post&paged=2
in your browser and report back here what result you see?
p.s. thank you for the ticket and welcome to Trac!
#6
in reply to:
↑ 5
@
4 years ago
Replying to pbiron:
Now I understand.
I am not able to reproduce that behavior with Apache and I don't have Nginx to test with.
Can you enter
https://YOURDOMAIN/?sitemap=posts&sitemap-subtype=post&paged=2
in your browser and report back here what result you see?
p.s. thank you for the ticket and welcome to Trac!
Thanks for the welcome! That URL does not do any further redirects, but it shows this message in Firefox:
Error loading stylesheet: An unknown error has occurred (805303f4)
https://MYDOMAIN/wp-sitemap.xsl
In Chromium, I can view source to see the XML and it looks normal, but it only has the individual post URLs; not the URLs that get displayed on the sitemap index.
I was originally checking PHP error logs and I haven't seen any entries, which makes sense if it's an XSLT parsing issue.
#8
@
4 years ago
I am having the same issue. I found the cause which is Line 475 to Line 477 of wp-includes/canonical.php
As after removing the lines it works
if ( $paged > 1 ) {
$addl_path .= user_trailingslashit( "$wp_rewrite->pagination_base/$paged", 'paged' );
}
OR you can solve it with filter (but I think it should be fix in core)
<?php function lesterchan_redirect_canonical( string $redirect_url, string $requested_url ) { if ( strpos( $requested_url, 'wp-sitemap' ) !== false ) { return $requested_url; } return $redirect_url; } add_filter( 'redirect_canonical', 'lesterchan_redirect_canonical', 10, 2 );
#9
@
4 years ago
- Keywords needs-patch added
- Milestone changed from Awaiting Review to 5.5.1
Thanks for the report @jonathanstegall and welcome.
Moving to the 5.5.1 milestone for investigation.
This ticket was mentioned in PR #482 on WordPress/wordpress-develop by peterwilsoncc.
4 years ago
#11
- Keywords has-patch has-unit-tests added; needs-patch removed
#12
follow-up:
↓ 16
@
4 years ago
I've put together a pull request for this modifying redirect_canonical()
further to account for sitemap pagination.
I did investigate adding is_sitemap()
and the required changes to WP
and WP_Query()
but either:
- have no idea what I am doing
- sitemaps don't use the main query
I suspect the former, tbh, they're pretty deep internals.
@swissspidy or @pbiron are you able to take a look at the PR to see if I am missing anything?
#13
@
4 years ago
I have the same problem. If you open https://www.aranzulla.it/wp-sitemap.xml, you can see that https://www.aranzulla.it/wp-sitemap-posts-post-2.xml (page > 1) generates a redirect.
#14
@
4 years ago
I'm not sure why, but I'm now seeing the behavior described...and I wasn't before.
50910.diff solves the problem is essentially the same was as the PR from @peterwilsoncc (by ensuring that redirect_canonical()
doesn't incorrectly add page/2
), but does so in a simpler way.
The patch introduces a new function, get_sitemap_url()
that is similar in spirit to get_permalink()
but for sitemap URLs and uses that in redirect_canonical()
.
Not only does that make redirect_canonical()
easier to follow when dealing with sitemap canonicals compared to Peter's PR, but it "future proofs" things as well. Peter's PR in essence bakes the structure of sitemap canonical URLs into redirect_canonical()
. Since WP_Sitemap_Provider
has a get_sitemap_url()
method, having that structure baked into redirect_canonical()
could cause problems down the road should WP_Sitemaps_Provider::get_sitemap_url()
change the structure of the URLs (however unlikely such a future change may be). The new get_sitemap_url()
function uses WP_Sitemaps_Provider::get_sitemap_url()
.
The patch also moves Peter's unit test into the file with the existing sitemaps canonical tests: tests/canonical/sitemaps.php
(and uses a data provider to make it easier, IMO, to add additional test cases).
Additionally, the patch adds unit tests for the new get_sitemaps_url()
function.
#15
@
4 years ago
Regarding the comment on the PR:
Fun this passes with phpunit --filter test_sitemaps_canonical_redirects but fails with the full run or phpunit --group canonical. My suspicion is something is hanging around $wp from a previous test.
The reason the unit tests in the PR fail when run as phpunit --group canonical
has to do with an interaction between the way sitemaps adds rewrite rules/tags and WP_UnitTestCase_Base::go_to()
, which is called by WP_Canonical_UnitTestCase::assertCanonical()
.
Sitemaps adds rewrite tags (e.g., sitemap
, sitemap-subtype
and sitemap-stylesheet
) on init
and those tags are added to WP::public_query_vars
. WP_UnitTestCase_Base::go_to()
creates a new instance of the WP
class, but since sitemaps aren't "reinitialized", $wp->public_query_vars
doesn't contain the sitemaps rewrite tags in subsequent tests. I hope that makes sense!
Moving the unit test to Tests_Canonical_Sitemaps
gets around that problem since Tests_Canonical_Sitemaps::setUp()
"manually" initializes sitemaps before each test and thus the rewrite tags are correctly added.
That might be good enough, but I want to float an idea. The sitemaps feature plugin had to call add_rewrite_tag()
when adding it's rewrite rules. But maybe, now that sitemaps are in core, it would be better to add those directly to WP::$public_query_vars
, as in:
<?php class WP { /** * Public query variables. * * Long list of public query variables. * * @since 2.0.0 * @since 5.5.1 Added 'sitemap', 'sitemap-subtype' and 'sitemap-stylesheet' * @var string[] */ public $public_query_vars = array( 'm', 'p', 'posts', 'w', 'cat', 'withcomments', 'withoutcomments', 's', 'search', 'exact', 'sentence', 'calendar', 'page', 'paged', 'more', 'tb', 'pb', 'author', 'order', 'orderby', 'year', 'monthnum', 'day', 'hour', 'minute', 'second', 'name', 'category_name', 'tag', 'feed', 'author_name', 'pagename', 'page_id', 'error', 'attachment', 'attachment_id', 'subpost', 'subpost_id', 'preview', 'robots', 'favicon', 'taxonomy', 'term', 'cpage', 'post_type', 'embed', 'sitemap', 'sitemap-subtype', 'sitemap-stylesheet' ); } ... }
and not call add_rewrite_tag()
in WP_Sitemaps::add_rewrite_rules()
.
@swissspidy what do you think about that idea?
#16
in reply to:
↑ 12
@
4 years ago
Replying to peterwilsoncc:
I did investigate adding
is_sitemap()
and the required changes toWP
andWP_Query()
but either:
- have no idea what I am doing
- sitemaps don't use the main query
I suspect the former, tbh, they're pretty deep internals.
The idea of adding is_sitemap()
did come up in https://github.com/GoogleChromeLabs/wp-sitemaps/issues/134, but there was an easier way to address the problem that introducing that would have solved.
Maybe it would still be good to add that, but I think that should be it's own ticket.
This ticket was mentioned in Slack in #core by audrasjb. View the logs.
4 years ago
This ticket was mentioned in Slack in #core by pbiron. View the logs.
4 years ago
#20
@
4 years ago
- Keywords needs-testing added
50910.2.diff is just like 50910.diff, with the following minor differences:
- in
redirect_canonical()
it checks for the sitemaps case first, before checking for uses ofpaged
. Functionally it's the same but it simplifies the conditional - removes a "debugging"
@group bar
tag I had added to some of the unit tests...that allowed me to just run those tests while developing them
#21
follow-up:
↓ 23
@
4 years ago
I investigated the suggestion in comment 15, but decided against making that change, at least for now, for a few reasons:
- that change is not needed the fix this problem
- the way the unit tests are now structured, that change isn't need when running the unit tests
It might be a worthwhile change to make on it's own, but I'll open another ticket where the pros/cons can be discussed.
While investigating that, however, I came across something odd. There is an existing unit test that fails for me locally and as far as I can tell, should be failing on Travis since sitemaps were merged into core.
It's the test in tests/phpunit/tests/query/vars.php
, which was added in [36045] and changed in [36048]. That test can be run with:
phpunit --group 35115
It checks that $wp->public_query_vars
is correct. But since sitemaps were merged into core, that "expected" array should contains sitemap
, sitemap-subtype
, sitemap-stylesheet
which it doesn't and the test fails for me locally.
When the tests are run on travis, how has it been passing? Any ideas @SergeyBiryukov ?
This ticket was mentioned in Slack in #core-sitemaps by pbiron. View the logs.
4 years ago
#23
in reply to:
↑ 21
@
4 years ago
- that change is not needed the fix this problem
- the way the unit tests are now structured, that change isn't need when running the unit tests
Makes perfect sense on both accounts.
I've replaced my PR with your code, so I can get a full test run going.
While reviewing your code:
- I found some repeated code in the
redirect_canonical()
block, you hadif sitemaps, elseif paged or other stuff, elseif sitemaps
. I've removed the repetition in the pull request. - I also added some canonical tests for ugly permalinks to ensure the sitemaps don't attempt to redirect when pretty permalinks are enabled.
The full pull request in #482 with your code, my tests. The changes I've made are in this commit.
I am still waiting for the unit tests to complete on the PR, once they're done I'll upload a new patch and I think it's good to go.
#24
@
4 years ago
- Keywords commit added; needs-testing removed
Thanx for testing Peter!
Marking for commit.
This ticket was mentioned in Slack in #core by pbiron. View the logs.
4 years ago
#26
@
4 years ago
- Owner changed from SergeyBiryukov to peterwilsoncc
- Status changed from accepted to assigned
#28
@
4 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for backporting to 5.5 branch.
#30
follow-up:
↓ 31
@
3 years ago
I still got this bug in wp 5.8.2. I tested it only using woocommerce plugin and twenty twenty theme.
https://xxx.xxx/wp-sitemap-posts-product-1.xml (OK)
https://xxx.xxx/wp-sitemap-posts-product-2.xml (404)
https://xxx.xxx/wp-sitemap-posts-product-3.xml (404)
the response header is 404. but it shows the sitemap correctly. But google and friends complain about it
#31
in reply to:
↑ 30
@
3 years ago
Replying to erikdemarco:
I still got this bug in wp 5.8.2. I tested it only using woocommerce plugin and twenty twenty theme.
https://xxx.xxx/wp-sitemap-posts-product-1.xml (OK)
https://xxx.xxx/wp-sitemap-posts-product-2.xml (404)
https://xxx.xxx/wp-sitemap-posts-product-3.xml (404)
the response header is 404. but it shows the sitemap correctly. But google and friends complain about it
i can confirm i have the same problem since months. running wp 5.9
If I understand what you are describing, then:
I would expect
/page/2
, etc to result in 404's.