Make WordPress Core

Opened 17 months ago

Closed 10 months ago

Last modified 10 months ago

#57816 closed defect (bug) (fixed)

WordPress shows author archive in sitemap even if there are no posts

Reported by: zodiac1978
Owned by: swissspidy
Milestone: 6.4
Priority: normal
Severity: normal
Version: 5.5
Component: Sitemaps
Keywords: good-first-bug has-patch has-unit-tests commit
Focuses:
Cc:

Description

This looks like something that should be avoided: if the WP installation has no posts (only pages), wp-sitemap.xml still contains the author archive. And if you follow the link, you get a 404 error.

This can't make sense for SEO ...

Change History (16)

#1 @amisiewicz
17 months ago

Just to clarify, it does seem like a bug to me, but I may be missing something. That's why I'm waiting for input from the other people / team members before taking any action.

For now, here is some code that could work around this by using a filter:

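<?php
// Drop 'page' so that users who only published pages are left out of the users sitemap.
add_filter( 'wp_sitemaps_users_query_args', function( $args ) {

        // 'has_published_posts' may also be a boolean, so only modify it when it is an array.
        if ( isset( $args['has_published_posts'] ) && is_array( $args['has_published_posts'] ) ) {
                $key = array_search( 'page', $args['has_published_posts'], true );
                if ( false !== $key ) {
                        unset( $args['has_published_posts'][ $key ] );
                }
        }
        return $args;

}, 10, 1 );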
Version 0, edited 17 months ago by amisiewicz

#2 @zodiac1978
17 months ago

Found this:

Should only be generated for those authors with at least 1 published post.

Source: https://github.com/GoogleChromeLabs/wp-sitemaps/issues/23

Looks like this is a bug then.

#3 @swissspidy
14 months ago

  • Milestone changed from Awaiting Review to Future Release

#4 follow-up: @swissspidy
14 months ago

The WP_User_Query in WP_Sitemaps_Users uses 'has_published_posts' to only list users with at least 1 published post in any post type (including pages, excluding attachments).

Since pages are not normally shown in the author archives, we could add unset( $public_post_types['page'] ); here:

https://github.com/WordPress/wordpress-develop/blob/e2a747662ee018262a86ddfbc38cd7292bc911bb/src/wp-includes/sitemaps/providers/class-wp-sitemaps-users.php#L139-L140
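A minimal sketch of that change, modeled with plain arrays so it runs without WordPress. The function name sketch_users_query_post_types() is hypothetical, and the input array only mimics the name-keyed result of get_post_types( array( 'public' => true ) ); it is not the actual core code:

```php
<?php
// Hypothetical stand-in for the post type handling in WP_Sitemaps_Users.
// $public_post_types mimics an array keyed by post type name.
function sketch_users_query_post_types( array $public_post_types ): array {
	// Attachments are already excluded from the query.
	unset( $public_post_types['attachment'] );
	// Proposed change: pages are not normally shown in author archives.
	unset( $public_post_types['page'] );

	// The remaining names would be passed as 'has_published_posts'.
	return array_keys( $public_post_types );
}

$post_types = sketch_users_query_post_types(
	array(
		'post'       => 'post',
		'page'       => 'page',
		'attachment' => 'attachment',
	)
);

var_dump( $post_types );
```

With only 'post' left in the list, a user whose only published content is pages would no longer match the query.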

That said, the author archive for a user with no published posts returns a 200 status (albeit an empty page), not a 404. So... not sure if that needs to be changed at all 🤷‍♂️

Edit: also, what happens if a site actually does show pages in author archives too?

Last edited 13 months ago by swissspidy

#5 in reply to: ↑ 4 @zodiac1978
13 months ago

  • Keywords 2nd-opinion added

Replying to swissspidy:

That said, the author archive for a user with no published posts returns a 200 status (albeit an empty page), not a 404. So... not sure if that needs to be changed at all 🤷‍♂️

Yes, that's true. But it's still not very helpful to report an empty page to Google, is it?

Edit: also, what happens if a site actually does show pages in author archives too?

I'm not sure an edge case like this should stop us from fixing this for the other 99%.

But I'm only looking at my own installations, and at the lack of a UI to disable a sitemap that has no real content.

If you think this is technically correct and needs no fixing, then we can close it. From a UX point of view, I still find this very confusing.

#6 @swissspidy
12 months ago

  • Keywords good-first-bug needs-unit-tests added; 2nd-opinion removed
  • Milestone changed from Future Release to 6.4

Let's just do it.

  • Adding unset( $public_post_types['page'] );
  • Adding a test to verify the new behavior
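The behavior such a test would verify can be sketched with a plain-PHP model. Everything here is an assumption for illustration: sketch_users_in_sitemap() and the sample users are hypothetical, and the real test would use the WordPress test suite rather than arrays:

```php
<?php
// Hypothetical model: each user maps post type => number of published items.
// A user is listed if they have at least one published item in any given type.
function sketch_users_in_sitemap( array $users, array $post_types ): array {
	$listed = array();
	foreach ( $users as $login => $published ) {
		foreach ( $post_types as $type ) {
			if ( ( $published[ $type ] ?? 0 ) > 0 ) {
				$listed[] = $login;
				break;
			}
		}
	}
	return $listed;
}

$users = array(
	'author_with_post'  => array( 'post' => 1 ),
	'author_pages_only' => array( 'page' => 2 ),
	'author_no_content' => array(),
);

// Before the change, 'page' counts toward the sitemap; after it, only 'post' does.
$before = sketch_users_in_sitemap( $users, array( 'post', 'page' ) );
$after  = sketch_users_in_sitemap( $users, array( 'post' ) );

assert( in_array( 'author_pages_only', $before, true ) );
assert( ! in_array( 'author_pages_only', $after, true ) );
```

The two assertions capture the expected difference: the pages-only author is listed before the change and omitted after it.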

This ticket was mentioned in PR #4846 on WordPress/wordpress-develop by nirav7707.


12 months ago
#7

  • Keywords has-patch added; needs-patch removed

Trac Ticket -> https://core.trac.wordpress.org/ticket/57816
This PR addresses the issue of including users in the user sitemap who have not published any posts but have published pages.

To resolve this, the PR adds unset( $public_post_types['page'] ); which removes the public post type "page" from the query arguments. This ensures that users who have only published pages are not included in the user sitemap.

This ticket was mentioned in PR #4900 on WordPress/wordpress-develop by @swissspidy.


12 months ago
#8

  • Keywords has-unit-tests added; needs-unit-tests removed

@swissspidy commented on PR #4846:


12 months ago
#9

Thanks for your contribution!

As this change needs an accompanying unit test, I have just opened another PR to add one: #4900

I'm closing yours in favor of the new one. But don't worry, you should still get props for your contribution! 🙂

#10 @swissspidy
12 months ago

  • Owner set to swissspidy
  • Status changed from new to reviewing

Props @niravsherasiya7707 for the PR. I just added some unit tests. This should be good to go now.

@swissspidy commented on PR #4900:


11 months ago
#11

@pbiron Would love to get your review on this one

#12 @oglekler
11 months ago

  • Keywords needs-testing added

#13 @huzaifaalmesbah
10 months ago

Test Report

I tested 4900.diff ✅ Working

Environment

OS: macOS m1
WordPress 6.4-alpha-56267-src
PHP 7.4.33
nginx/1.25.2
MySQL 5.7.43
Browser: Chrome 116.0.5845.140
Theme: Twenty Twenty v2.3
Active Plugins: No plugins activated.

Expected Results

✅ Don't show the author's sitemap if there are no posts

Actual Results

✅ After applying the patch, it works properly.

Before Applying Patch

https://i.ibb.co/51kVqcD/Screenshot-2023-09-16-at-12-00-44-AM.png

After Applying Patch

https://i.ibb.co/njvFtwW/Screenshot-2023-09-16-at-12-01-11-AM.png

#14 @swissspidy
10 months ago

  • Keywords commit added; needs-testing removed

#15 @swissspidy
10 months ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 56708:

Sitemaps: do not list users who only authored pages.

Author archives are only generated for users who created at least one post.
Prevent adding author archives to the XML sitemap for users who only authored pages
as the links would otherwise result in a 404.

Props zodiac1978, huzaifaalmesbah.
Fixes #57816.
