Make WordPress Core

Opened 6 years ago

Last modified 5 years ago

#44848 new defect (bug)

Ensure that empty author profiles have proper 404 behaviour

Reported by: jonoaldersonwp Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Query Keywords: 2nd-opinion
Focuses: template Cc:

Description

Summary

  • WordPress 'creates' a profile posts archive (i.e., "See all posts by [author name]") for each user in the system. This can be reached by clicking 'view' from the Users table in wp-admin.
  • It does so even if the user has no posts, and even if they lack permission to publish posts at all.
  • Whilst these empty profile pages look like a standard 404 error page, they actually return a 200 HTTP status and simply call the 404 template part for the main page body.
  • This results in many 'soft 404' pages/errors, which can adversely affect the way in which search engines and other systems (Facebook, Twitter, etc.) crawl, extract information from, and value/evaluate websites (see Google's documentation on soft 404s at https://support.google.com/webmasters/answer/181708?hl=en).

It's important that we correct this behaviour in WordPress Core for several reasons: compliance with basic web standards (error pages should return an appropriate HTTP status), adherence to the guidelines and preferences of external platforms (search engines, social networks, etc.), and consistency with other page/template handling processes.

Example

A newly created subscriber user on a test site (called 'Test User 2') with no authored posts has an author archive accessible at https://yoast.jonoalderson.com/author/test-user-2/. Although this presents as a 404 error, the page returns metadata and behaviour consistent with a valid author request (see the tags in the <head> of the HTML source, particularly the <meta> tags). The body template then just calls the 404 template part.
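One way to check which status an author archive actually returns is to read the final status line from the array produced by PHP's built-in get_headers(). The helper below is a minimal sketch; its name, example_final_status(), is hypothetical, and because the example URL above may no longer resolve, the network call is left as a commented usage line.

```php
<?php
/**
 * Hypothetical helper: return the final HTTP status code from a get_headers() result.
 * get_headers() follows redirects by default, so the array can hold several
 * status lines; the last one belongs to the final response.
 */
function example_final_status( array $headers ): ?int {
	$status = null;
	foreach ( $headers as $key => $line ) {
		// Status lines sit under numeric keys and look like "HTTP/1.1 200 OK" or "HTTP/2 200".
		if ( is_int( $key ) && preg_match( '#^HTTP/\S+\s+(\d{3})#', $line, $m ) ) {
			$status = (int) $m[1];
		}
	}
	return $status; // null if no status line was found.
}

// Usage (performs a network request):
// $headers = get_headers( 'https://yoast.jonoalderson.com/author/test-user-2/' );
// var_dump( false === $headers ? null : example_final_status( $headers ) );
```

A result of 200 for an archive whose body is the 404 template is exactly the soft-404 pattern described above.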

Solution

We should ensure that these scenarios return proper/consistent 404 behaviour.

Specifically, requests to author profiles where the author has zero posts should trigger normal 404 behaviour (including headers, query behaviour, etc), as opposed to just calling the 404 template.

E.g.,

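// If the author archive has zero posts, serve a genuine 404.
global $wp_query;
if ( is_author() && 0 === $wp_query->post_count ) {
	$wp_query->set_404();                   // Reset the query flags so is_404() is true.
	status_header( 404 );                   // Send a 404 status instead of 200.
	include( get_query_template( '404' ) ); // Render the theme's 404 template.
	die();
}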

As opposed to just:

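// If the author archive has zero posts; the response status stays 200.
global $wp_query;
if ( is_author() && 0 === $wp_query->post_count ) {
	include( get_query_template( '404' ) );
	die();
}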
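The same outcome could also be sketched as a small plugin hooked to template_redirect, without the include/die: once set_404() has flagged the main query, WordPress's template loader (which runs after template_redirect) should pick the theme's 404 template on its own. This is a minimal sketch, not core behaviour; example_should_force_404() and the example_force_404_for_empty_author filter are hypothetical names. The filter is there because some sites deliberately use author archives as profile pages and may want to opt out.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress core): decide whether an
 * author archive request should be turned into a real 404.
 */
function example_should_force_404( bool $is_author_archive, int $post_count, bool $enabled = true ): bool {
	return $enabled && $is_author_archive && 0 === $post_count;
}

// Only register the hook when running inside WordPress.
if ( function_exists( 'add_action' ) ) {
	add_action( 'template_redirect', function () {
		global $wp_query;

		// Hypothetical filter so sites that use author archives as profiles can opt out.
		$enabled = (bool) apply_filters( 'example_force_404_for_empty_author', true );

		if ( ! example_should_force_404( is_author(), (int) $wp_query->post_count, $enabled ) ) {
			return;
		}

		$wp_query->set_404(); // Flag the main query as a 404 so is_404() is true.
		status_header( 404 ); // Send a 404 status line instead of 200.
		nocache_headers();    // Discourage caching of the error response.
		// No include/die: the template loader now selects the 404 template.
	} );
}
```

Keeping the decision in a pure function makes the rule easy to test outside WordPress, while the hook only wires it to the main query.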

Change History (6)

#1 follow-up: @SergeyBiryukov
6 years ago

Previously: [8761] / #5324; [18192] / #17316; [27290] / #20601.

Specifically, requests to author profiles where the author has zero posts should trigger normal 404 behaviour (including headers, query behaviour, etc), as opposed to just calling the 404 template.

Does this apply to author profiles only, or empty (but existing) post type and taxonomy archives as well (e.g. a category that doesn't have any post yet)?

#2 in reply to: ↑ 1 @jonoaldersonwp
6 years ago

Replying to SergeyBiryukov:

Does this apply to author profiles only, or empty (but existing) post type and taxonomy archives as well (e.g. a category that doesn't have any post yet)?

Oh gosh, you're right. This is terrifying.

Both return a 200 HTTP header status.

Last edited 6 years ago by jonoaldersonwp

#3 @johnbillion
5 years ago

  • Keywords 2nd-opinion added

Bear in mind that some plugins/sites/themes use the author archive URL as a user profile page. It can be desirable for an author archive to not show a 404 even if there are no posts.

This could be the case for empty term and post type archives too, although much, much less common.

#4 @johnbillion
5 years ago

  • Keywords close added

I'm inclined to think the current behaviour in core is correct. If an object exists then its permalink shouldn't 404, regardless of whether it's associated with any other objects that are listed at its permalink.

#5 @jonoaldersonwp
5 years ago

  • Keywords close removed

That's fine if the page represents the object. In many cases, it returns no content/response, and a 404 template is served with a 200 status. This is incorrect.

#6 @johnbillion
5 years ago

Is there a way to determine this? I.e., what constitutes an object that does have representational data versus one that doesn't?
