Make WordPress Core

Opened 14 years ago

Closed 12 years ago

Last modified 3 years ago

#11312 closed enhancement (duplicate)

Add 'handle_404' hook to allow plugins to route URLs

Reported by: mikeschinkel's profile mikeschinkel Owned by: ryan's profile ryan
Milestone: Priority: normal
Severity: normal Version:
Component: Rewrite Rules Keywords: needs-patch dev-feedback
Focuses: Cc:

Description

The method $wp->handle_404() in /wp-includes/classes.php does not provide a way for plugins to tell it that a URL is a valid URL. I have uploaded a simple 2 line patch that implements a call to a 'handle_404' hook to allow a plugin to indicate that it will handle a given URL and not to set 404 and reset all query variables.

Attachments (1)

handle_404.diff (748 bytes) - added by mikeschinkel 14 years ago.
Added a 'handle_404' hook to the $wp->handle_404() method on /wp-includes/classes.php

Download all attachments as: .zip

Change History (21)

@mikeschinkel
14 years ago

Added a 'handle_404' hook to the $wp->handle_404() method on /wp-includes/classes.php

#1 @nacin
14 years ago

  • Milestone changed from 2.9 to Future Release

#2 @dd32
14 years ago

  • Keywords needs-patch dev-feedback added; handle_404 hooks 404 rewrite url routing removed

milestone changed from 2.9 to Future Release

Just for anyone following, This is due to the 2.9 feature freeze, All enhancements are now bound for 3.0 or Future Release. 3.0 is for "commiter blessed" enhancements at this stage.

needs-patch: IMO, This patch is not up to any standard for inclusion. If this was to be handled by core, This should be handled much earier in the query process, near the location to where $is_404 is set originally, not when the headers are output. The filter should also carry extra details.

#3 @strider72
14 years ago

Just for the record (no statement of my opinion), from wp-hackers:

"I just submitted a patch for $wp->handle_404() that is similar to what I proposed below but only run after the most common use cases fail thus having zero performance impact on cases that WordPress already handles."

Thus the reasoning behind locating it late in the query process.

#4 @Otto42
14 years ago

Also, how is this different than just setting $wp_query->is_404 = false somewhere in your plugin or when the query is executing or what have you?

I'm just not seeing for what use case this is needed, is all.

#5 follow-up: @mikeschinkel
14 years ago

  • Cc mikeschinkel@… added

For some reason I did not get notification of your responses. I think I forgot to check CC.

I'm working on another project and am yet again running into the problem this causes.

@dd32 - Please review the process flow. The problem is in the handle_404() function. The only states where it does not throw a 404 are:

  • count($wp_query->posts) !=0
  • !is_search()
  • is_tag()
  • is_category()
  • is_author()

Nowhere does it allow for a custom URL that should simply be routed. Yes I can use the template_redirect hook but it will issues a 404 that I can't control. If the problem isn't obvious now then please explain how someone can use template_redirect hook w/o having a 404 already issue w/o fixing handle_404()?

Here's the dirty, ugly hack I've had to get around a 404:

static function the_posts($posts) {
	if (is_my_custom_url()) {
		$posts[] = null; // Dirty, Ugly hack to get around WordPress throwing a 404 error.
	}
	return $posts;
}
add_filter('the_posts', array(__CLASS__, 'the_posts'));

static function parse_query($request) {
	if (is_my_custom_url()) {
		$wp_query->is_single = false;
		$wp_query->is_home = false;
		$wp_query->is_404 = false;
	}
}
add_filter('parse_query', array(__CLASS__, 'parse_query'));

@Otto42 If you read the code in WordPress for handle_404() you'll find that WordPress doesn't respect the 404 setting; even if you set it to false in a prior hook WordPress still thinks it's smarter than you and decides for itself if it should be 404 and when it does it sends the 404 immediate to the browser.

Basically if you want to use a custom URL where the URL isn't trying to resolve to something WordPress expects like a post or an archive or a search then it issues a 404, period.

I've created a plugin that demonstrates the problem. The plug is called MyRouterTest and it routes a path of "mypath" using template_redirect. It triggers a 404 and won't display on most browsers (but it will display on Mac/Safari, go figure):

AFAICK, there is no way to keep this from triggering a 404 with using some kind of ugly hack as I showed above.

Or maybe I'm not seeing something obvious? If so, please educate me. But if not please don't simply dismiss this without at least trying to understand the problem is.

#6 in reply to: ↑ 5 @Otto42
14 years ago

Replying to mikeschinkel:

@Otto42 If you read the code in WordPress for handle_404() you'll find that WordPress doesn't respect the 404 setting; even if you set it to false in a prior hook WordPress still thinks it's smarter than you and decides for itself if it should be 404 and when it does it sends the 404 immediate to the browser.

No, it doesn't. That's not how the PHP header() function works. Headers are held back from being sent until there is non-header output to be sent or until end of execution.

Until the theme gets called, WordPress should not make any output at all, and headers, including the 404, will not be sent.

Basically if you want to use a custom URL where the URL isn't trying to resolve to something WordPress expects like a post or an archive or a search then it issues a 404, period.

Not if you have properly set $wp_query->is_404 to false. Then that handle_404() function will set a 200 header status.

Another way to work around it is simply to set your own 200 header before producing any output. Again, headers are *not* sent until output is produced. Check it yourself, use the headers_sent() function to determine if the headers have been sent or not at any point in execution.

I've created a plugin that demonstrates the problem. The plug is called MyRouterTest and it routes a path of "mypath" using template_redirect. It triggers a 404 and won't display on most browsers (but it will display on Mac/Safari, go figure):

AFAICK, there is no way to keep this from triggering a 404 with using some kind of ugly hack as I showed above.

Or maybe I'm not seeing something obvious? If so, please educate me. But if not please don't simply dismiss this without at least trying to understand the problem is.

I understand your the problem, I'm just not certain how you're experiencing it or why you need to resort to this odd trickery to do what you want...

The most obvious solution, as I see it, would be to simply set the header yourself.

add_filter('template_redirect', 'test' );
function test() {
	status_header( 200 );
	echo 'Not a 404....';
	exit;
}

Even if you wanted to do it the other way and reset is_404, I'm pretty sure parse_query is the wrong hook to use, however I confess that I don't know the right hook to use off the top of my head.

#7 follow-up: @mikeschinkel
14 years ago

@Otto24 "Not if you have properly set $wp_query->is_404 to false. Then that handle_404() function will set a 200 header status."

How does one properly set $wp_query->is_404 to false? AFAICT that's exactly what I did, still doesn't work.

@Otto42 "Another way to work around it is simply to set your own 200 header before producing any output. Again, headers are *not* sent until output is produced. Check it yourself, use the headers_sent() function to determine if the headers have been sent or not at any point in execution."

That's a dirty, ugly hack.

Basically you are letting WordPress go through and set 404 even though it's not a 404. That creates potential undefined cases for other plugins that may cause them to operate incorrectly. Your answer is essentially "Just let WordPress think everything is wrong during the entire bootstrap process and then fix it at the last second." What if some other plugin that monitors 404 has already sent data after the headers?

Sorry, but I don't see how your suggestion can amount to anything other than bad software engineering.

@I understand your the problem, I'm just not certain how you're experiencing it or why you need to resort to this odd trickery to do what you want...

Trickery? How else am I going to get WordPress to let me have control of a URL? Trickery is reissuing a status header.

Reissuing a status header is not a problem per se within code you control. But the problem here is that WordPress is allowed to run with its state variables representing a state which is other than the real state. That's a problem and can lead to unexpected errors. It's like not testing edge cases; it works most of the time but not always.

FYI, routing a simple URL is so hard for most people who are not immersed in exactly how WordPress work (such as yourself and maybe 10 other people in the world) just throw up their hands and give up. Case in point, I was just hired by a seasoned PHP developer and paid $300 to write a plugin that simply routes a set of URLs, nothing more! It should NOT be this hard. I'd rather it be easy and have not made that money, btw.

#8 in reply to: ↑ 7 @Otto42
14 years ago

Replying to mikeschinkel:

How does one properly set $wp_query->is_404 to false? AFAICT that's exactly what I did, still doesn't work.

I can't answer that, need to read that code more to figure it out.

That's a dirty, ugly hack.

Not at all! That's how PHP is expressly designed to work. The header function (which is what status_header is calling) is intentionally made to allow the code to overload headers, sending only the most recent ones.

See http://php.net/manual/en/function.header.php

Basically you are letting WordPress go through and set 404 even though it's not a 404.

Define "set" in that context. You're setting a header, yes, but it's not like you can read out those headers later or anything. Headers are write-only, basically.

That creates potential undefined cases for other plugins that may cause them to operate incorrectly. Your answer is essentially "Just let WordPress think everything is wrong

Since when is a 404 "wrong"? It's a not found status code. That's not "wrong", it's the correct behavior if no posts to display have been found. You are overriding that with your plugin to say "yes, there's no posts, but I'm not displaying posts, so I'm going to set the header to something else".

What if some other plugin that monitors 404 has already sent data after the headers?

You can't prevent bad plugins from doing bad things and interfering with each other. What if you put a filter into handle_404? Some plugin could easily going to hook to that, and then take action based on it, like sending output.

Trickery? How else am I going to get WordPress to let me have control of a URL? Trickery is reissuing a status header.

Not trickery, design. PHP is *designed* to let you reissue status headers. That is *correct behavior*.

Reissuing a status header is not a problem per se within code you control. But the problem here is that WordPress is allowed to run with its state variables representing a state which is other than the real state.

How is that not a real state? No posts with the given criteria were found. It's set the header to 404 Not Found accordingly. The fact that your plugin comes along and redefines that doesn't make WordPress wrong to do what it did.

Think of it like this: You're using WordPress as a front end to whatever the heck you're outputting. But that's all you're doing, you're ignoring the entire posts system here. You're probably not storing data in a normal manner as far as WP is concerned. All you're doing is using the URL parser, essentially.

WordPress uses the results of the query to set the status header. If you're not using the query, then setting the status header becomes your problem since you're not using WordPress to do it. In that case, using a status_header(200) call is the right thing for your plugin to do.

Case in point, I was just hired by a seasoned PHP developer and paid $300 to write a plugin that simply routes a set of URLs, nothing more! It should NOT be this hard. I'd rather it be easy and have not made that money, btw.

Tell him I would have done it for $250. :P

It's *not* hard. It's just badly documented.

#9 follow-up: @mikeschinkel
14 years ago

@Otto42 Either you are not getting the point or you are destined to always seen things the opposite of the way I do (which, btw, makes you wrong. ;-p)

@Otto42 Not at all! That's how PHP is expressly designed to work. The header function (which is what status_header is calling) is intentionally made to allow the code to overload headers, sending only the most recent ones.

Yes, PHP allows you to call header() multiple times to set the same header, I'm not disputing that. What I'm saying is that in the context of WordPress using it is a bandaid and is potentially harmful because you are not addressing the root issue, you are sweeping the dirt under the carpet.

@Otto42 Since when is a 404 "wrong"? It's a not found status code. That's not "wrong", it's the correct behavior if no posts to display have been found. You are overriding that with your plugin to say "yes, there's no posts, but I'm not displaying posts, so I'm going to set the header to something else".

You are conflating to issues that should be separate. 404 is a well known status code for HTTP that tells the browser that the URI requested was not found by the server. It is not the status code that means "Wordpress' posts not found." By setting 404 on a valid URL you are letting WordPress maintain the notion that the URI requested was not found when in this case it most certainly was. The fact that you want to think about it from a WordPress-centric focus still doesn't make it a 404. It is dangerous to have WordPress think it is a 404 and tell plugins it is a 404 when in fact is not a 404.

If the property were instead is_posts_found then I'd agree with you; no posts were found. No, 404 is not that, it is "URI requested not found" and that is invalid when the the URI matches a pattern we want matched. Setting it to 404 internally is just wrong.

So you can understand, what follows are the hooks called between parse_query and template_redirect for the home page on v2.9. Setting $is_404 to true when the URI requested was indeed "found" means that plugins using any of these hooks can potentially misunderstand and do something they shouldn't do when the URL is actually found.

action: parse_query
action: pre_get_posts
filter: pre_option_posts_per_page
filter: option_posts_per_page
filter: pre_option_comments_per_page
filter: option_comments_per_page
filter: sanitize_title
filter: sanitize_title
filter: sanitize_title
filter: query
filter: pre_option_page_for_posts
filter: option_page_for_posts
filter: pre_option_show_on_front
filter: option_show_on_front
filter: sanitize_title
filter: posts_where
filter: posts_join
filter: posts_where_paged
filter: posts_groupby
filter: posts_join_paged
filter: posts_orderby
filter: posts_distinct
filter: post_limits
filter: posts_fields
action: posts_selection
filter: posts_where_request
filter: posts_groupby_request
filter: posts_join_request
filter: posts_orderby_request
filter: posts_distinct_request
filter: posts_fields_request
filter: post_limits_request
filter: posts_request
filter: query
filter: posts_results
filter: pre_option_sticky_posts
filter: option_sticky_posts
filter: the_posts
filter: status_header
action: wp
action: template_redirect

@Otto42 You can't prevent bad plugins from doing bad things and interfering with each other.

There is nothing bad about a plugin testing $is_404 and then taking an appropriate action.

What is bad is when WordPress doesn't allow a plugin to correctly set the value of $is_404 so that everyone (besides @Otto42) misunderstands what the value implies.

@Otto42 What if you put a filter into handle_404? Some plugin could easily going to hook to that, and then take action based on it, like sending output."

That's Reductio Ad Absurdum. As you said, you can't prevent bad plugins from doing bad things. I'm not sure this would be bad but even so that's like saying we should use hooks because someone might misuse hooks; clearly an absurd position.

@Otto42 Tell him I would have done it for $250. :P

Too late. ;-p

@Otto42 It's *not* hard. It's just badly documented.

It's more than badly documented, it's edge cases have never been tested by the core devs and those like you active on the wp-hackers list; you've not pushed it so you haven't seen where the gaping holes are.

#10 in reply to: ↑ 9 @Otto42
14 years ago

Replying to mikeschinkel:

Yes, PHP allows you to call header() multiple times to set the same header, I'm not disputing that. What I'm saying is that in the context of WordPress using it is a bandaid and is potentially harmful because you are not addressing the root issue, you are sweeping the dirt under the carpet.

See, I would disagree with that notion. It's a layering mechanism. WordPress sets what it thinks the header should be (and it's right, BTW), then your plugin corrects it if/when it needs to do so.

Adding a filter to let it check to see if any plugins want to set the header is a silly and unnecessary step. Your plugin is setting the header either way, yesno? Except using a filter mechanism is a lot more code to execute to achieve the exact same result.

Overriding the header has exactly the same result as using a filter to set it to what you want in the first place.

You are conflating to issues that should be separate. 404 is a well known status code for HTTP that tells the browser that the URI requested was not found by the server. It is not the status code that means "Wordpress' posts not found."

No, you are conflating the issues here by combining the notion of the URL with the router and the query. They are separate and behave separately.

WordPress is returning a 404 in this case because *it doesn't know about your URL*. You've told it to recognize your URl case and to set variables based on that. In a normal case, these variables would impact the query to return something to display. They didn't, it found no posts/pages/anything of any sort to display. So it set a 404 and would, in normal conditions, return the 404.php template file.

However, you seem to think that because you told it about the URL, that it shouldn't set the header. What you're missing is the fact that the URL is not the query and it is also not the router, no matter how much you want to combine all these things.

URLs get parsed into variables. Variables can control the query parameters. The results of the query set the headers and determine the template to use. This is the way things move through the system, and you're not doing it that way. You're wanting to bypass whole sections of this system without replacing what that section does completely.

I tell you that it is proper for the query to set headers because the query is what retrieves the content that the rest of the system is designed to display. If there is no content, then setting s 404 is appropriate. If, however, you want to replace the query and put your own content there, then you need to adjust the headers accordingly.

Yes, you say that the HTTP header codes are not made for WordPress. But in actual fact, the HTTP header codes are not made for anything in particular. They are available to be mapped to whatever conditions are appropriate, and WordPress maps "no content" to a "404". This is not an invalid mapping.

By setting 404 on a valid URL you are letting WordPress maintain the notion that the URI requested was not found when in this case it most certainly was.

No, the URL was not found until your plugin made it have meaning. Unfortunately, your plugin is incomplete until you have it set the header as well.

It is dangerous to have WordPress think it is a 404 and tell plugins it is a 404 when in fact is not a 404.

No, because if your plugin is not there, it is still a 404. It's only not a 404 when your plugin changes that fact.

If the property were instead is_posts_found then I'd agree with you; no posts were found. No, 404 is not that, it is "URI requested not found" and that is invalid when the the URI matches a pattern we want matched. Setting it to 404 internally is just wrong.

Your opinion is not the opinion of everybody else, nor is it "right" because you want it to be so. "URI requested not found" is a valid interpretation because the Resource that is being requested (post content) was not found by WordPress. Unless you change that fact, it remains valid.

A URI is not a URL. The distinction is important.

That's Reductio Ad Absurdum. As you said, you can't prevent bad plugins from doing bad things. I'm not sure this would be bad but even so that's like saying we should use hooks because someone might misuse hooks; clearly an absurd position.

No, it's not an absurd argument. You're saying that you need a hook because you can't override the 404. But you *can* override the 404. You can even detect when the 404 condition holds with a call to is_404().

In fact, all you really have to do is something like this in your template_redirect function:

if (i_have_something_to_display()) {
if (is_404()) { status_header(200); } // I have something to display, this ain't a 404 any more!
display_something();
}
 

Simple as that, really. It's a 404 *until* you have something to display, at which point you can change it to a non-404 code.

#11 @mikeschinkel
14 years ago

@Otto42 I give up. There are so many aspects of your reply I can dispute but I'm realizing I have lost more productive time because of your responses to my needs here and on wp-hackers that I can't believe it. I'm just not going to let you have that much power over me moving forward.

I am hoping more practical minds will prevail on this issue.

#12 @Otto42
14 years ago

  • Milestone Future Release deleted
  • Resolution set to invalid
  • Status changed from new to closed
  • Version 2.9 deleted

#13 @simonwheatley
14 years ago

  • Cc simon@… added
  • Resolution invalid deleted
  • Status changed from closed to reopened

Here's my use case for this patch or similar functionality: I'm creating a plugin against WP3.0 which adds a comment archive page for every registered user, very similar to the post archive page for authors, but, you know, for comments.

I've added rewrite rules for it, e.g. http://example.com/member/someone/comments and hooked WP_Query to have it issue appropriate queries to get the comments. This works fine... until you get to the pagination URLs, e.g. http://example.com/member/someone/comments/page/2. At this point the hard coded rules in handle_404 kick in and., as no posts are found AND it's a paginated query, the status is set to 404. I would argue that this isn't accurate on both levels, it's not intended to be a 404 AND WP_Query HAS found content, the content just doesn't happen to be posts.

I understand that I can override the status later, as you suggest Otto42, but it really feels like the cleaner, more semantically correct solution would be to override WP's instinct to set 404 at this point.

Who's with me? :)

#14 follow-ups: @filosofo
14 years ago

simonwheatley, I agree that we could really use more control over what happens in the WP objects's main method, but I don't think this patch is the way to go. It's somewhat arbitrary and not for the most part how filters are used in WordPress.

Better would be something that provided general access to what's going on in $wp. My inchoate idea is to separate $wp's methods that set state from those that act on that state.

Currently, it's a jumbled mix: handle_404 both controls state ("let's see...looks like a 404 situation, so let's toggle $wp_query's 404 property to true") and acts on the state by printing headers. Instead, one method should determine the current state, and another (with a filter or pass-by-reference action hook at the top) should do the header printing. Then we could hold off on the headers until much closer to the time the templates get included and print them all at once.

#15 in reply to: ↑ 14 @simonwheatley
14 years ago

Replying to filosofo:

<snip>
Currently, it's a jumbled mix: handle_404 both controls state ("let's see...looks like a 404 situation, so let's toggle $wp_query's 404 property to true") and acts on the state by printing headers. Instead, one method should determine the current state, and another (with a filter or pass-by-reference action hook at the top) should do the header printing. Then we could hold off on the headers until much closer to the time the templates get included and print them all at once.

That would certainly make my use case simpler, and I'm selfish enough to think that's important. ;) Seems like a nocache flag would be needed, would that be stored on $wp_query too?

#16 @scribu
14 years ago

  • Cc scribu@… added

#17 @scribu
14 years ago

  • Milestone set to Future Release

#18 in reply to: ↑ 14 @mikeschinkel
14 years ago

Replying to simonwheatley:

Who's with me? :)

Clearly, I am. :)

Replying to filosofo:

simonwheatley, I agree that we could really use more control over what happens in the WP objects's main method, but I don't think this patch is the way to go. It's somewhat arbitrary and not for the most part how filters are used in WordPress.

I to agree with you maybe my patch isn't the best. Since I submitted it 7 months ago I've learned a tremendous amount more about how WordPress works and see that there may be better ways.

So at this point I'll say I'm really happy my ticket (and patch) recognized the problem and called attention to it, which is really the ultimate goal, isn't it. Now the thing to do is gather the best WordPress minds on Trac like you an others to figure out what is the best solution and solve this bugger!

#19 @nacin
12 years ago

  • Milestone Future Release deleted
  • Resolution set to duplicate
  • Status changed from reopened to closed

#20 @MikeSchinkel
8 years ago

And, here is the solution that got added into core; Yeah!

https://core.trac.wordpress.org/changeset/36629

Last edited 8 years ago by MikeSchinkel (previous) (diff)
Note: See TracTickets for help on using tickets.