Make WordPress Core

Opened 14 years ago

Closed 12 years ago

Last modified 12 years ago

#14348 closed enhancement (fixed)

If it's a HEAD request, stop after the head!

Reported by: mitchoyoshitaka's profile mitchoyoshitaka Owned by: markjaquith's profile markjaquith
Milestone: 3.5 Priority: normal
Severity: normal Version: 3.0
Component: Performance Keywords: has-patch 3.5-early needs-unit-tests commit
Focuses: Cc:

Description

Right now, as far as I can tell, when a HTTP HEAD request is made against WordPress, we go ahead and produce the entire page content, only to have it ignored. Attached is a patch to repair this.

While not a huge issue, if you happen to receive lots of HEAD requests, this can make a large difference, as each HEAD request will be much faster and not waste precious resources.

I have tried to ensure that this patch, by being in send_headers, will produce a HEAD response functionally identical to the response header of GET/POST requests, as it should be.

Attachments (9)

14348.diff (464 bytes) - added by mitchoyoshitaka 14 years ago.
patch (version 2), added exit_on_http_head boolean filter
14348-2.diff (1.2 KB) - added by technosailor 13 years ago.
Following Otto's approach, this patch adds a new function httphead() that hooks on template_include and returns false if it's a HEAD request. This is immediately useful for us, but could also be plugin territory so... grain, salt.
14348-3.diff (1.3 KB) - added by technosailor 13 years ago.
Refreshed diff with docs
14348-4.diff (683 bytes) - added by mbijon 13 years ago.
Mitcho's method, runs after 'template_redirect' but without hooking on it
14348-5.diff (649 bytes) - added by mbijon 12 years ago.
Basically 14348-4.diff, just without the constant
14348-6.diff (720 bytes) - added by mbijon 12 years ago.
Updated to be phpunit compatible
14348-7.diff (1.7 KB) - added by kurtpayne 12 years ago.
Alternate patch
14348-unit-test.diff (2.2 KB) - added by kurtpayne 12 years ago.
Unit test
14348-8.diff (791 bytes) - added by mbijon 12 years ago.
Removed double action. Keeps corrected wp_die_handler

Download all attachments as: .zip

Change History (60)

#1 @hakre
14 years ago

Thanks for reporting.

#2 @jacobsantos
14 years ago

  • Keywords has-patch removed

I hope by, "Thanks for reporting" what you mean to say is "that in no way should this patch ever be applied to WordPress and thanks for duplicating something hakre has already bought attention and I believe has a patch for what is appropriate to fixing the issue you have."

#3 @jacobsantos
14 years ago

  • Keywords has-patch added

Oops, seemed to have lost my mind sometime a few minutes ago. I'm just going to be other there if you need me. No? Okay then.

#4 follow-ups: @Denis-de-Bernardy
14 years ago

I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.

I suspect anything short of output buffering, running the full WP, and discarding the output, will introduce potential issues. Alternatively we could close as wontfix or worksforme, since that would be what apache does already.

#5 in reply to: ↑ 4 ; follow-up: @jacobsantos
14 years ago

Replying to Denis-de-Bernardy:

I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.

I thought the same thing, but I was wrong. The exit()ing is not part of the patch, but already in the code. Also, there is a filter, called 'wp_headers' that a plugin can hook into.

#6 in reply to: ↑ 5 ; follow-up: @Denis-de-Bernardy
14 years ago

Replying to jacobsantos:

Replying to Denis-de-Bernardy:

I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.

I thought the same thing, but I was wrong. The exit()ing is not part of the patch, but already in the code. Also, there is a filter, called 'wp_headers' that a plugin can hook into.

Yes, but that exit call, if i'm recognizing the code above and below, is there to handle 304 / not modified requests.

The wp_headers filter isn't of much use in practice. I can't recall the specific reasons, but I remember firmly dismissing it as useless for anything but unconditionally adding a header.

#7 follow-up: @jacobsantos
14 years ago

A browser might send a HEAD request in order to determine whether or not the content in cache needs to be updated. If that is the case, then it is the same as exit()ing for the Not modified.

#8 in reply to: ↑ 7 @mitchoyoshitaka
14 years ago

Replying to jacobsantos:

A browser might send a HEAD request in order to determine whether or not the content in cache needs to be updated. If that is the case, then it is the same as exit()ing for the Not modified.

Indeed. This is the primary use of HEAD, and it's precisely why, if have a WP page which is oft linked to and crawlers want to check the status of, it will get hit hard. Caching helps, but WP itself should exit when it's not necessary to continue.

The reason I ran into this is because my Yet Another Related Posts Plugin page on my website gets linked to in a lot of RSS feeds, which are in turn handled by FeedBurner, and FeedBurner likes to check to see that linked-to websites have been updated or not (apparently). I thus get almost 3000 HEAD hits a day on this page... more than the number of GETs.

http://img.skitch.com/20100722-pjks2t95tnybupumwfek8u8qtd.jpg

Since instituting this patch on my site, the CPU usage (across Media Temple's distributed "grid", so it says GPU = grid performance unit) of the script went down to two thirds that of producing the full GET response. Before the change they were identical. Yes, I am running WP Super Cache.

For me, Media Temple sometimes charges by the GPU. For others, this could translate directly into CPU and RAM usage. This is clearly a performance issue.

@mitchoyoshitaka
14 years ago

patch (version 2), added exit_on_http_head boolean filter

#9 in reply to: ↑ 6 @mitchoyoshitaka
14 years ago

Replying to Denis-de-Bernardy:

The wp_headers filter isn't of much use in practice. I can't recall the specific reasons, but I remember firmly dismissing it as useless for anything but unconditionally adding a header.

If there's a problem with wp_headers, then wp_headers should be fixed/improved/moved. I for one have used it with success.

#10 in reply to: ↑ 4 @mitchoyoshitaka
14 years ago

  • Cc mitchoyoshitaka added

Replying to Denis-de-Bernardy:

I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.

I updated the patch to include a boolean filter, exit_on_http_head, so that plugins can turn this mechanism off if there is a problem.

If I were a betting man, I would bet that, out of the plugins which output a HTTP header, the subset which particularly care about the integrity of that header when it's a HEAD request, not just when it's a GET or POST, is very small.

#11 @nacin
14 years ago

  • Keywords 3.2-early added
  • Milestone changed from Awaiting Review to Future Release
  • Type changed from defect (bug) to enhancement

#12 @scribu
13 years ago

  • Keywords 3.2-early removed
  • Milestone changed from Future Release to 3.2

#13 @Otto42
13 years ago

Just thinking out loud here, but it seems to me that you should be able to stop the process at the point of template_include for a HEAD request. Everything that is going to set a header should have set it by this point, because once you include the template, you're in producing-output territory and headers can't be sent any more.

#14 in reply to: ↑ 4 @hakre
13 years ago

  • Keywords close added

Replying to Denis-de-Bernardy:

I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.

True output buffers will flush on exit.

I suspect anything short of output buffering, running the full WP, and discarding the output, will introduce potential issues. Alternatively we could close as wontfix or worksforme, since that would be what apache does already.

Confirmed.

@technosailor
13 years ago

Following Otto's approach, this patch adds a new function httphead() that hooks on template_include and returns false if it's a HEAD request. This is immediately useful for us, but could also be plugin territory so... grain, salt.

@technosailor
13 years ago

Refreshed diff with docs

#15 @technosailor
13 years ago

  • Cc aaron@… added

#16 @mitchoyoshitaka
13 years ago

Otto, technosailor - looks good, but the template_include approach doesn't affect do_feed. We should also block feed generation if it's a HEAD request.

I think this is particularly important as feed aggregators, etc., frequently ping feeds with HEAD requests.

#17 @nacin
13 years ago

Please state your objections to 14348.diff. I'm not convinced that potential output buffering is something to be concerned with.

Unless someone can come up with a solution that leverages template_redirect, I don't think 14348-3.diff is the right approach.

#18 @ryan
13 years ago

  • Milestone changed from 3.2 to Future Release

#19 @koopersmith
13 years ago

  • Keywords 3.3-early added; close removed
  • Milestone changed from Future Release to 3.3

If this is done, it should be done early in the cycle.

#20 @mbijon
13 years ago

  • Cc mbijon added

#21 @mbijon
13 years ago

  • Keywords 2nd-opinion added

It's not only feeds, but do_robots() (called for robots.txt) that get a lot of HEAD requests. On this count either attachment:14348.diff or template_redirect seem better than template_include.

I tested using Fiddler to send HEAD requests & compare the two that "seem better".

Any change on this ticket will require some plugin updates, but template_redirect seems like the option with better workaround vs attachment:14348.diff


Feed Tests

attachment:14348.diff doesn't respond with the expected 302 redirect in most feed plugins, instead it sends a 200. Then the time in the 200 response to HEAD doesn't match the ones in a follow-up 302 > 200 sequence sent to a GET.

  • BTW, I checked 5 higher-rated ones & they mainly seem to hook send_headers or template_redirect
    • If we use template_redirect then send_headers could be hooked to fix any broken plugins. With the other option, any feed plugins would need to negate this whole tweak with $exit_on_http_head
  • Also tested the old Feedburner FeedSmith. It breaks because it hooks template_redirect
    • This plugin is still out there semi-frequently & isn't likely to get updated either

Robots Tests

I also tested several robots.txt plugins and they all work as expected with either attachment:14348.diff or template_redirect.

  • The only odd thing was one plugin that logs the requester/bot from do_robot. Currently it logs both on HEAD & GET requests.
    • Either of the performant fixes limits logging to only GETs (but that's what I personally would have expected).

(FYI that I did this testing on a 3.2.1 setup. Robots.txt requests failed with a 404 to either GET/HEAD, from [18776], but I wasn't sure if that was due to the plugins not being 3.3 compatible yet. Could someone else confirm if this might be a bug in do_robots()?)

Caching tests

Caching plugins do play here too, but the two biggest aren't impacted. They both handle HEAD requests in their own ways:

  • One responds directly to HEAD based on cached files or an internal GET & won't break on either change
  • The other has an odd execution path (& I was too lazy to follow it) but in testing it responded to repeat GET/HEAD requests fine whether the initially cached request was a HEAD or GET
Version 0, edited 13 years ago by mbijon (next)

@mbijon
13 years ago

Mitcho's method, runs after 'template_redirect' but without hooking on it

#22 @mbijon
13 years ago

I like how simple Mitcho's method is. Just couldn't figure out how to tie it to template_redirect without making that action a filter. I didn't know how turning template_redirect into a filter would turn out, so this runs just after.

I also added a new constant that would let someone disable this if-required. Probably overkill.

Benefits: Maintains handling of canonical links, shortlinks, & old slugs. Keeps possible SEO benefits of those ... which might be useful since search spiders are heavy on HEAD use. This may not be worthwhile, wish I knew more if redirecting a HEAD is useful.

#23 @solarissmoke
13 years ago

  • Cc solaris.smoke@… added

#24 @neoxx
13 years ago

  • Cc neo@… added

#25 @ryan
13 years ago

  • Keywords 3.4-early added; 3.3-early removed
  • Milestone changed from 3.3 to Future Release

#26 @mitchoyoshitaka
13 years ago

This ticket has 3.4-early, after getting 3.2-early and 3.3-early in the past. Could this be taken in early in 3.4 to allow for broader testing? Pretty please?

#27 @SergeyBiryukov
13 years ago

  • Milestone changed from Future Release to 3.4

#28 @nacin
12 years ago

  • Keywords 3.5-early added; 3.4-early removed
  • Milestone changed from 3.4 to Future Release

Let's actually do this 3.5-early. Not touching this so late. Sorry, mitcho.

#29 follow-up: @nacin
12 years ago

  • Milestone changed from Future Release to 3.5

What if we did this:

add_action( 'template_redirect', 'wp_exit_on_http_head', 1000 );
function wp_exit_on_http_head() {
    if ( 'HEAD' === $_SERVER['REQUEST_METHOD'] )
        exit;
}

Alternatively, 14348-4.diff but without the constant. Only the filter is really needed.

#30 @sirzooro
12 years ago

  • Cc sirzooro added

#31 @mbijon
12 years ago

I think a filter instead of an action will help caching plugins, but can't think of a specific use-case right now. Anyone have use cases? Otherwise @nacin's method is a bit more efficient.

Attaching my previous patch without the constant...

@mbijon
12 years ago

Basically 14348-4.diff, just without the constant

#32 @mitchoyoshitaka
12 years ago

Thoughts on landing this? My preference is for the filter in 14348-5.diff, as stopping for HEAD requests should happen by default even if for whatever reason WP_USE_THEMES is not being used. Nacin?

#33 in reply to: ↑ 29 ; follow-up: @kurtpayne
12 years ago

  • Cc kpayne@… added

Replying to nacin:

What if we did this:

add_action( 'template_redirect', 'wp_exit_on_http_head', 1000 );
function wp_exit_on_http_head() {
    if ( 'HEAD' === $_SERVER['REQUEST_METHOD'] )
        exit;
}

Can we make this testable? 'exit' is not catchable. Something like this?

add_action( 'template_redirect', 'wp_exit_on_http_head', 1000 );
function wp_exit_on_http_head() {
	if ( 'HEAD' === $_SERVER['REQUEST_METHOD'] ) {
		add_filter( 'wp_die_ajax_handler', '_scalar_wp_die_handler' );
		wp_die();
	}
}

#34 in reply to: ↑ 33 ; follow-ups: @mitchoyoshitaka
12 years ago

Replying to kurtpayne:

Can we make this testable? 'exit' is not catchable. Something like this?

I don't think adding another hook there is the right approach to testing this. Given the spirit of this ticket, the correct way to test this would be to send a head request against the install and verifying that the content body sent back is 0 bytes.

This can even be done right within WP using wp_remote_head().

#35 in reply to: ↑ 34 @kurtpayne
12 years ago

Replying to mitchoyoshitaka:

Given the spirit of this ticket, the correct way to test this would be to send a head request against the install and verifying that the content body sent back is 0 bytes.

The unit tests all run within phpunit and do not actually send requests through a web server*. This won't work as a unit test.

*Exception: the remote HTTP tests use api.wordpress.org, but this isn't the WordPress install that's being tested

Last edited 12 years ago by kurtpayne (previous) (diff)

#36 in reply to: ↑ 34 ; follow-up: @mbijon
12 years ago

Replying to mitchoyoshitaka:

I don't think adding another hook there is the right approach to testing this...

it testing
Given the recent push to improve WordPress' unit tests, http://make.wordpress.org/core/handbook/automated-testing/, Kurt's suggestion makes a lot of sense.

I still prefer your/my method that adds a filter, but wp_die() makes sense. Uploading a blended patch next (testing using Mitcho's manual method & Fiddler).

@mbijon
12 years ago

Updated to be phpunit compatible

#37 in reply to: ↑ 36 ; follow-up: @mitchoyoshitaka
12 years ago

kurtpayne, mbijon: I'm really really confused by this approach:

add_filter( 'wp_die_ajax_handler', '_scalar_wp_die_handler' );

a) The filter will only run if we're testing in AJAX mode, but we wouldn't want this to only be testable as an AJAX request. This should be for all kinds of requests.

b) When wp_die() runs and hits the filter (assuming this is in AJAX mode) it'll run _scalar_wp_die_handler which will immediately die. I don't think this is what's intended in order to make this testable.

Moreover, by calling wp_die() instead of exit(), it will by default print out the standard wp_die() message HTML (and may even try to return status 500), which is precisely *not* what we want to do when dying early on a HEAD request. If you want to use wp_die() for some reason, you'd need to modify wp_die() to set a handler precisely for this case or somesuch.

In conclusion: I think using exit() is the right approach. You can use 14348-5.diff and hook against exit_on_http_head for tests. exit_on_http_head will only be triggered if you're testing a HEAD request. You could make sure no output is queued at that point and that the return value is set up to be true.

@kurtpayne
12 years ago

Alternate patch

@kurtpayne
12 years ago

Unit test

#38 in reply to: ↑ 37 ; follow-up: @kurtpayne
12 years ago

Replying to mitchoyoshitaka:

kurtpayne, mbijon: I'm really really confused by this approach:

add_filter( 'wp_die_ajax_handler', '_scalar_wp_die_handler' );

That's a typo. Good catch!

Take a look at the updated patch and associated unit test.

This also has the side effect of creating an action (head_request) that can be reused in plugins.

#39 in reply to: ↑ 38 ; follow-ups: @mitchoyoshitaka
12 years ago

Replying to kurtpayne:

Take a look at the updated patch and associated unit test.

This also has the side effect of creating an action (head_request) that can be reused in plugins.

Thanks Kurt! Two thoughts:

  1. I don't think the action head_request and triggering it in the tests is the best way to test this. Why not simply set $_SERVER['REQUEST_METHOD'] = 'HEAD' and check that it ends with no input? The semantics of head_request are also odd: it's triggered even if it's not a HEAD request?
  2. Using wp_die, again, means that unwanted output will be printed, in this case for AJAX and API requests. (wp_die_handler filter is only used if it's not AJAX or API request.) Again, simply using exit() (and catching right before it for tests) is the right approach, as I explained above.
Last edited 12 years ago by mitchoyoshitaka (previous) (diff)

#40 in reply to: ↑ 39 @mitchoyoshitaka
12 years ago

Replying to mitchoyoshitaka:

I don't think the action head_request and triggering it in the tests is the best way to test this. Why not simply set $_SERVER['REQUEST_METHOD'] = 'HEAD' and check that it ends with no input?

Sorry, nevermind. That is essentially what you're doing. I spoke too early.

Is wrapping this code in an action in order to check just its behavior necessary for the test? This isn't a realistic test for checking that no output is printed, then, as it doesn't actually run all the code that would happen in a normal request before getting to the head_request point. Or maybe I'm missing something?

#41 @mbijon
12 years ago

I just tested attachment:14348-7.diff with Fiddler & the test suite. Both ways it worked as expected.

The do_action() is in the correct place, it's executed at the same point as in my previous attachment:14348-6.diff. The distribution of code makes this a bit spaghetti'ish but shouldn't have any negative effects.

Putting spaghetti aside, the action does seem like it's unnecessary. Such an abstraction is a positive thing for code that might be run in multiple places, but I can't think of any reasons to call handle_head_request() from a plugin that wouldn't already be addressed by the exit_on_http_head filter & Kurt's added wp_die_handler. Anyone else?

#42 @kurtpayne
12 years ago

The code is split up like that because the unit test has no other way to trigger a "die early" request. Keep in mind that there is no way to fire a complete HTTP request in he unit tests. Th run on the command line. No web server involved.

#43 in reply to: ↑ 39 @mbijon
12 years ago

@kurtpayne, I think the command line is irrelevant to request type in testing. Unit tests mock browser requests. The test suite creates mocked GET and POST requests from the command line without added actions for each possible request. HEAD shouldn't be any different.

THE PROBLEM: ... with WP testing seems to be that not all our verbs are built yet. We only have tests/post.php and tests/query.php for, respectively, POST and GET (& not 100% in query.php). And, note there's no head.php at all yet.

POSSIBLE FIX: WP has wp_remote_head() built just for generating HEAD requests. So instead of adding the extra do_action( 'head_request' ) we should be able to generate a HEAD request with wp_remote_head( $url ) (check how it's done in tests/http/functions.php).

I think a switch to wp_remote_head() in your test would create a more realistic test of what HEAD does, remove the need for spaghetti, and mean we can move the body of your handle_head_request() back into template-loader.php.

@mitcho, would that fix your 1st concern in comment 39?

@mbijon
12 years ago

Removed double action. Keeps corrected wp_die_handler

#44 follow-up: @mbijon
12 years ago

  • Keywords 2nd-opinion removed

Just talked to @_mfields about this at #wcpdx and we think attachment:14348-5.diff will work. Using wp_remote_head() in the unit test should mean that simpler code is enough.

(Michael did suggest we use die() instead of exit(), but other than that I'm removing 2nd-opinion on this)

#45 @kurtpayne
12 years ago

  • Keywords needs-unit-tests added

Can you submit a patch for the unit test?

#46 in reply to: ↑ 44 @mitchoyoshitaka
12 years ago

Replying to mbijon:

Just talked to @_mfields about this at #wcpdx and we think attachment:14348-5.diff will work. Using wp_remote_head() in the unit test should mean that simpler code is enough.

Wonderful!

#47 follow-up: @nacin
12 years ago

  • Keywords commit added

14348-5.diff seems good. Want to clear with ryan first.

#48 in reply to: ↑ 47 @mitchoyoshitaka
12 years ago

Replying to nacin:

14348-5.diff seems good. Want to clear with ryan first.

WPCS update: I talked to ryan in person; he said this (the ticket, not necessarily this particular patch) was a good idea and he wants it to happen, and will look into it soon. Ryan: reminder to look into this soon. ;)

#49 @markjaquith
12 years ago

  • Owner set to markjaquith
  • Status changed from new to accepted

Just got another thumbs up from ryan.

#50 @markjaquith
12 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 22347:

When receiving a HEAD request, die() right before we start outputting content.

  • Content doesn't make sense for HEAD requests
  • Saves wear and tear on the server
  • Introduces a filter: exit_on_http_head (we send TRUE through it)

fixes #14348. props mbijon, mitchoyoshitaka.

#51 @markjaquith
12 years ago

In 22353:

Clean up [22347] a bit. see #14348

Note: See TracTickets for help on using tickets.