#14348 closed enhancement (fixed)
If it's a HEAD request, stop after the head!
Reported by: | mitchoyoshitaka | Owned by: | markjaquith |
---|---|---|---|
Milestone: | 3.5 | Priority: | normal |
Severity: | normal | Version: | 3.0 |
Component: | Performance | Keywords: | has-patch 3.5-early needs-unit-tests commit |
Focuses: | Cc: |
Description
Right now, as far as I can tell, when a HTTP HEAD request is made against WordPress, we go ahead and produce the entire page content, only to have it ignored. Attached is a patch to repair this.
While not a huge issue, if you happen to receive lots of HEAD requests, this can make a large difference, as each HEAD request will be much faster and not waste precious resources.
I have tried to ensure that this patch, by being in send_headers, will produce a HEAD response functionally identical to the response header of GET/POST requests, as it should be.
Attachments (9)
Change History (60)
#2
@
14 years ago
- Keywords has-patch removed
I hope by, "Thanks for reporting" what you mean to say is "that in no way should this patch ever be applied to WordPress and thanks for duplicating something hakre has already bought attention and I believe has a patch for what is appropriate to fixing the issue you have."
#3
@
14 years ago
- Keywords has-patch added
Oops, seemed to have lost my mind sometime a few minutes ago. I'm just going to be other there if you need me. No? Okay then.
#4
follow-ups:
↓ 5
↓ 10
↓ 14
@
14 years ago
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I suspect anything short of output buffering, running the full WP, and discarding the output, will introduce potential issues. Alternatively we could close as wontfix or worksforme, since that would be what apache does already.
#5
in reply to:
↑ 4
;
follow-up:
↓ 6
@
14 years ago
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I thought the same thing, but I was wrong. The exit()ing is not part of the patch, but already in the code. Also, there is a filter, called 'wp_headers' that a plugin can hook into.
#6
in reply to:
↑ 5
;
follow-up:
↓ 9
@
14 years ago
Replying to jacobsantos:
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I thought the same thing, but I was wrong. The exit()ing is not part of the patch, but already in the code. Also, there is a filter, called 'wp_headers' that a plugin can hook into.
Yes, but that exit call, if i'm recognizing the code above and below, is there to handle 304 / not modified requests.
The wp_headers filter isn't of much use in practice. I can't recall the specific reasons, but I remember firmly dismissing it as useless for anything but unconditionally adding a header.
#7
follow-up:
↓ 8
@
14 years ago
A browser might send a HEAD request in order to determine whether or not the content in cache needs to be updated. If that is the case, then it is the same as exit()ing for the Not modified.
#8
in reply to:
↑ 7
@
14 years ago
Replying to jacobsantos:
A browser might send a HEAD request in order to determine whether or not the content in cache needs to be updated. If that is the case, then it is the same as exit()ing for the Not modified.
Indeed. This is the primary use of HEAD, and it's precisely why, if have a WP page which is oft linked to and crawlers want to check the status of, it will get hit hard. Caching helps, but WP itself should exit when it's not necessary to continue.
The reason I ran into this is because my Yet Another Related Posts Plugin page on my website gets linked to in a lot of RSS feeds, which are in turn handled by FeedBurner, and FeedBurner likes to check to see that linked-to websites have been updated or not (apparently). I thus get almost 3000 HEAD hits a day on this page... more than the number of GETs.
Since instituting this patch on my site, the CPU usage (across Media Temple's distributed "grid", so it says GPU = grid performance unit) of the script went down to two thirds that of producing the full GET response. Before the change they were identical. Yes, I am running WP Super Cache.
For me, Media Temple sometimes charges by the GPU. For others, this could translate directly into CPU and RAM usage. This is clearly a performance issue.
#9
in reply to:
↑ 6
@
14 years ago
Replying to Denis-de-Bernardy:
The wp_headers filter isn't of much use in practice. I can't recall the specific reasons, but I remember firmly dismissing it as useless for anything but unconditionally adding a header.
If there's a problem with wp_headers, then wp_headers should be fixed/improved/moved. I for one have used it with success.
#10
in reply to:
↑ 4
@
14 years ago
- Cc mitchoyoshitaka added
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I updated the patch to include a boolean filter, exit_on_http_head, so that plugins can turn this mechanism off if there is a problem.
If I were a betting man, I would bet that, out of the plugins which output a HTTP header, the subset which particularly care about the integrity of that header when it's a HEAD request, not just when it's a GET or POST, is very small.
#11
@
14 years ago
- Keywords 3.2-early added
- Milestone changed from Awaiting Review to Future Release
- Type changed from defect (bug) to enhancement
#13
@
14 years ago
Just thinking out loud here, but it seems to me that you should be able to stop the process at the point of template_include for a HEAD request. Everything that is going to set a header should have set it by this point, because once you include the template, you're in producing-output territory and headers can't be sent any more.
#14
in reply to:
↑ 4
@
14 years ago
- Keywords close added
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
True output buffers will flush on exit.
I suspect anything short of output buffering, running the full WP, and discarding the output, will introduce potential issues. Alternatively we could close as wontfix or worksforme, since that would be what apache does already.
Confirmed.
@
13 years ago
Following Otto's approach, this patch adds a new function httphead() that hooks on template_include and returns false if it's a HEAD request. This is immediately useful for us, but could also be plugin territory so... grain, salt.
#16
@
13 years ago
Otto, technosailor - looks good, but the template_include approach doesn't affect do_feed. We should also block feed generation if it's a HEAD request.
I think this is particularly important as feed aggregators, etc., frequently ping feeds with HEAD requests.
#17
@
13 years ago
Please state your objections to 14348.diff. I'm not convinced that potential output buffering is something to be concerned with.
Unless someone can come up with a solution that leverages template_redirect, I don't think 14348-3.diff is the right approach.
#19
@
13 years ago
- Keywords 3.3-early added; close removed
- Milestone changed from Future Release to 3.3
If this is done, it should be done early in the cycle.
#21
@
13 years ago
- Keywords 2nd-opinion added
It's not only feeds, but do_robots() (called for robots.txt) that get a lot of HEAD requests. On this count either attachment:14348.diff or template_redirect seem better than template_include.
I tested using Fiddler to send HEAD requests & compare the two that "seem better".
Any change on this ticket will require some plugin updates, but template_redirect seems like the option with better workaround vs attachment:14348.diff
Feed Tests
attachment:14348.diff doesn't respond with the expected 302 redirect in most feed plugins, instead it sends a 200. Then the time in the 200 response to HEAD doesn't match the ones in a follow-up 302 > 200 sequence sent to a GET.
- BTW, I checked 5 higher-rated ones & they mainly seem to hook send_headers or template_redirect
- If we use template_redirect then send_headers could be hooked to fix any broken plugins. With the other option, any feed plugins would need to negate this whole tweak with $exit_on_http_head
- Also tested the old Feedburner FeedSmith. It breaks because it hooks template_redirect
- This plugin is still out there semi-frequently & isn't likely to get updated either
Robots Tests
I also tested several robots.txt plugins and they all work as expected with either attachment:14348.diff or template_redirect.
- The only odd thing was one plugin that logs the requester/bot from do_robot. Currently it logs both on HEAD & GET requests.
- Either of the performant fixes limits logging to only GETs (but that's what I personally would have expected).
(FYI that I did this testing on a 3.2.1 setup. Robots.txt requests failed with a 404 to either GET/HEAD, from [18776], but I wasn't sure if that was due to the plugins not being 3.3 compatible yet. Could someone else confirm if this might be a bug in do_robots()?)
- I opened #18841 in regards to the 404 for robots.txt, ignore this side-track here
Caching tests
Caching plugins do play here too, but the two biggest aren't impacted. They both handle HEAD requests in their own ways:
- One responds directly to HEAD based on cached files or an internal GET & won't break on either change
- The other has an odd execution path (& I was too lazy to follow it) but in testing it responded to repeat GET/HEAD requests fine whether the initially cached request was a HEAD or GET
#22
@
13 years ago
I like how simple Mitcho's method is. Just couldn't figure out how to tie it to template_redirect without making that action a filter. I didn't know how turning template_redirect into a filter would turn out, so this runs just after.
I also added a new constant that would let someone disable this if-required. Probably overkill.
Benefits: Maintains handling of canonical links, shortlinks, & old slugs. Keeps possible SEO benefits of those ... which might be useful since search spiders are heavy on HEAD use. This may not be worthwhile, wish I knew more if redirecting a HEAD is useful.
#25
@
13 years ago
- Keywords 3.4-early added; 3.3-early removed
- Milestone changed from 3.3 to Future Release
#26
@
13 years ago
This ticket has 3.4-early, after getting 3.2-early and 3.3-early in the past. Could this be taken in early in 3.4 to allow for broader testing? Pretty please?
#28
@
13 years ago
- Keywords 3.5-early added; 3.4-early removed
- Milestone changed from 3.4 to Future Release
Let's actually do this 3.5-early. Not touching this so late. Sorry, mitcho.
#29
follow-up:
↓ 33
@
12 years ago
- Milestone changed from Future Release to 3.5
What if we did this:
add_action( 'template_redirect', 'wp_exit_on_http_head', 1000 ); function wp_exit_on_http_head() { if ( 'HEAD' === $_SERVER['REQUEST_METHOD'] ) exit; }
Alternatively, 14348-4.diff but without the constant. Only the filter is really needed.
#31
@
12 years ago
I think a filter instead of an action will help caching plugins, but can't think of a specific use-case right now. Anyone have use cases? Otherwise @nacin's method is a bit more efficient.
Attaching my previous patch without the constant...
#32
@
12 years ago
Thoughts on landing this? My preference is for the filter in 14348-5.diff
, as stopping for HEAD requests should happen by default even if for whatever reason WP_USE_THEMES
is not being used. Nacin?
#33
in reply to:
↑ 29
;
follow-up:
↓ 34
@
12 years ago
- Cc kpayne@… added
Replying to nacin:
What if we did this:
add_action( 'template_redirect', 'wp_exit_on_http_head', 1000 ); function wp_exit_on_http_head() { if ( 'HEAD' === $_SERVER['REQUEST_METHOD'] ) exit; }
Can we make this testable? 'exit' is not catchable. Something like this?
add_action( 'template_redirect', 'wp_exit_on_http_head', 1000 ); function wp_exit_on_http_head() { if ( 'HEAD' === $_SERVER['REQUEST_METHOD'] ) { add_filter( 'wp_die_ajax_handler', '_scalar_wp_die_handler' ); wp_die(); } }
#34
in reply to:
↑ 33
;
follow-ups:
↓ 35
↓ 36
@
12 years ago
Replying to kurtpayne:
Can we make this testable? 'exit' is not catchable. Something like this?
I don't think adding another hook there is the right approach to testing this. Given the spirit of this ticket, the correct way to test this would be to send a head request against the install and verifying that the content body sent back is 0 bytes.
This can even be done right within WP using wp_remote_head()
.
#35
in reply to:
↑ 34
@
12 years ago
Replying to mitchoyoshitaka:
Given the spirit of this ticket, the correct way to test this would be to send a head request against the install and verifying that the content body sent back is 0 bytes.
The unit tests all run within phpunit and do not actually send requests through a web server*. This won't work as a unit test.
*Exception: the remote HTTP tests use api.wordpress.org, but this isn't the WordPress install that's being tested
#36
in reply to:
↑ 34
;
follow-up:
↓ 37
@
12 years ago
Replying to mitchoyoshitaka:
I don't think adding another hook there is the right approach to testing this...
it testing
Given the recent push to improve WordPress' unit tests, http://make.wordpress.org/core/handbook/automated-testing/, Kurt's suggestion makes a lot of sense.
I still prefer your/my method that adds a filter, but wp_die() makes sense. Uploading a blended patch next (testing using Mitcho's manual method & Fiddler).
#37
in reply to:
↑ 36
;
follow-up:
↓ 38
@
12 years ago
kurtpayne, mbijon: I'm really really confused by this approach:
add_filter( 'wp_die_ajax_handler', '_scalar_wp_die_handler' );
a) The filter will only run if we're testing in AJAX mode, but we wouldn't want this to only be testable as an AJAX request. This should be for all kinds of requests.
b) When wp_die()
runs and hits the filter (assuming this is in AJAX mode) it'll run _scalar_wp_die_handler
which will immediately die
. I don't think this is what's intended in order to make this testable.
Moreover, by calling wp_die()
instead of exit()
, it will by default print out the standard wp_die()
message HTML (and may even try to return status 500), which is precisely *not* what we want to do when dying early on a HEAD
request. If you want to use wp_die()
for some reason, you'd need to modify wp_die()
to set a handler precisely for this case or somesuch.
In conclusion: I think using exit()
is the right approach. You can use 14348-5.diff and hook against exit_on_http_head
for tests. exit_on_http_head
will only be triggered if you're testing a HEAD
request. You could make sure no output is queued at that point and that the return value is set up to be true.
#38
in reply to:
↑ 37
;
follow-up:
↓ 39
@
12 years ago
Replying to mitchoyoshitaka:
kurtpayne, mbijon: I'm really really confused by this approach:
add_filter( 'wp_die_ajax_handler', '_scalar_wp_die_handler' );
That's a typo. Good catch!
Take a look at the updated patch and associated unit test.
This also has the side effect of creating an action (head_request) that can be reused in plugins.
#39
in reply to:
↑ 38
;
follow-ups:
↓ 40
↓ 43
@
12 years ago
Replying to kurtpayne:
Take a look at the updated patch and associated unit test.
This also has the side effect of creating an action (head_request) that can be reused in plugins.
Thanks Kurt! Two thoughts:
- I don't think the action
head_request
and triggering it in the tests is the best way to test this. Why not simply set$_SERVER['REQUEST_METHOD'] = 'HEAD'
and check that it ends with no input? The semantics ofhead_request
are also odd: it's triggered even if it's not a HEAD request? - Using
wp_die
, again, means that unwanted output will be printed, in this case for AJAX and API requests. (wp_die_handler
filter is only used if it's not AJAX or API request.) Again, simply usingexit()
(and catching right before it for tests) is the right approach, as I explained above.
#40
in reply to:
↑ 39
@
12 years ago
Replying to mitchoyoshitaka:
I don't think the action
head_request
and triggering it in the tests is the best way to test this. Why not simply set$_SERVER['REQUEST_METHOD'] = 'HEAD'
and check that it ends with no input?
Sorry, nevermind. That is essentially what you're doing. I spoke too early.
Is wrapping this code in an action in order to check just its behavior necessary for the test? This isn't a realistic test for checking that no output is printed, then, as it doesn't actually run all the code that would happen in a normal request before getting to the head_request
point. Or maybe I'm missing something?
#41
@
12 years ago
I just tested attachment:14348-7.diff with Fiddler & the test suite. Both ways it worked as expected.
The do_action()
is in the correct place, it's executed at the same point as in my previous attachment:14348-6.diff. The distribution of code makes this a bit spaghetti'ish but shouldn't have any negative effects.
Putting spaghetti aside, the action does seem like it's unnecessary. Such an abstraction is a positive thing for code that might be run in multiple places, but I can't think of any reasons to call handle_head_request()
from a plugin that wouldn't already be addressed by the exit_on_http_head
filter & Kurt's added wp_die_handler
. Anyone else?
#42
@
12 years ago
The code is split up like that because the unit test has no other way to trigger a "die early" request. Keep in mind that there is no way to fire a complete HTTP request in he unit tests. Th run on the command line. No web server involved.
#43
in reply to:
↑ 39
@
12 years ago
@kurtpayne, I think the command line is irrelevant to request type in testing. Unit tests mock browser requests. The test suite creates mocked GET and POST requests from the command line without added actions for each possible request. HEAD shouldn't be any different.
THE PROBLEM: ... with WP testing seems to be that not all our verbs are built yet. We only have tests/post.php and tests/query.php for, respectively, POST and GET (& not 100% in query.php). And, note there's no head.php at all yet.
POSSIBLE FIX: WP has wp_remote_head()
built just for generating HEAD requests. So instead of adding the extra do_action( 'head_request' )
we should be able to generate a HEAD request with wp_remote_head( $url )
(check how it's done in tests/http/functions.php).
I think a switch to wp_remote_head() in your test would create a more realistic test of what HEAD does, remove the need for spaghetti, and mean we can move the body of your handle_head_request()
back into template-loader.php.
@mitcho, would that fix your 1st concern in comment 39?
#44
follow-up:
↓ 46
@
12 years ago
- Keywords 2nd-opinion removed
Just talked to @_mfields about this at #wcpdx and we think attachment:14348-5.diff will work. Using wp_remote_head()
in the unit test should mean that simpler code is enough.
(Michael did suggest we use die()
instead of exit()
, but other than that I'm removing 2nd-opinion on this)
#46
in reply to:
↑ 44
@
12 years ago
Replying to mbijon:
Just talked to @_mfields about this at #wcpdx and we think attachment:14348-5.diff will work. Using
wp_remote_head()
in the unit test should mean that simpler code is enough.
Wonderful!
#47
follow-up:
↓ 48
@
12 years ago
- Keywords commit added
14348-5.diff seems good. Want to clear with ryan first.
#48
in reply to:
↑ 47
@
12 years ago
Replying to nacin:
14348-5.diff seems good. Want to clear with ryan first.
WPCS update: I talked to ryan in person; he said this (the ticket, not necessarily this particular patch) was a good idea and he wants it to happen, and will look into it soon. Ryan: reminder to look into this soon. ;)
Thanks for reporting.