Ticket #14348 (new enhancement)
If it's a HEAD request, stop after the head!
| Reported by: | | Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | Future Release |
| Component: | Performance | Version: | 3.0 |
| Severity: | normal | Keywords: | has-patch 3.5-early 2nd-opinion |
| Cc: | mitchoyoshitaka, aaron@…, mbijon, solaris.smoke@…, neo@… |
Description
Right now, as far as I can tell, when an HTTP HEAD request is made against WordPress, we go ahead and produce the entire page content, only to have it ignored. Attached is a patch to fix this.
While not a huge issue, if you happen to receive lots of HEAD requests, this can make a large difference, as each HEAD request will be much faster and not waste precious resources.
I have tried to ensure that this patch, by being in send_headers, will produce a HEAD response functionally identical to the response header of GET/POST requests, as it should be.
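The idea in the description can be sketched outside WordPress. The following is a minimal illustration in Python rather than WordPress's PHP, and it is not the attached patch: `respond`, `demo_headers`, and `demo_body` are hypothetical names. It assumes only what the description states, namely that headers are built the same way for every method and that a HEAD request stops before the body is produced.

```python
from typing import Callable, Dict, List, Tuple

Headers = Dict[str, str]


def respond(
    method: str,
    build_headers: Callable[[], Headers],
    render_body: Callable[[], str],
) -> Tuple[Headers, str]:
    """Return (headers, body), skipping body rendering for HEAD requests."""
    headers = build_headers()  # identical header logic for every method
    if method.upper() == "HEAD":
        return headers, ""  # stop after the head: render_body is never called
    return headers, render_body()


# Records each time the (stand-in) expensive page build actually runs.
render_calls: List[str] = []


def demo_headers() -> Headers:
    return {"Content-Type": "text/html; charset=UTF-8"}


def demo_body() -> str:
    render_calls.append("rendered")
    return "<html>...</html>"


head_headers, head_body = respond("HEAD", demo_headers, demo_body)
get_headers, get_body = respond("GET", demo_headers, demo_body)
```

After both calls, `render_calls` holds a single entry: only the GET paid for the page build, while the HEAD response carries the same headers with an empty body.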
Attachments
Change History
comment:2
jacobsantos — 23 months ago
- Keywords has-patch removed
I hope that by "Thanks for reporting" what you mean to say is "in no way should this patch ever be applied to WordPress, and thanks for duplicating something hakre has already brought attention to and, I believe, has a patch for that fixes the issue appropriately."
comment:3
jacobsantos — 23 months ago
- Keywords has-patch added
Oops, I seem to have lost my mind a few minutes ago. I'm just going to be over there if you need me. No? Okay then.
comment:4
Denis-de-Bernardy
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I suspect anything short of output buffering, running the full WP, and discarding the output, will introduce potential issues. Alternatively we could close as wontfix or worksforme, since that would be what apache does already.
comment:5
in reply to: ↑ 4; follow-up: ↓ 6
jacobsantos — 22 months ago
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I thought the same thing, but I was wrong. The exit()ing is not part of the patch, but already in the code. Also, there is a filter, called 'wp_headers' that a plugin can hook into.
comment:6
in reply to: ↑ 5; follow-up: ↓ 9
Denis-de-Bernardy — 22 months ago
Replying to jacobsantos:
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I thought the same thing, but I was wrong. The exit()ing is not part of the patch, but already in the code. Also, there is a filter, called 'wp_headers' that a plugin can hook into.
Yes, but that exit call, if I'm reading the code above and below it correctly, is there to handle 304 / Not Modified requests.
The wp_headers filter isn't of much use in practice. I can't recall the specific reasons, but I remember firmly dismissing it as useless for anything but unconditionally adding a header.
comment:7
follow-up: ↓ 8
jacobsantos — 22 months ago
A browser might send a HEAD request in order to determine whether or not the content in cache needs to be updated. If that is the case, then it is the same as exit()ing for the Not modified.
comment:8
in reply to: ↑ 7
mitchoyoshitaka — 22 months ago
Replying to jacobsantos:
A browser might send a HEAD request in order to determine whether or not the content in cache needs to be updated. If that is the case, then it is the same as exit()ing for the Not modified.
Indeed. This is the primary use of HEAD, and it's precisely why, if you have a WP page which is often linked to and whose status crawlers want to check, it will get hit hard. Caching helps, but WP itself should exit when it's not necessary to continue.
The reason I ran into this is because my Yet Another Related Posts Plugin page on my website gets linked to in a lot of RSS feeds, which are in turn handled by FeedBurner, and FeedBurner likes to check to see that linked-to websites have been updated or not (apparently). I thus get almost 3000 HEAD hits a day on this page... more than the number of GETs.
Since instituting this patch on my site, the CPU usage (across Media Temple's distributed "grid", so it says GPU = grid performance unit) of the script went down to two thirds that of producing the full GET response. Before the change they were identical. Yes, I am running WP Super Cache.
For me, Media Temple sometimes charges by the GPU. For others, this could translate directly into CPU and RAM usage. This is clearly a performance issue.
mitchoyoshitaka — 22 months ago
- attachment 14348.diff added
patch (version 2), added exit_on_http_head boolean filter
comment:9
in reply to: ↑ 6
mitchoyoshitaka — 22 months ago
Replying to Denis-de-Bernardy:
The wp_headers filter isn't of much use in practice. I can't recall the specific reasons, but I remember firmly dismissing it as useless for anything but unconditionally adding a header.
If there's a problem with wp_headers, then wp_headers should be fixed/improved/moved. I for one have used it with success.
comment:10
in reply to: ↑ 4
mitchoyoshitaka — 22 months ago
- Cc mitchoyoshitaka added
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
I updated the patch to include a boolean filter, exit_on_http_head, so that plugins can turn this mechanism off if there is a problem.
If I were a betting man, I would bet that, out of the plugins which output an HTTP header, the subset which particularly cares about the integrity of that header on a HEAD request, not just on a GET or POST, is very small.
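The opt-out described in this comment can be sketched as follows. This is an illustration in Python, not the code in 14348.diff: `add_filter` and `apply_filters` here are tiny stand-ins that only mimic the shape of WordPress's filter API, and `should_exit_after_headers` is a hypothetical helper. The only name taken from the ticket is the `exit_on_http_head` filter, which is assumed to default to true.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

# Minimal stand-in for a filter registry (hypothetical, not WordPress itself).
_filters: DefaultDict[str, List[Callable[[Any], Any]]] = defaultdict(list)


def add_filter(name: str, callback: Callable[[Any], Any]) -> None:
    """Register a callback that may change the value passed through `name`."""
    _filters[name].append(callback)


def apply_filters(name: str, value: Any) -> Any:
    """Pass `value` through every callback registered for `name`, in order."""
    for callback in _filters[name]:
        value = callback(value)
    return value


def should_exit_after_headers(method: str) -> bool:
    """Exit early only for HEAD, and only if no plugin has vetoed it."""
    if method.upper() != "HEAD":
        return False
    return bool(apply_filters("exit_on_http_head", True))


default_decision = should_exit_after_headers("HEAD")

# A plugin that depends on the full request running can switch the shortcut off.
add_filter("exit_on_http_head", lambda value: False)
opted_out_decision = should_exit_after_headers("HEAD")
get_decision = should_exit_after_headers("GET")
```

With no filter registered the HEAD request exits early; once a plugin returns false from the filter, the request runs to completion as before. GET requests are never affected.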
comment:11
nacin — 18 months ago
- Keywords 3.2-early added
- Type changed from defect (bug) to enhancement
- Milestone changed from Awaiting Review to Future Release
comment:12
scribu — 14 months ago
- Keywords 3.2-early removed
- Milestone changed from Future Release to 3.2
comment:13
Otto42 — 14 months ago
Just thinking out loud here, but it seems to me that you should be able to stop the process at the point of template_include for a HEAD request. Everything that is going to set a header should have set it by this point, because once you include the template, you're in producing-output territory and headers can't be sent any more.
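The ordering argument in this comment can be sketched like so. This is a simplified model in Python, not WordPress code: `run_request`, `skip_template_on_head`, and the hook lists are hypothetical, and the sketch assumes that every header-setting hook runs before the template stage is reached.

```python
from typing import Callable, Dict, List, Optional

Headers = Dict[str, str]
Template = Callable[[], str]


def skip_template_on_head(method: str, template: Template) -> Optional[Template]:
    """Stand-in for a template_include-style filter: drop the template on HEAD."""
    return None if method.upper() == "HEAD" else template


def run_request(
    method: str,
    header_hooks: List[Callable[[Headers], None]],
    template: Template,
) -> Dict[str, object]:
    headers: Headers = {}
    # Everything that wants to set a header runs here, before any output.
    for hook in header_hooks:
        hook(headers)
    # Template stage: past this point the request is producing output.
    chosen = skip_template_on_head(method, template)
    body = chosen() if chosen is not None else ""
    return {"headers": headers, "body": body}


def core_headers(headers: Headers) -> None:
    headers["Content-Type"] = "text/html; charset=UTF-8"


def plugin_headers(headers: Headers) -> None:
    headers["X-Example-Plugin"] = "1"  # a late, plugin-added header


def page_template() -> str:
    return "<html>page</html>"


head_result = run_request("HEAD", [core_headers, plugin_headers], page_template)
get_result = run_request("GET", [core_headers, plugin_headers], page_template)
```

Because the cut happens at the template stage rather than earlier, the plugin-added header is present in the HEAD response as well as the GET response.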
comment:14
in reply to: ↑ 4
hakre — 14 months ago
- Keywords close added
Replying to Denis-de-Bernardy:
I can easily picture a plugin overriding headers after an output buffer gets started or something similar. The patch exits too early.
True. Output buffers will flush on exit.
I suspect anything short of output buffering, running the full WP, and discarding the output, will introduce potential issues. Alternatively we could close as wontfix or worksforme, since that would be what apache does already.
Confirmed.
technosailor — 13 months ago
- attachment 14348-2.diff added
Following Otto's approach, this patch adds a new function httphead() that hooks on template_include and returns false if it's a HEAD request. This is immediately useful for us, but could also be plugin territory so... grain, salt.
Otto, technosailor - looks good, but the template_include approach doesn't affect do_feed. We should also block feed generation if it's a HEAD request.
I think this is particularly important as feed aggregators, etc., frequently ping feeds with HEAD requests.
comment:17
nacin — 12 months ago
Please state your objections to 14348.diff. I'm not convinced that potential output buffering is something to be concerned with.
Unless someone can come up with a solution that leverages template_redirect, I don't think 14348-3.diff is the right approach.
comment:19
koopersmith — 10 months ago
- Keywords 3.3-early added; close removed
- Milestone changed from Future Release to 3.3
If this is done, it should be done early in the cycle.
comment:21
mbijon — 8 months ago
- Keywords 2nd-opinion added
It's not only feeds, but do_robots() (called for robots.txt) that get a lot of HEAD requests. On this count either attachment:14348.diff or template_redirect seem better than template_include.
I tested using Fiddler to send HEAD requests & compare the two that "seem better".
Any change on this ticket will require some plugin updates, but template_redirect seems like the option with the better workaround compared to attachment:14348.diff.
Feed Tests
attachment:14348.diff doesn't respond with the expected 302 redirect in most feed plugins, instead it sends a 200. Then the time in the 200 response to HEAD doesn't match the ones in a follow-up 302 > 200 sequence sent to a GET.
- BTW, I checked 5 higher-rated ones & they mainly seem to hook send_headers or template_redirect
- If we use template_redirect then send_headers could be hooked to fix any broken plugins. With the other option, any feed plugins would need to negate this whole tweak with $exit_on_http_head
- Also tested the old Feedburner FeedSmith. It breaks because it hooks template_redirect
- This plugin is still out there semi-frequently & isn't likely to get updated either
Robots Tests
I also tested several robots.txt plugins and they all work as expected with either attachment:14348.diff or template_redirect.
- The only odd thing was one plugin that logs the requester/bot from do_robots. Currently it logs on both HEAD & GET requests.
- Either of the performant fixes limits logging to only GETs (but that's what I personally would have expected).
(FYI that I did this testing on a 3.2.1 setup. Robots.txt requests failed with a 404 to either GET/HEAD, from [18776], but I wasn't sure if that was due to the plugins not being 3.3 compatible yet. Could someone else confirm if this might be a bug in do_robots()?)
- I opened #18841 regarding the 404 for robots.txt; ignore this side-track here.
Caching tests
Caching plugins do play here too, but the two biggest aren't impacted. They both handle HEAD requests in their own ways:
- One responds directly to HEAD based on cached files or an internal GET & won't break on either change
- The other has an odd execution path (& I was too lazy to follow it) but in testing it responded to repeat GET/HEAD requests fine whether the initially cached request was a HEAD or GET
- attachment 14348-4.diff added
Mitcho's method, runs after 'template_redirect' but without hooking on it
comment:22
mbijon — 8 months ago
I like how simple Mitcho's method is. Just couldn't figure out how to tie it to template_redirect without making that action a filter. I didn't know how turning template_redirect into a filter would turn out, so this runs just after.
I also added a new constant that would let someone disable this if required. Probably overkill.
Benefits: Maintains handling of canonical links, shortlinks, & old slugs. Keeps possible SEO benefits of those ... which might be useful since search spiders are heavy on HEAD use. This may not be worthwhile; I wish I knew more about whether redirecting a HEAD request is useful.
comment:25
ryan — 7 months ago
- Keywords 3.4-early added; 3.3-early removed
- Milestone changed from 3.3 to Future Release
This ticket has 3.4-early, after getting 3.2-early and 3.3-early in the past. Could this be taken in early in 3.4 to allow for broader testing? Pretty please?
comment:28
nacin — 2 months ago
- Keywords 3.5-early added; 3.4-early removed
- Milestone changed from 3.4 to Future Release
Let's actually do this 3.5-early. Not touching this so late. Sorry, mitcho.


Thanks for reporting.