WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 3 years ago

#13066 accepted defect (bug)

Last-Modified headers for individual comment feeds are incorrect

Reported by: solarissmoke Owned by: jgci
Milestone: Future Release Priority: low
Severity: normal Version: 3.0
Component: Feeds Keywords: dev-feedback gci has-patch 3.2-early
Focuses: Cc:

Description

The WP::send_headers function currently uses get_lastcommentmodified() to set the Last-Modified header for all comment feeds. This is a problem when used for individual post comment feeds. The function gets the last modified comment across all blog posts. That means that every time a comment is posted anywhere, the Last-Modified header for ALL comment feeds is refreshed. Issues:

  1. This is technically incorrect, since only the global comment feed and one specific post's comment feed have changed with the last comment (not all possible comment feeds); and
  1. It means that If-Modified-Since requests for other post comment feeds will not receive a 304 response when they should do (since their content hasn't changed). On blogs with many posts and many comment feeds, this will have a large impact on bandwidth because lots of requests will receive 200 responses where 304's would have done, just because a comment was posted on some other post.

If I've understood the flow correctly, $wp_query hasn't been fully set up at the time this function is called, so changing this behaviour would require some change in the flow of things (e.g., the handling of last modified headers for feeds moves into the do_feed() function). But doing so would mean that Last-Modified headers are correct/meaningful and that many more 304 responses can be served.

Any thoughts?

Attachments (2)

get_lastcommentmodified.diff (3.5 KB) - added by jgci 3 years ago.
query.diff (734 bytes) - added by jgci 3 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 solarissmoke4 years ago

Incidentally, the <lastBuildDate> tag in the individual comment feeds is also wrong because of the same use of get_lastcommentmodified().

comment:2 follow-up: dd324 years ago

  • Milestone changed from 3.0 to Future Release

This code is run early in the execution of WordPress, At the point in time its run, All WordPress knows is, that its a query for a Comments feed.

Its a trade off between bandwidth and server load, continuing to process the request will require loading the rest of WordPress & parsing the request & database fetches. all of which, would be pointless if no comments have been posted.

At the same time, it wastes bandwidth in preference to wasting CPU cycles as feeds are re-served often when they're not needed to.

Theres a few options here:

  • Leave as-is
  • Make the code here run AFTER posts are queried, which would result in higher CPU load, yet more accurate result
  • Make the comment check specify the post for the requests using a manual SQL
    • for p, page_id, and attachment_id requests, this wouldnt require an extra SQL
    • for name and attachment requests, this would require an extra SQL query
    • for pagename requests, this would probably require a few more queries

Since this isnt a regression(and we're trying to get 3.0 done), I'm setting this to Future Release pending agreement and/or a patch.

comment:3 in reply to: ↑ 2 solarissmoke4 years ago

Replying to dd32:

Its a trade off between bandwidth and server load, continuing to process the request will require loading the rest of WordPress & parsing the request & database fetches. all of which, would be pointless if no comments have been posted.

True, but every time Wordpress sends a 200 response where 304 would have done, we end up wasting both bandwidth and server load (because a full load takes place anyway).

  • Leave as-is

I would suggest that even for semantic reasons the Last-Modified information should reflect the state of the content being served (the 304 issue aside).

  • Make the comment check specify the post for the requests using a manual SQL

As you say this would be a little messy for pagename requests etc. The easiest option would be to do it after everything has loaded, at the expense of server load.

I was also thinking whether it might be possible to cache (in the db) the last-modified data for feeds, so that this could be accessed early in the execution? That might be even more messy though.

comment:4 hakre4 years ago

Looks like there is no intention to fix this. Close as wontfix?

comment:5 westi3 years ago

  • Cc westi added
  • Keywords gci added

I have proposed this as GCI2010 Task

comment:6 jgci3 years ago

  • Owner set to jgci
  • Status changed from new to accepted

comment:7 jgci3 years ago

  • Keywords has-patch added

comment:8 westi3 years ago

  • Keywords 3.2-early added

The patch resolves the issue in my testing.

Not 100% sure this is the perfect solution but definitely a working one.

jgci3 years ago

comment:9 jgci3 years ago

I've just added a secound patch, which makes sure that one query isn't parsed twice.

Note: See TracTickets for help on using tickets.