
Opened 11 years ago

Closed 9 years ago

Last modified 8 years ago

#29201 closed enhancement (wontfix)

File versioning should not use query strings, but rename the filename to allow caching

Reported by: benoitchantre Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.9.1
Component: Script Loader Keywords:
Focuses: performance Cc:

Description (last modified by SergeyBiryukov)

Most proxies, most notably Squid up through version 3.0, do not cache resources with a "?" in their URL even if a Cache-control: public header is present in the response. To enable proxy caching for these resources, remove query strings from references to static resources, and instead encode the parameters into the file names themselves.
http://gtmetrix.com/remove-query-strings-from-static-resources.html

wp_enqueue_style and wp_enqueue_script have a parameter $ver to specify a version number, but it adds a query string in the returned link.

It would be better to rename the filename
style.foo.css instead of style.css?v=foo

More detailed information on this (the link can be found in the .htaccess file of html5boilerplate):
http://www.stevesouders.com/blog/2008/08/23/revving-filenames-dont-use-querystring/

Change History (15)

#1 @SergeyBiryukov
11 years ago

  • Component changed from General to Script Loader
  • Description modified (diff)

#2 @ocean90
11 years ago

  • Summary changed from Ehancement: file versioning should not use query strings, but rename the filename to allow caching to File versioning should not use query strings, but rename the filename to allow caching

Sounds like plugin material for me, as it requires a change to the .htaccess/nginx config.

#3 @fsvm88
11 years ago

Hello,

I am a systems administrator and programmer.
I spent the last three days helping my girlfriend (web designer) working out issues with CSS and JS file loading.

In my opinion a CMS is not a cache, proxy or VCS, that's why I think *versioning* static resources *is conceptually wrong*. It makes effective cache use and configuration much more troublesome, which causes more complexity, and thus (in general), problems.

Another common approach to the "make the new file available to users" problem is cache busting. This is done by generating some sort of fingerprint to identify the file, and publishing the file with the fingerprint in its name (not the extension).
Usually, a hash or a timestamp is used. A hash is preferable because it is based on content rather than on filesystem metadata like a timestamp (which can, at times, be incorrect, since some systems disable access time updates, for example).
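
A minimal sketch of the content-hash fingerprint described above. The function name, the choice of md5 and the 8-character length are assumptions made for illustration; nothing here is part of WordPress.

```php
<?php
// Hypothetical helper (not a WordPress API): builds a fingerprinted file
// name of the form "<name>.<hash>.<ext>" from the file's content.
function fingerprinted_name(string $path, int $length = 8): string
{
    // Hash the bytes, not the metadata, so the name changes only when
    // the content changes.
    $digest = hash_file('md5', $path);
    if ($digest === false) {
        throw new RuntimeException("Cannot read $path");
    }

    $info = pathinfo($path);
    $ext  = isset($info['extension']) ? '.' . $info['extension'] : '';

    // The fingerprint goes into the name, before the extension.
    return $info['filename'] . '.' . substr($digest, 0, $length) . $ext;
}
```

Hashing on every page request would be wasteful, which is why the comment goes on to suggest some form of server-side caching of the computed names.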

Rather than requiring an .htaccess/nginx configuration change, it probably would require some sort of server-side caching to be available in order to avoid copying/hashing the files at each page request.
I honestly don't know if WP already has some sort of internal caching (temp directory?), but in that case it could be effectively leveraged, copying the already-named files in place. It would be necessary then, to implement an automatic cache flush whenever a style/script gets updated (either by WP editor or by FTP, for instance).
I also think that it should be possible to disable such a feature until it's implemented correctly (which I think is the shortest road to putting users in control).

Anyway, there are already well-supported and maintained plugins (like W3 Total Cache) that handle the cache problem very well. It may be worth letting these handle the task instead.

I digressed only because I think the current approach is conceptually wrong. It's not about versioning, but rather about publishing the latest file, which is a different thing. Versioning should be done with a proper VCS.

Thank you for reading through this. :)

#4 follow-up: @dd32
11 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

Unfortunately this isn't something we can really implement.

  • It's up to proxy admins to have correct caching rules; Squid has recommended caching "dynamic" URLs since 2.7
  • A proxy config which denies caching URLs with ? will simply fall back to browser-based caching
  • WordPress can't really do this without adding extra rewrite rules
  • WordPress core doesn't load CSS/JS files directly; rather, they go through a dynamic PHP script loader, so even if we did add rewrite rules and all that, it's still not going to be viable, as we'd have a dynamic URL in 99% of cases
  • Plugins/Themes can already do this if they pass the correct arguments to wp_register_script(), and rename the file with each release
  • The web has grown up greatly since 2008; unless core is currently significantly broken by overzealous caching (which it isn't), I think we should rely on modern caching software to do things "right"

I'm marking this as wontfix for now, you can still comment if you feel strongly about the issue.

#5 @swissspidy
9 years ago

#36830 was marked as a duplicate.

#6 in reply to: ↑ 4 ; follow-up: @drzraf
9 years ago

  • Focuses performance added
  • Resolution wontfix deleted
  • Status changed from closed to reopened

Replying to dd32:

Could you please justify why query strings were added to assets in the first place?
What issue were they supposed to solve? (browser software? browsing context? ...)

Refreshing downstream proxies is as easy as issuing a "Cache-Control" or "Pragma", and the webserver as well as the client:
If a resource changes, just trigger

wp_remote_get($post_url, [ 'blocking' => false, 'headers' => array('Cache-Control' => 'no-cache') ]);

But these query-string all over the place just make caching more difficult.
Also note that not all asset URLs are issued through the hookable WP API; even in wp-includes/ and wp-admin/ there are currently more than 40 occurrences matching

\?(v|ver|rev)=

(Quite an intrusive cache workaround)
Given how widespread this (bad practice?) is, it would be nice if a strong argument could be given to justify it; otherwise, consider going back to plain, simple URLs and solving hypothetical cache freshness problems using best practices as of 2016.

thank you

#7 follow-up: @dd32
9 years ago

  • Resolution set to wontfix
  • Status changed from reopened to closed

The versioning is used for cache-busting in events where browsers do not recheck the server version on a regular basis (for example, because it was served with a 30 day expiry).
A filename change would also solve that problem, but comes with its own issues - for example, either rewrite rule requirements, or changing the filename each release, which has huge implications for reliable updates on the thousands of weird and underpowered servers WordPress is used on.

So this isn't going to be changed in core. If you wish to use filename-based cache-busting techniques, that's possible through filters such as script_loader_src and style_loader_src, but it's not currently feasible for WordPress to change and still operate in all environments.
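
As a rough sketch of what such a filter callback might look like: the function name, the accepted version characters and the resulting URL shape are assumptions, and the web server would still need a rule mapping the renamed URL back to the real file (the rewrite-rule requirement mentioned above).

```php
<?php
// Hypothetical callback: turns ".../style.css?ver=4.5" into
// ".../style.4.5.css". URLs it does not understand are returned unchanged.
function ver_to_filename(string $src): string
{
    // Split "base?query"; asset URLs are assumed to carry no fragment.
    $pieces = explode('?', $src, 2);
    if (count($pieces) < 2) {
        return $src;
    }
    [$base, $query] = $pieces;

    parse_str($query, $args);
    $ver = $args['ver'] ?? '';
    // Only accept versions that are safe to embed in a file name.
    if (!is_string($ver) || !preg_match('/^[A-Za-z0-9._-]+$/', $ver)) {
        return $src;
    }

    // Insert the version before the final .js/.css extension.
    $renamed = preg_replace('/\.(js|css)$/i', '.' . $ver . '.$1', $base, 1, $count);
    if ($renamed === null || $count !== 1) {
        return $src; // e.g. a dynamic loader URL ending in .php
    }

    // Keep any other query arguments that were present.
    unset($args['ver']);
    $rest = http_build_query($args);

    return $rest === '' ? $renamed : $renamed . '?' . $rest;
}

// Inside WordPress the callback would be attached to the two filters
// named above; the guard lets this file also run on its own.
if (function_exists('add_filter')) {
    add_filter('script_loader_src', 'ver_to_filename');
    add_filter('style_loader_src', 'ver_to_filename');
}
```

Leaving non-matching URLs untouched is deliberate: assets served through a PHP loader have no static file name to rename.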

Discussion can continue while the ticket is closed; re-opening isn't going to make it be reconsidered, only a good argument for change (of which, none have been made yet) will be considered.

#8 in reply to: ↑ 7 @drzraf
9 years ago

A filename change would also solve that problem, but comes with its own issues

I'm in favor of completely removing these suffixing strategies.

Discussion can continue while the ticket is closed

good to know

The versioning is used for cache-busting in events where browsers do not recheck the server version on a regular basis (for example, because it was served with a 30 day expiry).

  • Which browsers are affected?
  • What/Who may configure *by mistake* WP to set a too-large expire time?
  • ... and why would WP need to workaround these cases?
  • When a misconfiguration makes/made cache-busting mandatory, was it a bug in core, in a plugin, a specific situation (WP wp-admin/ files upgrade?), or a misconfigured intermediary proxy? In the latter case, is that software version still in use nowadays?

As a hypothetical example about supporting buggy software/misconfigurations: in the event Firefox did not correctly send If-Modified-Since, would it be WP's responsibility to use a borderline-standard URL scheme as a workaround, at the risk of affecting every other vendor?

#9 follow-up: @dd32
9 years ago

Which browsers are affected?

Any browser which respects a server-sent expiry, noting that WordPress doesn't send those, but rather the server configurations.

What/Who may configure *by mistake* WP to set a too-large expire time?

WordPress developers and users are not the ones who configure the server in most cases. Many WordPress sites get installed behind a CDN (the user may not even realise that, or might not understand the implications of it), etc.

It's also not uncommon (unfortunately) to find a declaration such as ExpiresByType text/css "access plus 30 days" on some preconfigured servers, which the end-user can't modify.

WordPress is rarely run in a scenario where the configuration of all the moving parts is perfect, that's why WordPress "just works" almost anywhere - it's amazing for an end-user, yet also introduces extra restrictions upon what we (the WordPress project) can reasonably do.

#10 in reply to: ↑ 9 @drzraf
9 years ago

Replying to dd32:

Which browsers are affected?

Any browser which respects a server-sent expiry, noting that WordPress doesn't send those, but rather the server configurations.

OK, so the browsers are assumed to respect RFC 2616 and to behave correctly with respect to server-originated headers.
They are not part of the issue then.

What/Who may configure *by mistake* WP to set a too-large expire time?

WordPress developers and users are not the ones who configure the server in most cases. Many WordPress sites get installed behind a CDN (the user may not even realise that, or might not understand the implications of it), etc.

AFAIK CDNs do not by themselves pose issues that would make cache-busting necessary.
Users may have limited control over configuration *BUT* any RFC2616-compliant caching proxy offers a form of control through headers. That's the best, well-defined form of control over caching proxies, and something WordPress can do.

It's also not uncommon (unfortunately) to find a declaration such as ExpiresByType text/css "access plus 30 days" on some preconfigured servers, which the end-user can't modify.

How is it a problem? Isn't it intended?

These are CSS files (from core, a theme, a plugin, or uploaded by the user in some way). The server configuration says they expire in 1 month and the browser caches them as it should. Any attempt to circumvent this is just trying to discard RFC2616. Why not trust the sysadmin by default?

How is it a problem (bis)? Isn't it configurable after all?

Citing Apache2 documentation:

When the Expires header is already part of the response generated by the server, for example when generated by a CGI script or proxied from an origin server, this module does not change or add an Expires or Cache-Control header.

In such a configuration, assets may be routed to index.php (rather than having a RewriteCond -f check bypass the index.php entry point) so that PHP can force a short Expires time.
(leaving workarounds to buggy configurations)

Cache-busting does not give a (strong) guarantee to solve the issue.

In case the end-user has no access to the webserver configuration, he has no control either over the fact that foo.css?ver=4.2 could be given a text/css MIME type and thus be cached as well. Multiple resources will be cached for 1 month, rather than one resource being refreshed when needed.
The freshness guarantee is strong for an Apache2 webserver since it won't cache these entities. But is the current cache-busting workaround Apache2-centric? (At least it seems so.)

Cache-busting is a workaround rather than the correct cache-refreshing method

Mostly this is saying:

  • CSS may need a refresh now
  • wait for the user to hit index.php again
  • put some suffix on the CSS links in the HTML
  • expect the browser to send a GET for these URLs
  • expect the server to invalidate its cache since a new query-string was appended (or, more exactly, cache the new version under a new key/URI)

This is a very long, painful and indirect workaround for invalidating caches, while there is a much simpler solution:

  • fire a GET with Cache-Control: no-cache, or Pragma: no-cache (or both if you want) to said resource and forget (you can be sure the proxy will kindly refresh its cache)

This can be done by the browser (hard-refresh) or by the "end-user" (the WordPress instance/customer) using file_get_contents() & co and this is... standard.

How many users does the current cache-busting method account for?

As a complement to my above question of "how is it a problem that a CSS file would be cached for 1 month?", I would be curious what percentage of users is affected, I mean users:

  • behind a caching-proxy? --- 20%?
  • that they don't control? ----- 50% of the above?
  • and is misconfigured? ------ 10% of the above?
  • and misconfigured in a way that causes problems for WP? X% (I don't know what kind of problem)

and I would even add:

  • where the proxy cache can't be invalidated using standard headers (here we hit 0%)

Sure, you may have better statistics available, but I bet we are talking about < 0.2% of the users, while a hypothetical 20% of the total user base is given a non-functional cache by the default installation.
And even if we consider only Apache2 setups, I think that more than 50% of people have an "AllowOverride" that lets them use mod_expires in their .htaccess.

Or should we assume there are more misconfigured caching-proxies than well-configured caching-proxies?

In the end, if cache-busting is necessary (when?) to avoid problems (of what kind?), why not prefer a standard method (called "cache invalidation") rather than a dirty and invasive custom one which brings more than one side effect?

WordPress is rarely run in a scenario where the configuration of all the moving parts is perfect, that's why WordPress "just works" almost anywhere - it's amazing for an end-user, yet also introduces extra restrictions upon what we (the WordPress project) can reasonably do.

Yes, but having assets suffixed with a query-string in core assumes that it serves more people than it hurts. Let's investigate whether that's true or not.

  • We can enumerate buggy setups and see how real and/or mainstream they are.
  • Then we can consider whether cache-busting URL-suffixing could/should be done as a plugin
  • and/or discuss alternatives
  • ...

But I can't argue about whether WordPress default configuration should match one given buggy cache setup theoretically because it goes down to politics.

#11 @drzraf
9 years ago

Adding @mdawaffe here since, according to git history, he committed the original implementation of wp_enqueue() accepting a $ver parameter back in 2006.

Commit: 4d49e98fe455ead48a0fac70ae32101d66056809

@param string ver (optional) Script version (used for cache busting)

#12 in reply to: ↑ 6 ; follow-up: @mdawaffe
9 years ago

Replying to drzraf:

I'm not sure what problem you're trying to solve.

Refreshing downstream proxies is as easy as issuing a "Cache-Control" or "Pragma", and the webserver as well as the client:
If a resource changes, just trigger

wp_remote_get($post_url, [ 'blocking' => false, 'headers' => array('Cache-Control' => 'no-cache') ]);

...

simpler solution:

  • fire a GET with Cache-Control: no-cache, or Pragma: no-cache (or both if you want) to said resource and forget (you can be sure the proxy will kindly refresh its cache)

How does this help? There's no way to know what proxies there are between some arbitrary visitor and your site. In any case, browsers will still have the old version cached.

To restate what's been mentioned above, a pretty common setup is for the webserver to issue long expiry times (let's say 1 year). In this setup:

  • Browser makes request to style.css?ver=1
  • Browser (and potentially proxies in between) cache style.css?ver=1
  • When the underlying style.css resource changes, the version is bumped (by core in core's case or by the plugin/theme in that case).
  • Browser makes new request to style.css?ver=2
  • Browser (and potentially proxies in between) cache style.css?ver=2

There's no way for the server to clear unknown proxy caches. There's no way for the server to clear the browser cache. Instead, the server tells the browser to request a new resource (by altering this query string parameter).

Plugins can, of course, change the core behavior. For example, I know some sites remove the ?ver= parameter and replace it with a query string parameter that corresponds to the mtime of the file. This mtime method is more robust but is hard to implement correctly on sites served by many webservers. It also may be non-performant on many hosts. Core's ?ver= method is good compromise that works pretty well most places.
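
A minimal sketch of the mtime variant mentioned above. The function name is hypothetical, and the mapping from an asset URL to its path on disk is assumed to be known to the caller.

```php
<?php
// Hypothetical helper: appends the file's modification time as ?ver=.
// $path is assumed to be the on-disk location of the asset behind $url.
function mtime_versioned_url(string $url, string $path): string
{
    // Drop any cached stat result so a freshly updated file is seen.
    clearstatcache(true, $path);

    $mtime = @filemtime($path);
    if ($mtime === false) {
        return $url; // unreadable or missing file: leave the URL as-is
    }

    $separator = strpos($url, '?') === false ? '?' : '&';

    return $url . $separator . 'ver=' . $mtime;
}
```

This costs one stat call per asset per request, and on a site served by several webservers the same file can carry different mtimes, which is where the robustness and performance caveats above come from.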

What/Who may configure *by mistake* WP to set a too-large expire time?

I don't think this long cache expiry choice is a mistake.

#13 in reply to: ↑ 12 @drzraf
9 years ago

I'm not sure what problem you're trying to solve.

  • remove a workaround
  • simplify my software stack
  • remove corner cases/bugs

Query-strings appended to static files may cause (non-exhaustive list):

  • mime-type to be messed-up
  • impossibility to cache resource
  • multiple (too many) cached resource
  • log files/stats files inconsistencies (same file, multiple URL)
  • ...


Refreshing downstream proxies is as easy as issuing a "Cache-Control" or "Pragma", and the webserver as well as the client:
If a resource changes, just trigger

wp_remote_get($post_url, [ 'blocking' => false, 'headers' => array('Cache-Control' => 'no-cache') ]);
  • fire a GET with Cache-Control: no-cache, or Pragma: no-cache (or both if you want) to said resource and forget (you can be sure the proxy will kindly refresh its cache)

How does this help? There's no way to know what proxies there are between some arbitrary visitor and your site. In any case, browsers will still have the old version cached.

First, please note that changing resource location is not a cache invalidation.
But you're right, and I was wrong in the above post: this will not refresh downstream proxies.
It will refresh server-side proxies = any proxy between public-address and internal webserver IP (where most reverse-proxies lie).
Although for clarity it could have been written:
Cache-Control: max-age=0, must-revalidate

Indeed you're mostly right here, it does not force refresh of down-stream proxies or user-agent caches.

This is something the user alone (= the HTML webpage) can do, we may want to split both discussions (downstream proxies / reverse-proxies) to avoid confusion.

About user-agent cache refresh, there are alternatives (worth noting we are working around buggy server-side imposed Expire header):
For example window.location.reload(true)
which is likely the same as sending XHR + Cache-Control: max-age=0 for all enqueued assets.
As a HTML-inlined javascript it will refresh UA cache and intermediary proxies.

This is better than a query-string, but it raises a very interesting question:
When is it right to trigger the cache-refresh routine (= how does the WordPress application, when asked to generate HTML, know whether the user-agent is using an old version of a static file or not)?

To restate what's been mentioned above, a pretty common setup is for the webserver to issue long expiry times (let's say 1 year).

That's the moot-point, and what needs a better definition before going effectively forward:

  • how common is it?
  • does it represent the majority of cache-enabled webservers?
  • which OSes, distributions and hosting services are known to distribute such a setup?

It must be added that we do have control over assets (including caching options) if we want to; it's just a matter of RewriteRules and Cache-Control headers:

If a response includes both an Expires header and a max-age directive, the max-age directive overrides the Expires header.

If buggy webservers are not common enough, or if they do not represent the majority of cache-enabled webservers, it would be fair to assume that the workaround (and the performance cost it implies) should impact them rather than well-behaved proxies. (As a bonus, it gives an incentive to implement caching right.)

There's no way for the server to clear unknown proxy caches.

That's right, *but* they are cached because WordPress asked for it in the first place, and WordPress decides this using headers.
Of course, in some setups the webserver may try to bypass the application, but using RewriteRules it's easy to regain full control.

There's no way for the server to clear the browser cache. Instead, the server tells the browser to request a new resource (by altering this query string parameter).

That's somehow the point of caching and a point for not playing with Cache-* or Expires HTTP headers.

Plugins can, of course, change the core behavior. For example, I know some sites remove the ?ver= parameter and replace it with a query string parameter that corresponds to the mtime of the file. This mtime method is more robust but is hard to implement correctly on sites served by many webservers. It also may be non-performant on many hosts. Core's ?ver= method is good compromise that works pretty well most places.

Are Expires +1 year headers so frequent that they need this compromise in WP core?
Wouldn't it be easy for a plugin to introduce one of the various workarounds for their specific problem?
I bet these ?ver= are not needed for 100% of non-cached WP instances, and not needed for 80% of cached WP instances because their (Apache?) webserver is configured correctly.

What/Who may configure *by mistake* WP to set a too-large expire time?

I don't think this long cache expiry choice is a mistake.

See this:

To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.

Thus I'm pretty sure +1y asset caching is mostly a mistake, but it's bound to the "Uniform Resource Locator" definition/interpretation and related RFCs. RFC2616's terms do not imply widespread use of such a caching policy.

Paraphrasing this, it says to the UA: assume that the webpage will point you to the newer resource.
(People caching the WP front page for some minutes or some hours break the assumption that the HTML page is the (only) way to refresh the assets.)

It's all about website visitor patterns and website assets upgrade transitions (and also about whether HTML output itself is cached or not).

The query-string method is a way to keep the webpage and assets in sync in a cache-enabled context and thus avoid this kind of question:

  • do we accept old CSS for a NEW page?
  • do we accept new CSS for an OLD page?

On the one hand, it implies that jquery.js?ver=1.2.3 will be universally OK, but on the other, the non-suffixed jquery.js will be inconsistent (depending on my place in the network, I would be given a different resource).

The logical implication of a +1y Expires for WP would be to explicitly put versioning inside the filenames, e.g. jquery-1.2.3.js, rather than using a query-string.
But please leave that to "long-expires" webservers (or those, like Yahoo!'s, that are ready to deal with the side effects it induces).

#14 @szepe.viktor
8 years ago

I came across this problem.
E.g. Amazon CloudFront likes parameterless URLs.

So this is my solution: https://wordpress.org/plugins/resource-versioning/

The reversing Apache rule is

RewriteRule ^(.+)\.\d\d+\.(js|css|png|jpg|jpeg|gif|ico)$ $1.$2 [NC,L]
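
To make the rule's effect concrete, here is a sketch in PHP that applies the same pattern with preg_replace. The function name is hypothetical, and Apache's own matching context (such as per-directory prefix stripping) is not modelled.

```php
<?php
// Hypothetical mirror of the RewriteRule above: removes a numeric
// version segment of two or more digits that sits before the extension.
function strip_version_segment(string $path): string
{
    // The "i" modifier plays the role of the [NC] flag.
    $result = preg_replace(
        '/^(.+)\.\d\d+\.(js|css|png|jpg|jpeg|gif|ico)$/i',
        '$1.$2',
        $path
    );

    // preg_replace returns null on error; fall back to the input.
    return $result ?? $path;
}

// "style.20160512.css" maps back to "style.css".
// "style.4.5.css" is left unchanged: "\d\d+" needs at least two
// consecutive digits, so dotted versions such as 4.5 do not match.
```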

#15 @drzraf
8 years ago

For reference:

https://github.com/w3c/preload/issues/68

Add support for cache busting via etag header

https://bz.apache.org/bugzilla/show_bug.cgi?id=21935

Apache ability to cache query-string-stripped (mod_rewrite) URL:

But before such a feature lands (and this could take a while, if ever), I'd still hope that WP core stops doing cache-busting at all. Could devs reopen and reconsider?
