Make WordPress Core

Opened 10 years ago

Closed 8 years ago

Last modified 8 years ago

#28722 closed feature request (fixed)

Boost performance with ETag in load-scripts.php and load-styles.php

Reported by: sergejmueller's profile sergej.mueller Owned by: swissspidy's profile swissspidy
Milestone: 4.5 Priority: normal
Severity: normal Version: 4.0
Component: Script Loader Keywords: has-patch commit
Focuses: administration, performance Cc:

Description

Attach ETag header to the file output and check the HTTP_IF_NONE_MATCH request header before echo.
The generated ETag based on the MD5 checksum of the combined file content.

Network Performance Comparison
Without ETag (after page reload): https://db.tt/TSWkm7Qj
With ETag (after page reload): https://db.tt/YB7CLesw

Saving: ca. 150 KB incoming data

Note: wp_unslash() is not available there. filter_input works fine.

Attachments (8)

trunk.patch (1.3 KB) - added by sergej.mueller 10 years ago.
wp-admin.patch (1.4 KB) - added by sergej.mueller 10 years ago.
stripslashes() instead of filter_input()
28722.diff (4.7 KB) - added by dd32 10 years ago.
28722.2.diff (4.1 KB) - added by dd32 10 years ago.
28722.3.diff (1.7 KB) - added by swissspidy 8 years ago.
28722.4.diff (1.7 KB) - added by swissspidy 8 years ago.
28722.5.diff (2.0 KB) - added by swissspidy 8 years ago.
28722.6.diff (2.0 KB) - added by swissspidy 8 years ago.

Download all attachments as: .zip

Change History (38)

@sergej.mueller
10 years ago

stripslashes() instead of filter_input()

#1 @sergej.mueller
10 years ago

New patch file with stripslashes() instead of filter_input() for huge hoster support, because filter_input(INPUT_SERVER, 'HTTP_IF_NONE_MATCH', FILTER_SANITIZE_STRING) was temporarily empty.

#2 @sergej.mueller
10 years ago

How to test it

  1. Open Developer Tools in your browser. Switch to Network tab.
  2. Reload a admin page (e.g. dashboard) and monitor the requested files in dev tools.
  3. Look for load-scripts.php and load-styles.php files.

@dd32
10 years ago

@dd32
10 years ago

#3 @dd32
10 years ago

  • Keywords has-patch added

The proposed patch looks like a good start to me, however, have you considered the impacts of using the available $wp_version and script versions instead of a md5 of the content?

Although we set a 1 year expiry on the document, browsers will still check it, as you can see.
We cache bust the document based on $wp_version so the suggestion of using the individual script versions is probably over-the-top..

I only ask about the alternatives, as in many environments the disk IO of reading the files and then MD5'ing it may be significant, and maybe useless given the above caching we do.

28722.diff is an implementation that avoids disk IO by basing the etag on the the $wp_version + script handle versions. 28722.2.diff is a implementation that simply uses our cache buster ($wp_version) as the etag (etags don't need to be an md5, although that's the most common implementation)

I did a quick apachebench (1000 req's, 1 thread, between two local VM's) against 4.0-beta1 with a large script loader request, and came up with some interesting results:

Patchwith eTagtime per request(ms)req/sreq size(bytes)
No PatchNO19.292ms51.83169,411
wp-admin.patchNO19.96250.10169,411
wp-admin.patchYES19.24051.970
28722.diffNO19.67050.84169,411
28722.diffYES6.919144.540
28722.2.diffNO19.38051.60169,411
28722.2.diffYES6.818146.680

URL requested: wp-admin/load-scripts.php?c=1&load[]=hoverIntent,common,admin-bar,wp-ajax-response,jquery-color,wp-lists,quicktags,jquery-query,admin-comments,jquery-ui-core,jquery-&load[]=ui-widget,jquery-ui-mouse,jquery-ui-sortable,postbox,dashboard,customize-base,customize-loader,thickbox,plugin-install,underscor&load[]=e,shortcode,media-upload,svg-painter,heartbeat,wp-auth-check,word-count,wplink&ver=4.0-beta1

The ApacheBench testing shows, that at least when on a local network, the network IO is minimal, but the time spent doing file operations is significant.

Just for fun, I ran it against a real server, as the above synthetic tests don't really show much (other than PHP executed faster)

Patchwith eTagtime per request(ms)req/sreq size(bytes)
28722.2.diffNO886.7811.13169,028
28722.2.diffYES162.4786.150
wp-admin.patchYES164.2236.090

I had to reduce this to 100 requests (still 1 thread), but as expected, DiskIO/CPU/NetworkIO were as I’d expect, pretty high for the no-etag test, disk/CPU spiking with wp-admin.patch but network stayed low, and hardly anything showing up for 28722.2.diff

The only thing I get out of this is, adding an Etag seems like a good idea, it's just a question of what's the best data for us to use for it.

#4 @nacin
10 years ago

  • Milestone changed from Awaiting Review to Future Release

28722.2.diff looks interesting. This looks like it could be a nightmare for testing, though. We'd need to make sure that nightly builds force a new $wp_version. (Testing develop.svn /build would still be painful, though.)

I agree with avoiding IO where possible.

#5 follow-up: @dd32
10 years ago

This looks like it could be a nightmare for testing, though. We'd need to make sure that nightly builds force a new $wp_version. (Testing develop.svn /build would still be painful, though.)

IMHO we'd have to disable it for anything that's not a stable release, and as for testing it if it's only active for stable, well, we'd just use the version param in that case (I bet there's a security plugin that removes that though?)

#6 @chriscct7
8 years ago

  • Keywords needs-refresh added

@swissspidy
8 years ago

#7 in reply to: ↑ 5 @swissspidy
8 years ago

  • Milestone changed from Future Release to 4.5

Replying to dd32:

This looks like it could be a nightmare for testing, though. We'd need to make sure that nightly builds force a new $wp_version. (Testing develop.svn /build would still be painful, though.)

IMHO we'd have to disable it for anything that's not a stable release, and as for testing it if it's only active for stable, well, we'd just use the version param in that case (I bet there's a security plugin that removes that though?)

28722.3.diff is a refreshed patch that applies cleanly against trunk.

It doesn't exit() when $wp_version contains a hyphen and therefore is considered to not be a stable release.

Previous testing already showed that simply using $wp_version for the Etag is the fastest solution.

#8 @dd32
8 years ago

28722.3.diff looks good to me, although some mechanism for testing this in trunk before release would be greatly beneficial.. having code that does nothing in trunk, and suddenly kicks in post-release could be a disaster :)

(I can't see how this could blow up in our faces though)

Last edited 8 years ago by dd32 (previous) (diff)

#9 follow-up: @ocean90
8 years ago

HTTP/1.1 should probably be replaced with the value of $_SERVER['SERVER_PROTOCOL'];. Related: #34131

@swissspidy
8 years ago

#10 in reply to: ↑ 9 ; follow-up: @swissspidy
8 years ago

  • Keywords needs-refresh removed

Replying to ocean90:

HTTP/1.1 should probably be replaced with the value of $_SERVER['SERVER_PROTOCOL'];. Related: #34131

28722.4.diff now uses status_header( 304 ).

Replying to dd32:

28722.3.diff looks good to me, although some mechanism for testing this in trunk before release would be greatly beneficial.. having code that does nothing in trunk, and suddenly kicks in post-release could be a disaster :)

(I can't see how this could blow up in our faces though)

I don't currently have an idea how to do this (without updating $wp_version in nightly builds). Maybe already trigger it for Betas and RCs?

#11 in reply to: ↑ 10 ; follow-up: @ocean90
8 years ago

Replying to swissspidy:

28722.4.diff now uses status_header( 304 ).

That will produce a fatal error because the function isn't available there.

#12 in reply to: ↑ 11 @swissspidy
8 years ago

Replying to ocean90:

Replying to swissspidy:

28722.4.diff now uses status_header( 304 ).

That will produce a fatal error because the function isn't available there.

Oops, you're right. My bad.

Well then something along these lines should do the trick:

$protocol = $_SERVER['SERVER_PROTOCOL'];
if ( ! in_array( $protocol, array( 'HTTP/1.1', 'HTTP/2', 'HTTP/2.0' ) ) ) {
        $protocol = 'HTTP/1.0';
}
header( "$protocol 304 Not Modified" ); 

#13 @swissspidy
8 years ago

28722.5.diff is an updated patch based on my previous comment.

So, how could we test this addition in trunk?

@swissspidy
8 years ago

#14 @dd32
8 years ago

I think we should drop the false === strpos( $wp_version, '-' ) test afterall.

  1. If you're running in /src/ then load-*.php isn't used anyway ($wp_version includes -src)
  2. If you're running out of /build/ then load-*.php is used, however, $wp_version = '4.5-alpha-$DATE' (so unique per day)
  3. If you're running from core.svn.wordpress.org then load-*.php is used, however $wp_version = '4.5-alpha-$LAST_COMMIT' (So unique per every commit)

Of course SCRIPT_DEBUG overrides #2 and #3.

We could make #2 scenario more reliable for testing by using more than just Ymd as $DATE, we could stick in Ymd.his even.

I'm tempted to drop this in now so that users in scenario #3 above can test.

@swissspidy
8 years ago

#15 @swissspidy
8 years ago

28722.6.diff removes the false === strpos( $wp_version, '-' ) check as suggested.

#16 @swissspidy
8 years ago

  • Keywords commit added

Let's do this :-)

#17 @swissspidy
8 years ago

  • Owner set to swissspidy
  • Resolution set to fixed
  • Status changed from new to closed

In 36312:

Script Loader: Add Etag: $wp_version header in load-scripts.php and load-styles.php.

This improves performance since browsers won't re-download the scripts and styles when there was no change in $wp_version.

Props sergej.mueller, dd32, swissspidy.
Fixes #28722.

#18 @ocean90
8 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

We could make #2 scenario more reliable for testing by using more than just Ymd as $DATE, we could stick in Ymd.his even.

+1

#19 follow-up: @swissspidy
8 years ago

@ocean90 I thought that would be part of the build process or something like that. Where can I change that?

#20 in reply to: ↑ 19 @ocean90
8 years ago

Replying to swissspidy:

@ocean90 I thought that would be part of the build process or something like that. Where can I change that?

https://core.trac.wordpress.org/browser/trunk/Gruntfile.js#L112

#21 @swissspidy
8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 36315:

Build Tools: Append the timestamp to $wp_version instead of only the current date.

This ensures the Etag header added in [36312] changes for every build.

Fixes #28722.

#22 @dd32
8 years ago

Build Tools: Append the timestamp to $wp_version instead of only the current date.

I don't think using a unix timestamp is the best idea here, it should be a human-readable timestamp.

I'd suggest it be grunt.template.today( 'yyyymmdd.HHMMss' ) instead. The extra dot there makes it easier to split up and read, 4.5-alpha-20160118.140058, could also use another - instead.

#23 @swissspidy
8 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I considered grunt.template.today( 'yyyymmdd.HHMMss' ) first but thought that we could just as well use a timestamp.

Both methods work for me, but I'll happily change it to a more human-readable date format.

#24 @dd32
8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 36351:

Build Tools: Revert to using a human readable timestamp rather than the unix timestamp in [36315].
Fixes #28722

#25 @RedSand
8 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

The WordPress version should not be used in headers like this, as it's a security risk. Revealing software version in headers or code is not a good security practice in general.

The IETF (Internet Engineering Task Force) has this to say in RFC 2068:

"Revealing the specific software version of the server may allow the server machine to become more vulnerable to attacks against software that is known to contain security holes."

If a security vulnerability is discovered, and a site owner hasn't upgraded their site yet, revealing this makes it easy for hackers to run automated scripts to scan their site and discover the version number. That's why most security hardening plugins remove the WordPress version number from the site's code.

Obviously, a website owner should practice good security, but even so, this should be changed so that WordPress code leaks as little data as possible.

Keep in mind that every version of WordPress released in the last couple years has had security vulnerabilities discovered after a while, so it's safe to assume that a vulnerability will be discovered in 4.5 sooner or later. If you look at the stats of what WordPress versions people are running compared against a list of vulnerable WordPress versions, you can see that only about 52% of users are running 4.4 or higher, and a good portion of the rest are using vulnerable versions. (Not everyone has upgraded to the security patched minor version in their branch.) If you look ahead 6 months or a year, there may still be users who haven't upgraded from 4.5 who will be in that same situation.

Last edited 8 years ago by RedSand (previous) (diff)

#26 follow-up: @Presskopp
8 years ago

@RedSand:
Isn't it so that it's not so hard to find out a site is using WordPress, whatever you try to hide? Security doesn't come from hidden version number.. :)
And the bots must not even test for WP running or check version, they just fire in all directions, and if they find an open door - bingo, if not, they probably don't try to find out why, but just switch to the next possible target.
This doesn't mean to don't care about protection and security! :)

#27 in reply to: ↑ 26 @RedSand
8 years ago

Replying to Presskopp:

@RedSand:
Isn't it so that it's not so hard to find out a site is using WordPress, whatever you try to hide? Security doesn't come from hidden version number.. :)
And the bots must not even test for WP running or check version, they just fire in all directions, and if they find an open door - bingo, if not, they probably don't try to find out why, but just switch to the next possible target.
This doesn't mean to don't care about protection and security! :)

Hey @Presskopp

Yes, finding out a site is using WordPress is easy. The directory structure (among many other things) would give that away, so revealing it's WordPress is not the concern. Keep in mind, I said it's the version numbers that shouldn't be advertised.

That's actually not how most hacks happen these days. With the sheer number of exploits out in the wild now, it's not really practical to just "fire in all directions anymore". That would be an easy way to be detected, and not to mention it takes a lot of bandwidth for the hacker. Sure, there are still bots employed by script kiddies that just try to hit a site with a variety of common exploits. More often though, hackers set bots to not try any attacks on their first pass...they are merely scanning sites collecting data for a targeted attack in the next step. They are building a list of sites that have specific vulnerable software versions. They can either be hit again automatically right after the data is collected. Or, the hacker employing the bot, can review the list manually and single out interesting targets. Either way, the bots are sent again, this time going for specific vulnerabilities and that's when they attack. This has a much higher success rate, and is harder to catch.

I hear you about hiding the version numbers not being a mighty security measure in and of itself, but notice I didn't say that security came from hiding the version number. It's one small layer in the whole strategy, though, and stats show that hiding the version number throughout the site will reduce the number of sites that get hacked.

For example, if I can do a quick scan of your site, and see that you're using a specific version of WordPress that's known the have x vulnerabilities, it makes it that much easier for me to target your site. That doesn't mean you shouldn't practice good security though...you should by all means, and be using updated versions.

But the fact that 48% of WordPress users aren't even updated to the 4.4 branch and 31% are still using PHP 5.2 and 5.3, shows pretty clearly that a high percentage of users are not practicing good security. So, we need to help them out and make it as secure as possible. Not everyone has expert security consultants helping them.

We do security consulting day in and day out for clients, and most hacks come from outdated software, and revealing version numbers makes it easier for hackers to gather data and target. Most hacks also don't come from really talented hackers...they come from hackers with low to intermediate skill. Yet they cause a lot of damage.

Security isn't binary...it's not an on or off type of thing. Security is about reducing risk, and lowering the statistical probability of a successful attack. You can never eliminate risk fully, and there is no such thing as 100% impenetrable security, even with the best measures in place.

In most criminal acts, it’s about following the path of least resistance — if you increase the difficulty of success (sometimes by even a small margin) then often the hacker will go somewhere else.

Good security requires a layered strategy. It's really not necessary for anyone other than the site owner to know the version number, so why put that out there? (Similarly, WordPress needs to remove all references to the version number throughout the site, so I will put in tickets regarding those as well.)

This is a pretty well-established security principle (granted - only one among many) to avoid web server fingerprinting, and to avoid revealing specific version numbers, so I'm not sure why we would want to ignore any good security principles, however small they may seem.

So while this may not seem like a big thing in and of itself...look at it like this: If we make some small changes like this throughout WordPress core - not revealing version numbers - and it can help prevent some of WordPress' users from being hacked, why wouldn't we want to do it? :)

It would be easy to use a kind of salted hash or other random stored key for the Etags instead, and not use the WordPress version number. It would be easy enough to have it change when the WordPress version gets updated, so the same goal would be accomplished. Etags aren't supposed to contain identifiable information anyway...they are supposed to be a static random value that can be used for comparison to tell when something changes.

Last edited 8 years ago by RedSand (previous) (diff)

#28 @Presskopp
8 years ago

I follow you. You opened the door to discuss security aspects in a greater range - let's do so :)

I don't want to spam here, but what about the general use of Etags?
https://developer.yahoo.com/blogs/ydn/high-performance-sites-rule-13-configure-etags-7211.html

Last edited 8 years ago by Presskopp (previous) (diff)

#29 follow-up: @jeremyfelt
8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Hi @RedSand, thanks for taking the time to share your thoughts.

We've been using WordPress version numbers on scripts and styles for quite a while to break client side caching after an upgrade. If we didn't use WordPress version numbers, we'd have to use another number that could be automatically scanned. If we didn't use that, then the size or content of the script itself would be used for automatic scanning.

A benefit of having cache breaking version numbers attached to scripts is that it helps guarantee immediate upgrades if there ever is a security issue. It's also an easy way for us to scold nudge people we know that are running old versions of WordPress. :)

I'm going to re-close this ticket as fixed as it is completed against the 4.5 milestone, which is now in RC2. I'd suggest a new ticket to continue discussing this concern, especially if there are other approaches that can be used to break scripts. Similar discussions are worth searching for as well, see #4155.

#30 in reply to: ↑ 29 @RedSand
8 years ago

Replying to jeremyfelt:

Hi @RedSand, thanks for taking the time to share your thoughts.

We've been using WordPress version numbers on scripts and styles for quite a while to break client side caching after an upgrade. If we didn't use WordPress version numbers, we'd have to use another number that could be automatically scanned. If we didn't use that, then the size or content of the script itself would be used for automatic scanning.

A benefit of having cache breaking version numbers attached to scripts is that it helps guarantee immediate upgrades if there ever is a security issue. It's also an easy way for us to scold nudge people we know that are running old versions of WordPress. :)

I'm going to re-close this ticket as fixed as it is completed against the 4.5 milestone, which is now in RC2. I'd suggest a new ticket to continue discussing this concern, especially if there are other approaches that can be used to break scripts. Similar discussions are worth searching for as well, see #4155.

Hi Jeremy,

You're very welcome. :)

Yes, I'm aware of the usage and the reasons for it, as I've been a WordPress developer and plugin developer for a decade. :)

I'll be happy to open a separate ticket to explore this further for the next releases. Yes, there are a number of ways to break the caching without having to even add query strings. In fact, the use of query strings appended to the scripts actually creates a new performance issue, because caching scripts speeds up the site. But I won't get into that here and now.

My main point here was that it would be better for the Etags to use some other unique string that could be updated when the version is changed, as opposed to the version number.

I will put together some data and solutions for a new ticket.

Version 0, edited 8 years ago by RedSand (next)
Note: See TracTickets for help on using tickets.