WordPress.org

Make WordPress Core

Opened 17 months ago

Closed 13 months ago

Last modified 11 months ago

#22913 closed defect (bug) (fixed)

Files "corrupted" when streamed to file via HTTP API

Reported by: samthorne Owned by:
Milestone: 3.6 Priority: normal
Severity: normal Version: 3.2
Component: HTTP API Keywords: needs-codex
Focuses: Cc:

Description

The plugin updater reports plugins that are in need of an update and will initiate the automatic update of plugins from behind our proxy server (in other words, the connection through the proxy server is working for most of Wordpress).

During auto update however it fails with

An error occurred while updating Postie: The package could not be installed. PCLZIP_ERR_BAD_FORMAT (-10) : Unable to find End of Central Dir Record signature.

debug.log in wp-content contains

[13-Dec-2012 11:41:18 UTC] PHP Notice:  has_cap was called with an argument that is <strong>deprecated</strong> since version 2.0! Usage of user levels by plugins and themes is deprecated. Use roles and capabilities instead. in /Users/sam/Dropbox/Work/wordpress/wp-includes/functions.php on line 2908

If I then perform the same update on a different network connection (which does not need to pass through the proxy server) the update installs correctly with no problems.
In this instance I have tested with Postie, but the same applies to other plugins.

Proxy supports HTTP(S), but not FTP.

Server is running Mac OS X 10.7.5, Apache 2.2.22, PHP 5.3.15

Attachments (2)

akismet.2.5.7.zip (30.5 KB) - added by samthorne 16 months ago.
Corrupted akismet update archive
22913.diff (1.6 KB) - added by dd32 16 months ago.

Download all attachments as: .zip

Change History (20)

comment:1 dd3216 months ago

  • Keywords reporter-feedback added

The debug log you see there is referencing a plugin which needs updating, it's nothing related to the issue at hand

Any special restrictions put in place by the proxy? What proxy software? Is the Proxy defined using WordPress's Constants in wp-config.php, or is it a Transparent network proxy?

The Zip is downloaded over HTTP, so no FTP access is required.

It appears that the ZIP File is being damaged during transit through the Proxy, it could be related to a bad header or something similar..

What output do you get if you place this code in your wp-config file AFTER the wp-settings line, and access the WordPress url like: http://wordpress..../?test=http

if ( !empty( $_GET['test'] ) && 'http' == $_GET['test'] ) {
	$result = wp_remote_get( 'http://downloads.wordpress.org/plugin/hotfix.1.0.zip' );
	$result['body'] = strlen( $result['body'] ) . ': ' . bin2hex( substr( $result['body'], 0, 10 ) );
	unset( $result['cookies'], $result['filename'] );
	echo '<pre>';
	print_r( $result );
}

samthorne16 months ago

Corrupted akismet update archive

comment:2 samthorne16 months ago

I'm trying to determine which kind of proxy software we use, but probably wont find out until after Christmas.
The proxy has a content filter which I believe is keyword and blacklist/whitelist based, though I'm not certain.
It doesn't automatically block zip archives, I can download the hotfix.zip myself and it unzips with no problems.

The proxy server is an authenticated proxy defined in wp-config using:

define('WP_PROXY_HOST', '...');
define('WP_PROXY_PORT', '10080');
define('WP_PROXY_USERNAME', '...');
define('WP_PROXY_PASSWORD', '...');
define('WP_PROXY_BYPASS_HOSTS', 'localhost');

HTTP test returns:

Array
(
    [headers] => Array
        (
            [server] => nginx
            [date] => Wed, 19 Dec 2012 10:59:33 GMT
            [content-type] => application/octet-stream
            [pragma] => no-cache
            [cache-control] => private
            [content-description] => File Transfer
            [content-disposition] => attachment; filename=hotfix.1.0.zip
            [x-nc] => MISS luv 139
            [connection] => close
            [content-encoding] => deflate
            [age] => 0
        )

    [body] => 11507: 504b03040a0000000000
    [response] => Array
        (
            [code] => 200
            [message] => OK
        )

)

I performed the same test with the hotfix file downloaded through my browser not via the proxy and then hosted on localhost:

Array
(
    [headers] => Array
        (
            [date] => Wed, 19 Dec 2012 11:39:28 GMT
            [server] => Apache/2.2.22 (Unix) DAV/2 PHP/5.3.15 with Suhosin-Patch mod_ssl/2.2.22 OpenSSL/0.9.8r
            [last-modified] => Wed, 19 Dec 2012 11:37:33 GMT
            [etag] => "bad04d-2cf3-4d1330a49f940"
            [accept-ranges] => bytes
            [content-length] => 11507
            [connection] => close
            [content-type] => application/zip
        )

    [body] => 11507: 504b03040a0000000000
    [response] => Array
        (
            [code] => 200
            [message] => OK
        )

)

The only difference is the content-type which are both valid anyway.

I then tried adding a copy() call in download_url() in file.php, duplicating the downloaded zip into tmp before anything else happens to it.
This file is corrupted, I have attached it here.

I'm not sure what this means though, how can the proxy corrupt it when WP downloads it but not when the browser downloads it?

comment:3 samthorne16 months ago

  • Keywords reporter-feedback removed

I think I've discovered the issue, the arguments within file.php include enabling streaming of the file.
If streaming is enabled on your http test, then the body returns as 0:

if ( !empty( $_GET['test'] ) && 'http' == $_GET['test'] ) {
	$result = wp_remote_get( 'http://downloads.wordpress.org/plugin/hotfix.1.0.zip', array( 'stream' => true));
	$result['body'] = strlen( $result['body'] ) . ': ' . bin2hex( substr( $result['body'], 0, 10 ) );
	unset( $result['cookies'], $result['filename'] );
	echo '<pre>';
	print_r( $result );
}
Array
(
    [headers] => Array
        (
            [server] => nginx
            [date] => Wed, 19 Dec 2012 14:49:39 GMT
            [content-type] => application/octet-stream
            [pragma] => no-cache
            [cache-control] => private
            [content-description] => File Transfer
            [content-disposition] => attachment; filename=hotfix.1.0.zip
            [x-nc] => MISS luv 139
            [connection] => close
            [content-encoding] => deflate
            [age] => 0
        )

    [body] => 0: 
    [response] => Array
        (
            [code] => 200
            [message] => OK
        )

)

I have curl installed on the server and judging by class-http.php, this would be used to download the file. So something is failing with curl and the proxy server while streaming the file through fopen.

Hope this helps.

comment:4 dd3216 months ago

  • Component changed from Upgrade/Install to HTTP
  • Milestone changed from Awaiting Review to Future Release
  • Summary changed from Plugin Updater fails behind HTTP Proxy to Files "corrupted" when streamed to file via HTTP API
  • Version changed from 3.5 to 3.2

Hope this helps.

It does, it helps a lot!

Unfortunately, it points out a rather large insufficiency in the way we're "streaming" http downloads to file, rather than storing them within memory and then writing them out.

[body] => 11507: 504b03040a0000000000

As you can see by that, when we're not doing streaming to file, the hex header of the data received is 50 4B 03 04. That's the standard Zip file header. The header of the downloaded archive you've provided is 48 0D 0A F0 - Thats a Huffman encoded compressed document, within that compressed document, we'll find the ZIP archive..
So basically, the HTTP API is saving the *raw* received data, rather than the decompressed data.

For most hosts, this isn't a problem, as the WordPress.org servers don't attempt to further compress compressed documents, but for those which go through a proxy, the proxy may legitimately compress the document, resulting in us saving a compressed document to disk.

For example the header:

 [content-encoding] => deflate

isn't added by WordPress.org, rather, it's added by the local proxy server.

Introduced in 3.2 in #16236, Moving to future release pending working out a work around.

comment:5 azaozz16 months ago

I've ran into something similar when doing the script/style concatenation/compression few years ago (double compression).

The "proper" way to deal with proxies would be to try to turn off compression when requesting zipped files:

header('Vary: Accept-Encoding');
header('Accept-Encoding: identity');

Alternatively the Accept-Encoding header can turn off the compression methods specifically:

header('Accept-Encoding: identity;q=1.0, compress;q=0, gzip;q=0, deflate;q=0');

My tests at the time confirmed both working, but it was some 4 years ago, re-testing would be good :)

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Version 0, edited 16 months ago by azaozz (next)

comment:6 dd3216 months ago

Bingo! It's the Accept-Encoding header..

Currently, for ALL requests we send this:

Accept-Encoding: deflate;q=1.0, compress;q=0.5, gzip;q=0.5

We should not send compression encodings when saving to file, and we should add identity to the list of accepted encodings in all cases.

Do we need to specifically disable the compression methods, or just not mention them in the header? I have a feeling the few broken servers/proxies out there will ignore the header anyway..

dd3216 months ago

comment:7 dd3216 months ago

22913.diff is an untested patch, but implements skipping including compression methods in the Accept-Encoding header when saving to file.

This should prevent web servers (& proxy servers) from returning a compressed document.

@samthorne: Any chance of you trying out that patch?

(Also, This patch implements #20142 as well - allowing developers to specify their own Accept-Encoding headers)

comment:8 azaozz16 months ago

Seems like header('Accept-Encoding: identity'); is HTTP 1.0 and q=0, ...q=1.0 were added in HTTP 1.1. Thinking even the most broken proxy should respect HTTP 1.0, unfortunately if it doesn't support 1.1, it may ignore a header with q=0 completely.

Yeah, identity is the default value, should be the same as sending an empty header. Haven't seen it specified anywhere though, perhaps worth it digging up some clients that don't accept compressed content and seeing what they use.

comment:9 dd3216 months ago

perhaps worth it digging up some clients that don't accept compressed content and seeing what they use.

Good luck there..
I'd rather be explicit rather than relying upon default behaviours, so by specifying the header with only identity, we're saying "We just want this, and we don't understand the rest".. specifying them as q=0 seems like it'd work around a buggy server here or there, but may be understood incorrectly by a HTTP/1.0 server.

Worth noting, that WP_HTTP claims to be a HTTP/1.0 client, so if it's sending 1.1 headers, we should change the version to 1.1.
I'd also be fine with dropping the "Quality" param completely from this lineup so it's HTTP/1.0 compatible, but that may mean HTTP/1.1 servers (all of them..) would choose compress or gzip over deflate, and face it, 'deflate' is the ONLY code branch that has been tested with WP_HTTP, since every server supports it, the other branches are rarely hit.

comment:10 rmccue16 months ago

From my work on Requests, I've found that the best way to opt for no encoding is to simply not include the header. Setting Accept-Encoding: identity can confuse some servers, and q=0 is simply ignored by some servers. Admittedly, these are edge cases, but it's worth considering.

Worth noting, that WP_HTTP claims to be a HTTP/1.0 client, so if it's sending 1.1 headers, we should change the version to 1.1.

I believe it was set to this originally to avoid getting chunked responses before the dechunking was implemented. I'm not 100% certain on this though.

comment:11 dd3216 months ago

I've found that the best way to opt for no encoding is to simply not include the header. Setting Accept-Encoding: identity can confuse some servers, and q=0 is simply ignored by some servers.

The edge case I was primarily worried about were servers that go "Oh, No Accept-Encoding? You must support industry standard deflate then surely?" - I hope this edge case is less common than servers saying "Oh, you support identity? Ok, here's deflated content then!".

I don't think we'll be able to 100% support all edge cases without reimplementing a lot of things.. I'd be happy to go with not sending the header, that way, if a server assumes something, it's assumed wrong, it can't be argued it was just confused by a header it didn't understand :)

comment:12 dd3214 months ago

  • Milestone changed from Future Release to 3.6

Lets get this fixed in 3.6

comment:13 dd3214 months ago

In 23601:

WP_HTTP: Do not send a Accept-Encoding header when we're streaming to file, or decompression has been disabled by the caller, See #22913

comment:14 dd3214 months ago

This may still deserve some more looking into, the only testing I've done is synthetic, and real-live servers/proxies may still behave badly.

comment:15 jondaley13 months ago

  • Cc jondaley added
  • Resolution set to fixed
  • Status changed from new to closed

I've run into this bug with SimplePress. I'm not sure when it broke - an upgrade of PHP or cURL? But, I've just spent an hour or two poking through simple press's code and then finally ending up in WP_HTTP.

I am not behind a proxy, but perhaps simplepress.com is?

The upgrade file is being streamed directly to a file, and since RETURNTRANSFER is set to true, curl_exec returns "1", so $theBody = , and so the decompress() function is run on empty data.

The problem is that when I request:
http://simple-press.com/download-manager.php?id=644&wpupdate=1
with WP_Http, I get a gzipped PKzip file!
If I use curl or wget on the same URL, I get the expected PKzipped file,
so I'm not sure where the extra gzip is happening (or maybe curl and wget are silently gunzipping it for me).

simplepress's .xml file is gunzipped appropriately (maybe it isn't being 'stream'ed, I didn't check, but it is downloaded fine, just the .zip files are left gzipped.

In any case, I've downloaded revision 23602 of class-http.php and replaced the version in wordpress 3.5.1 (and commented out the wp_is_writable to avoid an error).

And my download now works. Thanks!

comment:16 jondaley13 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Whoops - I didn't mean to close it - the setting was grayed out, so I don't know how it got closed...

comment:17 dd3213 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

I'm going to mark this as fixed anyway, so thanks for testing it!
If there are any reports of breakage during the beta period, we can re-open it.

comment:18 DrewAPicture11 months ago

  • Keywords needs-codex added

[23601] adds the wp_http_accept_encoding filter and two new args to the accept_encoding() WP_Http method.

Note: See TracTickets for help on using tickets.