Make WordPress Core

Opened 8 years ago

Closed 7 years ago

#36280 closed defect (bug) (wontfix)

HTTP API (Streams backend) secured proxy tunnels and chunked and zlib on the fly streaming

Reported by: bobbywalters's profile bobbywalters Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: HTTP API Keywords:
Focuses: Cc:

Description

The purpose of this ticket is to address concerns within the WP_Http_Streams class (wp-includes/class-wp-http-streams.php). And supply a patch to correct the defects, improve performance, and extend the available features as noted below.

Proxy Related

  • HTTPS requests through a proxy now create a properly secured tunnel with certificate validation possible. No longer have to use the Curl backend to properly tunnel HTTPS requests through the WP Remote API. No more 501 Not Implemented errors.
  • Created filter 'proxy_tunnel' to alter which connections may be tunneled through a proxy using HTTP CONNECT.

SSL/TLS Related

  • Added filter 'stream_crypto_method' to decide which cryptography (crypto) method is used for handshake. Default is STREAM_CRYPTO_METHOD_TLS_CLIENT.
  • Added 'peer_name' 'ssl' context parameter to enable peer name validation even over tunneled proxy connections.
  • Added 'verify_peer_name' which is set based on $ssl_verify in the event verification should be disabled.

Transfer-Encoding: chunked

  • Using the PHP builtin stream filter 'dechunk' to process response bodies transferred in a chunked fashion.
  • Much faster streaming capability avoids having to read in the entire response and then perform logic to distinguish chunks (ie WP_Http::chunkTransferDecode() is not used).
  • The stream filter also corrects some of the handling found in WP_Http::chunkTransferDecode() that prevented https://www.google.com from correctly being read. As a side note: I'm not sure what was throwing chunkTransferDecode off when reading https://www.google.com but it may have been the multibyte gzip data along with the 0 padded chunk sizes at the beginning. I didn't pursue trying to alter chunkTransferDecode since the 'dechunk' stream filter preforms the reads correctly.

Content-Encoding: (deflate | gzip)

  • Any content encoding handled by zlib is now done with the stream filter 'zlib.inflate' to handle on the fly decompression.
  • This provides performance and resource utilization improvements especially when the 'limit_response_size' option is set.

General

  • All options ('stream', 'decompress', 'limit_response_size', etc) are still honored.
  • Reading responses with none, one, or both of 'Transfer-Encoding' and 'Content-Encoding' work on the fly and streaming.
  • Both HTTP and HTTPS connections with and without a proxy involved work now.
  • Redirects continue to work with the improved proxy tunneling even while switching between HTTP to HTTPS. As an example http://www.wordpress.org performs a few redirects to drop the "www" and get over to https.

Testing

  • Using a local instance of WordPress with a combination of WP-CLI eval-file scenarios along with clearing out the transients and using the admin console to search, view details, and download plugins.
  • The previous bullet with a local Squid instance as setup out of the box Debian Jessie style. A simple Dockerfile was to get Squid running which could be shared. wp-config.php was modified to define WP_PROXY_HOST and WP_PROXY_PORT to use the local Squid instance.
  • As of right now I have not run the phpunit tests because I've never attempted to run them in WordPress. However, looking over https://develop.svn.wordpress.org/trunk/tests/phpunit/tests/http/base.php I don't immediately see anything that would be a problem.
  • I could see some additional test cases being useful around HTTP SSL/TLS, chunked, and gzip handling. Then all of that with a proxy in play but I'm not sure how feasible proxy testing is.

I may have skipped over some of the finer points but that should be the majority of the changes and rationale behind them. Thanks in advance for looking over this information and the related code changes.

Attachments (1)

36280.patch (18.8 KB) - added by bobbywalters 8 years ago.
First patch.

Download all attachments as: .zip

Change History (6)

@bobbywalters
8 years ago

First patch.

#1 follow-up: @dd32
8 years ago

Two initial things that stand out:

Using the PHP builtin stream filter 'dechunk'

That's available in PHP 5.3 and later, as we're still supporting PHP 5.2 that should have a feature check first.

Any content encoding handled by zlib is now done with the stream filter 'zlib.inflate' to handle on the fly decompression.

Not always available, and once again needs to have a capability check on it.

Theoretically servers shouldn't respond with Chunked encoding if we don't ask for it, nor should they supply compressed content we haven't specified we can understand, so both of these shouldn't be an issue.

Note that #33055 is a proposal (which I believe most of us are behind) to deprecate and remove WP_HTTP in preference for a more fleshed out library.

#2 in reply to: ↑ 1 @bobbywalters
8 years ago

Replying to dd32:

Two initial things that stand out:

Using the PHP builtin stream filter 'dechunk'

That's available in PHP 5.3 and later, as we're still supporting PHP 5.2 that should have a feature check first.

Any content encoding handled by zlib is now done with the stream filter 'zlib.inflate' to handle on the fly decompression.

Not always available, and once again needs to have a capability check on it.

Adding in feature detection for the 'dechunk' and 'zlib.inflate' stream filters, creating custom filters, and registering those instead appears possible in order to maintain support with PHP 5.2. It seems many of the stream_* functions were added in PHP 5. stream_notification_callback was added in 5.2 but that should be allowed and would enable a path to non-blocking IO and multiple simultaneous requests.

Theoretically servers shouldn't respond with Chunked encoding if we don't ask for it, nor should they supply compressed content we haven't specified we can understand, so both of these shouldn't be an issue.

Yeah, I was testing requests with both HTTP/1.0 and HTTP/1.1 and other permutations with Accept-Encoding. The servers I was using in my testing all sent the correct data in the proper format as you mentioned most will.

Note that #33055 is a proposal (which I believe most of us are behind) to deprecate and remove WP_HTTP in preference for a more fleshed out library.

This is the first time I've come across the Requests library. I quickly read over the code in GitHub so it's very possible I'm not seeing everything. It looks like a lot of the same code currently in WP_HTTP when going down the "fsockopen" (stream) transport path. The proxy, chunked, and decode handling all look very similar and would be missing the benefits of the patch in this ticket. Not to say Requests couldn't be updated with these proposed changes though.

Since Guzzle was brought up, I'm not sure it handles tunneled proxy connections either. I haven't spent any time looking at their code but a couple of grep's aren't turning much up. So it's entirely possible I'm missing the HTTP CONNECT call for encrypted or open proxy connection tunneling. But, ultimately, like the other ticket discussed, getting Guzzle to run on PHP 5.2 would be a hefty task.

If you're open to it, I'd like to take a crack at adding in the 5.2 compatibility to these patched changes (php_user_filter seems straight forward). I think the streams API can be leveraged and still maintain PHP 5.2 compatibility. Parallel/Multiple requests and the filter/hook cleanup would follow. This code should be useful no matter what given Requests has roots from WP_HTTP.

#3 @dd32
8 years ago

Requests is the work of @rmccue mostly and while it may seem like it's taken inspiration from WP_HTTP, it's not based on it, licensing and all :)

I'm myself extremely hesitant to tell anyone to write new code for the WP_HTTP internals, as realistically I don't think we should continue working on it. It was once needed when the options were using cURL directly or fsockopen() but today we have the opportunity to instead leverage a library which other applications also do, pushing changes upstream where needed, improving it for everyone.
I don't think Requests supports what you want to do here yet, but I think it would be a far better place for the effort to be directed.

I don't know when, or if, Requests will be switched in for WP_HTTP, we've talked about it for a long(ish) time.

Last edited 8 years ago by dd32 (previous) (diff)

#4 @obenland
8 years ago

  • Version trunk deleted

#5 @dd32
7 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

As we've now switched to using Requests I'm marking this ticket as wontfix.

Individual enhancements which make sense for Requests should be submitted upstream for consideration. Please note that submitting a single ticket which modifies multiple things, such as this one, is bound to get ignored by most projects as the scope doesn't allow easy reviewing.

Note: See TracTickets for help on using tickets.