WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#26010 closed defect (bug) (duplicate)

SSL via `WP_Http_Curl` breaks on HTTP version mismatch

Reported by: soulseekah Owned by:
Milestone: Priority: normal
Severity: minor Version: 3.7
Component: HTTP API Keywords:
Focuses: Cc:

Description

Requesting https://www.google.com on 3.6.1 works fine, while 3.7 and on breaks things.

Located breaking changeset to [25303]. Around line 1232 a cURL error is being caught and things bail out. In my case the error is CURLE_RECV_ERROR (56), the message is SSL read: error:00000000:lib(0):func(0):reason(0), errno 0 (it's irrelevant, the code is relevant).

Possibly #25716 is related. The error doesn't happen for most sites I tried:

https://github.com works
https://www.paypal.com works
https://wordpress.com works
https://www.sandbox.paypal.com doesn't work
https://www.google.de doesn't work
https://www.youtube.com doesn't work

Moreover, both stream_headers (http://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-http.php?rev=25303#L1289) and stream_body (http://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-http.php?rev=25303#L1303) contain the data of sites that "don't work". At both these points curl_errno returns 0 (curl_error returns and empty string). But after curl_exec (http://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-http.php?rev=25303#L1235) the error is set.

Previously this error wasn't being caught. The body and the headers and everything else were available and ready to use. As of [25303] the error is caught and even though the data has been received, the whole thing bails out.

After a lot of digging and experiments I was able to reproduce the CURLE_RECV_ERROR (56) error by explicitly setting CURLOPT_HTTP_VERSION to 1.0 (the same behavior http://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-http.php?rev=25303#L1209). Setting this to 1.0 for some sites results in the error being generated. When removing the option, setting it to 1.1 or setting it to CURL_HTTP_VERSION_NONE no errors occur.

PHP 5.5.5 (cli) (built: Oct 16 2013 05:59:03) --with-curl=shared
(x86_64-unknown-linux-gnu) libcurl/7.31.0 OpenSSL/1.0.1e zlib/1.2.8 libssh2/1.4.3

This might be broken behavior of the latest libraries, but is there any reason to not have cURL decide the version itself? By not setting the version now that that error is being caught?

Attachments (1)

26010.diff (1.2 KB) - added by soulseekah 6 years ago.

Download all attachments as: .zip

Change History (11)

#1 @kovshenin
6 years ago

  • Cc kovshenin added

#2 @dd32
6 years ago

This was caused by a cURL regression in 7.31.0, basically it's impossible to tell with 100% certainty from PHP if it was a Network Error, Invalid SSL data, injected-from-a-3rd-party SSL data, or, the cURL regression which treated a specific SSL error as critical. We can check to see if there was data received as you've done, but we can't verify it in all circumstances.

Certain servers violate the TLS protocol (It closes the connection before sending the TLS closing handshake, nginx does this in certain cases) which causes this error to occur. In the case of nginx, it's only triggered when it's a POST request with a body specified.

cURL 7.30.0, and cURL 7.32.0 do not have this issue, and handle the error internally to detect the TLS violation error and ignore it in the right circumstances - We can't do that as cURL doesn't expose the right error messages for it.

The breaking change in [25303] is that we handle connection failures during receive now, where as before we didn't. This is important in that we don't return partial documents as a successful retrieval, previously if there was a network error or timeout it was possible to receive only 20% of the full document under the right circumstances.

Ultimately, I think it's best to simply disable HTTPS requests via the broken version of cURL (see ticket below) for simplicity and reliability.

See also: #25716 - WordPress.org had the same trouble
See also: #25738 - Blacklisting of certain incompatible versions of curl (including this one)

#3 follow-ups: @soulseekah
6 years ago

curl --http1.0 https://www.google.de results in curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 0. This is what WordPress is doing by default, setting all connections to forced HTTP 1.0 version.

In my pursuit to find out what exactly is happening here I've found the following (to confirm @dd32):

This does not happen in libcurl/7.30.0.
This does happen in libcurl/7.31.0.
This does not happen in libcurl/7.32.0.

After a "quick" forward and reverse bisect, I've located the first faulty commit to https://github.com/bagder/curl/commit/520833cbe1601feed1c6473bd28c4c894e7ee63e, and fixes were introduced later in https://github.com/bagder/curl/commit/8a7a277c086199b37c07a8e01165168037866f3e that got rid of the version issue.

When we recently started to treat a zero return code from SSL_read() as an error we also got false positives - which primarily looks to be because the OpenSSL documentation is wrong and a zero return code is not at all an error case in many situations.

http://curl.haxx.se/bug/view.cgi?id=1249 (down at the moment).

So, overall things are quite clear on why 7.31.0 fails. I just happened to independently discover one reproducible false positive coming from inside OpenSSL.

I agree that 7.31.0 is potentially broken, but in most cases it works; so can we come up with something that allows us to gracefully accept some errors under some circumstances? Perhaps we can isolate the exact false positives that have been encountered for this version?

And, more importantly, why is WordPress setting CURLOPT_HTTP_VERSION to 1.0? Seems like a non-sane default. Why not let cURL decide, and allow overrides via the httpversion argument as usual, if set?

Last edited 6 years ago by soulseekah (previous) (diff)

@soulseekah
6 years ago

#4 @soulseekah
6 years ago

The proposed diff is a draft idea, it will take much more than that as the default trickles down from Wp_Http into all implementations.

#5 in reply to: ↑ 3 ; follow-ups: @dd32
6 years ago

Replying to soulseekah:

And, more importantly, why is WordPress setting CURLOPT_HTTP_VERSION to 1.0? Seems like a non-sane default. Why not let cURL decide, and allow overrides via the httpversion argument as usual, if set?

WP_HTTP is/was a HTTP/1.0 client, it was designed to have a consistent input/output no matter what transport was used (there used to be 5 of them). While it now supports HTTP/1.1 mostly, it's still defaulting to 1.0 for nothing other than "all 1.1 servers must support 1.0".

A lot of the reasoning for the failures were covered in #25716, and yes, that was the cURL change responsible.

The root cause is due to the way that TLS spec (SSLv3) works, Under SSLv2 the following was the process a server would do:
(ok, there'll be a few back and forwards missing here)

  1. Client connects, asks for SSL
  2. Server responds with SSL handshake ack
  3. Client sends request
  4. Server sends data encrypted
  5. Server waits for another request, or closes the connection

Under SSLv3 (TLS), the 5th step changes:

  1. Server checks to see if client is going to send extra data, if so, back to step 2.
  2. Client Initiates SSL Shutdown handshake
  3. Server responds with SSL Shutdown handshake
  4. Connection Terminated.

The problem is, that many servers out there do not respect the 6/7th set, and as a result, OpenSSL clients will generally not call a Shutdown-not-completed error as a critical error, and will continue to return the data.
In cURL 7.31.0 that was changed, and it treated any error as fatal. In addition to that, GnuTLS (a SSLv3-only client, which cURL can be compiled with instead of OpenSSL) doesn't like servers which violate that spec and fails those requests too.

By changing the HTTP version you're requesting, you're triggering the servers to do something slightly different, in the case of nginx for example, HTTPS GET requests, and HTTPS POST requests without a POST body complete successfully with no changes, however, if you send a POST body it triggers a different branch of code which closes the connection early before it finalises the SSL shutdown handshake (If you run nginx in debug mode, you'll even say it saying so in the logs), to get around that, WordPress.org disabled the lingering_close directive to cause nginx not to close the connection prematurely.

Ultimately, we can change core slightly to avoid these issues in some cases by changing the request method, at the risk of being less flexible or not working with other servers, OR, we can simply black-list certain versions and not make requests over them, which seems safer #25738

#6 in reply to: ↑ 3 @dd32
6 years ago

Replying to soulseekah:

curl --http1.0 https://www.google.de results in curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 0. This is what WordPress is doing by default, setting all connections to forced HTTP 1.0 version.

Also, FWIW, that command will succeed on a non-affected version of cURL, as changing the HTTP version changes the code it hits on the server side, so it's really one and the same problem.

#7 @soulseekah
6 years ago

Thanks for your input @dd32. My apologies for going offtopic, but I'm trying to understand what happened to get to the bottom of things:

http://www.openssl.org/docs/ssl/SSL_read.html return code 0:

The read operation was not successful. The reason may either be a clean shutdown due to a close notify alert sent by the peer (in which case the SSL_RECEIVED_SHUTDOWN flag in the ssl shutdown state is set (see SSL_shutdown(3), SSL_set_shutdown(3)). It is also possible, that the peer simply shut down the underlying transport and the shutdown is incomplete. Call SSL_get_error() with the return value ret to find out, whether an error occurred or the connection was shut down cleanly (SSL_ERROR_ZERO_RETURN).

SSLv2 (deprecated) does not support a shutdown alert protocol, so it can only be detected, whether the underlying connection was closed. It cannot be checked, whether the closure was initiated by the peer or by something else.

Seems clear, but then this is confusing:

https://github.com/bagder/curl/commit/8a7a277c086199b37c07a8e01165168037866f3e:

...because the OpenSSL documentation is wrong and a zero return code is not at all an error case in many situations.

That documentation changed last 5 years ago https://github.com/openssl/openssl/commits/master/doc/ssl/SSL_read.pod and seems to be correct. It clearly states that 0 is not always right and that further checks are due.

Emphasizing:

Call SSL_get_error() with the return value ret to find out, whether an error occurred or the connection was shut down cleanly (SSL_ERROR_ZERO_RETURN).

So what do the cURL developers do overall? Initially they don't consider 0 to be an error at all, things seem to work fine. Then they consider 0 to be an error and all hell breaks loose, because sometimes SSL_get_error returned uncaught errors (beyond SSL_ERROR_NONE, SSL_ERROR_ZERO_RETURN, SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE).

In https://github.com/bagder/curl/blob/8a7a277c086199b37c07a8e01165168037866f3e/lib/ssluse.c#L2595 they introduce further checks by calling ERR_get_error (http://www.openssl.org/docs/crypto/ERR_get_error.html) which:

...returns the earliest error code from the thread's error queue and removes the entry.

I'm a bit confused now, though. Since the documentation appears to be right all along, did the cURL developers misaccuse and blame OpenSSL for their fiasco? Why are the developers relying on earlier values of the error queue for a request? Is this reliable?

Probably wrong place to ask. Ticket can be closed.

Last edited 6 years ago by soulseekah (previous) (diff)

#8 in reply to: ↑ 5 @soulseekah
6 years ago

Replying to dd32:

The problem is, that many servers out there do not respect the 6/7th set, and as a result, OpenSSL clients will generally not call a Shutdown-not-completed error as a critical error, and will continue to return the data.
In cURL 7.31.0 that was changed, and it treated any error as fatal. In addition to that, GnuTLS (a SSLv3-only client, which cURL can be compiled with instead of OpenSSL) doesn't like servers which violate that spec and fails those requests too.

So Google's servers (https://www.google.de, https://www.youtube.com) are actually violating SSL shutdown procedures under HTTP/1.0?

I can see slight variations in the closing traces (--trace) when requesting HTTP/1.0 versions, but the handshake is identical.

HTTP/1.1

== Info: Connection #0 to host www.youtube.com left intact

HTTP/1.0

== Info: SSL read: error:00000000:lib(0):func(0):reason(0), errno 0
== Info: Closing connection 0
== Info: SSLv3, TLS alert, Client hello (1):
=> Send SSL data, 2 bytes (0x2)
0000: 01 00                                           ..

I can't see the shutdown traces. Why is the connection left intact in HTTP/1.1? In later versions of cURL both HTTP/1.1 and HTTP/1.0 requests result in:

== Info: Closing connection 0
== Info: SSLv3, TLS alert, Client hello (1):
=> Send SSL data, 2 bytes (0x2)
0000: 01 00                                           ..

Any ideas? Was/is cURL violating shutdown procedures and not closing the connection? What happened when it did, isn't Google responding correctly?

Github (which doesn't error out on 7.31.0 via HTTP/1.0) produces an identical trace:

== Info: Closing connection 0
== Info: SSLv3, TLS alert, Client hello (1):
=> Send SSL data, 2 bytes (0x2)
0000: 01 00                                           ..

Still puzzled a bit. Is that a close_notify from the client?

#9 @dd32
6 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

As far as I can tell, OpenSSL generates an error in the event of the SSL handshake not being completed upon shutdown, however, it treats that error slightly different to a standard error. That's what the cURL change was.

As for not seeing it with --trace, I can't explain that (as I don't know how it works), I can only confirm it from looking at the server debug logs and triggering the conditions from the server (ie. forcefully closing the connection before SSL shutdown, and not triggering it by delaying connection close until the SSL shutdown takes place).

So ultimately, I understand the basics of what caused it, but I'm not a specialist in OpenSSL/cURL so I can't really verify everything. I just understand the circumstances where WordPress failed to communicate with api.wordpress.org, and managed to track down the changesets and understand the problem.

FWIW, closing this ticket as a duplicate of #25738 for fixing, but that doesn't mean you can't reply here.

#10 in reply to: ↑ 5 @iandunn
6 years ago

Replying to dd32:

Replying to soulseekah:

And, more importantly, why is WordPress setting CURLOPT_HTTP_VERSION to 1.0? Seems like a non-sane default. Why not let cURL decide, and allow overrides via the httpversion argument as usual, if set?

WP_HTTP is/was a HTTP/1.0 client, it was designed to have a consistent input/output no matter what transport was used (there used to be 5 of them). While it now supports HTTP/1.1 mostly, it's still defaulting to 1.0 for nothing other than "all 1.1 servers must support 1.0".

FWIW, PayPal is now requiring HTTP 1.1 requests on their APIs and returning 403 Forbidden codes for any 1.0 requests.

Note: See TracTickets for help on using tickets.