#16855 closed defect (bug) (fixed)
HTTP API No Follow Redirection
Reported by: | TheDeadMedic | Owned by: | dd32 |
---|---|---|---|
Milestone: | 3.2 | Priority: | normal |
Severity: | normal | Version: | 3.1 |
Component: | HTTP API | Keywords: | |
Focuses: | Cc: |
Description
As highlighted in the following discussions, there's an issue when we don't want to follow redirects. On closer inspection, I believe only the cURL component is affected.
http://wordpress.stackexchange.com/questions/11951/wp-remote-get-with-manual-redirect/11964#11964
Currently, WP_Http_Curl
will throw a WP_Error
if a redirection header was recieved, and the redirection
argument is zero.
Maximum redirects followed (0)
The FOLLOWLOCATION option must be false, otherwise cURL treats the result as an error.
if ( !$r['redirection'] ) curl_setopt( $handle, CURLOPT_FOLLOWLOCATION, false );
As I say, as far as I know only cURL component is affected.
Attachments (4)
Change History (42)
#5
@
14 years ago
In addition, this url: http://news.zdnet.de/story/ is not handled correctly.
It returns a time-out but that is because it gets a 301 that links to itself, though as you can see by typing it in in the browser the end page is normal page.
#6
@
14 years ago
http://wallpapers.protonet.ru/section/foto/O/
doesn't set a message, that should still be handled by the http class.
HTTP/1.1 404 Server: nginx/0.6.32
http://news.zdnet.de/story/ seems to redirect twice to the final page, yet still works for me using wp_remote_get();
#8
@
14 years ago
On the /foto/O/ one, something like this would be nicer:
$header_parts = count(explode(' ', $tempheader));
if ( false === strpos($tempheader, ':') ) {
if($header_parts == 3) {
list( , $response['code'],$response['message']) = explode(' ', $tempheader, 3);
}
continue;
}
(when PHP error notices are on Xdbebug displays 2 A4's of error stack trace otherwise.)
#9
@
14 years ago
On ZDNet story: yes if redirection is not set to 0 you get the output.
If you set redirection to 0 ... i probably need to resend the cookies from the intermediate requests.
#10
@
14 years ago
In order to keep this ticket on the topic explained in the description, I've opened a new ticket for the warnings: #16885
#11
@
14 years ago
I can understand what you're looking for but I have problems to understand.
The meanin of a sentence like "When we don't want to follow redirects" is pretty broad. Do I want to prevent that the HTTP compontent is performing redirects? Then the error returning message is quite right.
If I only want to get the response headers of a request but not the HTTP component following any redirects, then this might need an additional switch, like do-not-redirect = true
or get-response = true
.
But settings the maximum redirects switch to 0 must not mean that the component genreally should not follow redirects at all.
The defnition of that redirection switch is inside the code:
'redirection' is used to track how many redirects were taken and used to sent the amount for other transports, but not all transports accept setting that value.
That's for the general WP_Http class. As this ticket is about cUrl, it's important to say that the cUrl imlementation lacks of any additional information.
It's known from other tickets that cUrl supports redirection and that the redirection parameter is to be supported but subject to interpretation. It was roughly fixed once. See #11305 .
The default redirection
value is 5.
That means that a request will automatically perform a redirection up to 5 times before giving an error which means up to 6 HTTP requests, of which the last 5 ones are triggered by a 3xx redirect response and a related location header.
Becase an automatic redirect is by definition a response with a 3xx status code and a location header.
Some wordpress HTTP implementations will process any response as an automatic redirect only if they have a location header and regardless of the HTTP response code (see #16889) but this problem is out of the scope of this ticket so I would prefer for clarification purposes to stick to the HTTP RFC definition of an automatic redirect while writing about it.
So I think the discussion of this issue will benefit of some clarification what should be achieved and how.
As far as I can see, the redirection argument (flag, switch whatever it is called) does only limit the maximum number of redirects that will be performed before returning WP_Error.
It is totally inappropriate to prevent redirects and only perform the request w/o automatic handling.
Instead of setting that value to 0 (which is highly undefined, especially with the hotfix for that argument in [12745]) you must instead propery interfere with cUrl request context options and set your suggested value manually if you don't want to let WP_Http_Curl follow redirect locations:
curl_setopt( $handle, CURLOPT_FOLLOWLOCATION, false );
So this is different to the scope of the current redirection argument, at least it can be different due to different opinions and the missing specification.
However for what you ask for I would suggest that we add another argument, namely followlocation (like curl has) and implement that.
So we don't mix things with number of redirects and enable users of the API to do requests w/o performing automatic redirects.
#13
@
14 years ago
Just for reference, making use of that component and setting it to true can give warnings: Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set ...
#14
@
14 years ago
With the new patch you can just pass the 'followlocation'
argument and setting it to false
which will the cUrl transport prevent following any location headers. The according libcurl option is CURLOPT_FOLLOWLOCATION.
#15
@
14 years ago
'followlocation' as well for WP HTTP Stream Transport
Version info: since PHP 5.3.4
#17
follow-up:
↓ 22
@
14 years ago
I'm not sure I like this idea. We should key off of the 'redirection' argument instead of introducing another argument. If redirection > 0 we should assume that we want to follow the location. We should be able to move the logic of determining whether or not we can follow out of an argument and use it closer to where we have the potential of running into issues.
#18
@
14 years ago
i will check the patches and test the uri http://www.informationweek.com/story/
update: setting it to false did give the same message lets see about that CURLOPT_FOLLOWLOCATION in a separate standalone file (http://pastebin.com/wj32yNrv) which does work so probably i patched wrongly GRIN
#19
@
14 years ago
p.s. I notice this "curl_setopt( $handle, CURLOPT_SSL_VERIFYHOST, $ssl_verify );"
According to specs (http://php.net/manual/en/function.curl-setopt.php) this should be an integer not a bool (# 1 to check the existence of a common name in the SSL peer certificate. 2 to check the existence of a common name and also verify that it matches the hostname provided. ) so <> the bool value that goes in SSL_VERIFYPEER. (as stated on http://codex.wordpress.org/HTTP_API), it are 2 different ones.
#20
@
14 years ago
I understand the difference now:
- you send a head request it returns "Response: HTTP/1.1 200 OK Date: Sun, 20 Mar 2011 20:49:41 GMT Server: Apache Cache-Control: no-cache Expires: Thu, 4 Jan 1990 10:00:01 GMT Last-Modified: Tue, Jan 27 2099 23:59:59 GMT Pragma: no-cache X-ATG-Version: ATGPlatform/7.2 [ DASLicense/0 DPSLicense/0 ] Set-Cookie: JSESSIONID=PNZORWDSL2JJHQE1GHPCKH4ATMY32JVN; path=/ Content-Length: 0 Connection: close Content-Type: text/html"
- so you think it is ok and send a get request: it now returns an empty string as request but the CURLINFO_HTTP_CODE contains a 301/302 So... it falls into:
if ( in_array( curl_getinfo( $handle, CURLINFO_HTTP_CODE ), array(301, 302) ) ) echo 'http_request_failed';wp_die(); return new WP_Error('http_request_failed', __('Too many redirects.'));
- However ... when I do a direct get request with my test script I get: HTTP/1.1 302 Moved Temporarily Date: Sun, 20 Mar 2011 21:14:36 GMT Server: Apache Cache-Control: no-cache Expires: Thu, 4 Jan 1990 10:00:01 GMT Last-Modified: Tue, Jan 27 2099 23:59:59 GMT Pragma: no-cache X-ATG-Version: ATGPlatform/7.2 [ DASLicense/0 DPSLicense/0 ] Set-Cookie: JSESSIONID=TK21PGAJSFZPRQE1GHRSKH4ATMY32JVN; path=/ Location: https://login.techweb.com/cas/login?service=http%3A//www.informationweek.com/nopagefound.jhtml%3Bjsessionid%3DTK21PGAJSFZPRQE1GHRSKH4ATMY32JVN&gateway=true Connection: close Content-Type: text/html Vary: Accept-Encoding, User-Agent
Which contains the correct forwarding URI (so will never jump in that part of not having no return value).
It is in:
// The option doesn't work with safe mode or when open_basedir is set. // Disable HEAD when making HEAD requests. if ( !ini_get('safe_mode') && !ini_get('open_basedir') && 'HEAD' != $r['method'] ) { //curl_setopt( $handle, CURLOPT_FOLLOWLOCATION, true ); echo "ah!"; }
If I comment that out all = ok , f.y.i. my PHP Info:
safe_mode Off Off open_basedir no value no value
I probably missed that line in the patching. Anyway the ticket is immortalized in http://plugins.trac.wordpress.org/browser/wp-favicons/trunk/includes/class-http.php (line 380) :) thanks.
#22
in reply to:
↑ 17
@
14 years ago
Replying to sivel:
I'm not sure I like this idea. We should key off of the 'redirection' argument instead of introducing another argument. If redirection > 0 we should assume that we want to follow the location. We should be able to move the logic of determining whether or not we can follow out of an argument and use it closer to where we have the potential of running into issues.
Give your thoughts a go and provide a patch so we can test how it feels.
Some info on the current 'redirection' use: -1 is no limit on redirects for the !cUrl transport. For streams below one means that no redirects are followed. No idea if streams give an error on that one, cUrl does.
#23
@
14 years ago
The discussion here is related to #16888, the followlocation parameter/flag can help to keep things apart. The concept is borrowed from cUrl itself which has two settings as well (both sharing some scope): follow location and max redirects.
Probably it's useful to have a redirection parameter (int), then a follow-location parameter (bool) and a max-redirects (int) parameter.
The redirection paramter will set both as needed, and setting any of the others can override for fine-control. Transports internals can rely to the two more specific parameters if they need a more fine-grained control.
(Informative: Next to those are two more redirection related curl settings: post redir, redir protocols.)
#24
@
14 years ago
I compiled some of the related HTTP status code / Location header field information and made an overview table: HTTP Redirect Codes (3xx) and the Location Field. This helped me to learn a little bit more about the subject.
#25
@
14 years ago
Can we have a new action (so a single line added) :
do_action( 'http_api_curl_info', $handle);
directly after calling the curl_exec ?
This would provide the url e.g. with http://su.pr/16IDwy:
Array ( [url] => http://www.eurogamer.net/articles/digitalfoundry-red-dead-redemption-time-lapse [content_type] => text/html; charset=ISO-8859-1 [http_code] => 200 [header_size] => 879 [request_size] => 384 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 1 [total_time] => 0.811 [namelookup_time] => 0.015 [connect_time] => 0.031 [pretransfer_time] => 0.031 [size_upload] => 0 [size_download] => 0 [speed_download] => 0 [speed_upload] => 0 [download_content_length] => 37778 [upload_content_length] => 0 [starttransfer_time] => 0.483 [redirect_time] => 0.328 [certinfo] => Array ( ) )
which is pretty handy.
#26
follow-up:
↓ 27
@
14 years ago
In general, we want to give every transport the same functionality. If we add a one off feature to the cURL transport, than we need to backport the functionality to all transports.
We should always strive to keep the functionality as similar as possible between all transports, because it is difficult to dictate which transport will be used from environment to environment, which is the whole point of the HTTP API anyway.
#27
in reply to:
↑ 26
;
follow-up:
↓ 28
@
14 years ago
Ok. then add them to the other transports to?
Maybe my context first: how can i get the favicon of 'http://su.pr/16IDwy' ?
Answer: I need the URL where it forwards to, to determine the base url
1st possibility: put redirect to 0 and + write code to manually redirect and do manually all the stuff that among others curl does. Did not work in 1/3 of the cases. This is where this ticket came in.
So therefore my 2nd possibility: add additional to the existing "do_action_ref_array( 'http_api_curl', array(&$handle) ); " one extra call to provide the information. In that way I can grab the url and know where it forwards to. I think that is the purpose of that field that curl returns grin. Also in that way users of WordPress can write plugins to e.g. describe where a shortlink heads to before clicking it.
At this moment, since I can not overwrite existing classes and I dont want to include a complete new class including http.php and class-http.php in a new namespace, I am thinking to use http_api_curl, then set the curl parameters, do the requests and send an empty url back and forget about what it returns grin. But that seems not good but seems to be the only way with the current production release.
#28
in reply to:
↑ 27
@
14 years ago
Replying to cogmios:
Ok. then add them to the other transports to?
Feel free to open another ticket. You are hijacking this ticket with something that is different that what we are trying to accomplish here.
#29
@
14 years ago
Ok... another ticket grin...
1) I jumped in here: http://groups.google.com/group/wp-hackers/browse_thread/thread/19dc7c7197f66754/9a60ad0b37ccaf1b?show_docid=9a60ad0b37ccaf1b&pli=1 based on that one (my other name is edward de leau)
2) Then I started the topic here: http://wordpress.stackexchange.com/questions/11951/wp-remote-get-with-manual-redirect/11964#11964 (edelwater is my other NIC)
3) Whereafter the deadmedic created this ticket.
Both with this case: "to be able to see the url where it forwards to" (which curl provides but is not passed along) <--- workaround: "set redirection to 0" (cant think of another reason why you would want this) <--- curl problem needs another parameter ---> my workaround: provide the url in another way
So line of the ticket "As highlighted in the following discussions, there's an issue when we don't want to follow redirects." ---> I think the only reason why someone does not want to follow redirects (as 1/3 is not possible in the current situation because it gives the 301/302 trap) --> because he needs to see where he is heading to see also the initial topic on http://groups.google.com/group/wp-hackers/browse_thread/thread/19dc7c7197f66754/9a60ad0b37ccaf1b?show_docid=9a60ad0b37ccaf1b&pli=1 , same reason.
But: https://core.trac.wordpress.org/ticket/16950 new ticket reference.
#30
@
14 years ago
I have tested by specifying redirection => 0 and fsockopen seems to be the only one that throws a WP_Error. All other transports return the 302 found, with the headers specifying the location.
#31
@
14 years ago
update: here : http://ijskast.com/fail.sql are some uri's that fail on my server unfortunately i did not log the transport against it but probably Curl.
on my localhost/127.0.0.1 () more work with Curl.
(so all with redirect=0)
(the difference being that on the server there is something filled in for open basedir) (which is an additional other problem to tackle...)
#32
@
14 years ago
It seems the difference here is between a 301 and a 302 redirect. With 302 redirects the only transport that throws a WP_Error is fsockopen. With 301, fsockopen and cURL will both throw a WP_Error.
#33
@
14 years ago
I changed the 11305 but to this:
if ( $r['redirection'] > 0 && !empty($theHeaders['headers']['location']) && (ini_get('safe_mode') || ini_get('open_basedir')) ) { if ( $r['redirection']-- > 0 ) { return $this->request($theHeaders['headers']['location'], $r); } else { return new WP_Error('http_request_failed', __('Too many redirects.')); } }
Otherwise [ ' redirection ' ] == 0 so it will always give this error.
This looks good on the server as the first 750 uri's do not show 'Too many redirects'
For the situation 'safe mode' / 'open base dir' + Curl + ssl verify off: my run on the first 750 (running) :
<url> malformed: 11 Bad Request: 5 couldn't connect to host: 4 Empty reply from server: 2 Forbidden: 5 Not Found: 62 OK: 702
the malformed ones are cause by a before-bug.
#34
@
14 years ago
I've been working up a set of Unit tests for HTTP related to redirects the last few days, and the cases you're finding here are one of many where redirections are not handled a like between transports.
The current output looks like:
http Ext Streams cURL fsockopen SSSSSSSSS, ..F...F.F, ..FFF.FF., ..F...F.. SSSSSSSSS, .F....FFF, ..FFF.FF., ..F...F.. (safe_mode) S = skip; F = Failed test; . = Passed test
#36
@
14 years ago
You can find the current unit tests for the WP_HTTP class here: http://unit-tests.svn.wordpress.org/wp-testcase/test_http.php
Unit test results before:
......F........FF..F...F........FFF......FF..F...F..
Unit test results after: (Tested with, and without safe_mode)
....................................................
#38
@
14 years ago
Here is an update with a testset of 31.000 uri's I did on the same config as above:
Error Messages or Blank:
(empty) 873 Empty reply from server 18 <url> malformed 321 (probably my fault) couldn't connect to host 153 Couldn't resolve host (... with name) : estimate: 25
Error Codes:
0 8 () 200 18509 (OK) 300 2 (Multiple Choices) 301 5 (Moved Permanently) 304 2 (Not Modified) (huh?) 400 64 (Bad Request) 401 7 (Unauthorized) 403 166 (Forbidden) 404 2106 (Not Found) (various) 410 27 (Gone) 423 2 (Locked) (HUH?) 500 52 (Internal Server Error) 501 2 (Not Implemented) 502 11 (Bad Gateway) 503 8 (Service Unavailable) 504 2 (Gateway Timeout)
So looks good. Have to check some of them for additional issues.
In addition: this url: http://wallpapers.protonet.ru/section/foto/O/
gives me a "Undefined offset: 2 in \wp-includes\class-http.php on line <i>442</i>" with a wp_remote_get