Make WordPress Core

Opened 13 years ago

Closed 13 years ago

Last modified 5 years ago

#16855 closed defect (bug) (fixed)

HTTP API No Follow Redirection

Reported by: thedeadmedic's profile TheDeadMedic Owned by: dd32's profile dd32
Milestone: 3.2 Priority: normal
Severity: normal Version: 3.1
Component: HTTP API Keywords:
Focuses: Cc:

Description

As highlighted in the following discussions, there's an issue when we don't want to follow redirects. On closer inspection, I believe only the cURL component is affected.

http://wordpress.stackexchange.com/questions/11951/wp-remote-get-with-manual-redirect/11964#11964

http://groups.google.com/group/wp-hackers/browse_thread/thread/19dc7c7197f66754/9a60ad0b37ccaf1b?show_docid=9a60ad0b37ccaf1b&pli=1

Currently, WP_Http_Curl will throw a WP_Error if a redirection header was recieved, and the redirection argument is zero.

Maximum redirects followed (0)

The FOLLOWLOCATION option must be false, otherwise cURL treats the result as an error.

if ( !$r['redirection'] )
    curl_setopt( $handle, CURLOPT_FOLLOWLOCATION, false );

As I say, as far as I know only cURL component is affected.

Attachments (4)

16855.patch (1.4 KB) - added by hakre 13 years ago.
introducting followlocation argument for WP_Http_Curl transport
16855.2.patch (1.5 KB) - added by hakre 13 years ago.
default value reflecting safemode and openbase_dir directive
16855 (2.2 KB) - added by hakre 13 years ago.
plus 'followlocation' for WP HTTP Stream Transport
16855.3.patch (2.2 KB) - added by hakre 13 years ago.
plus 'followlocation' for WP HTTP Stream Transport (for your reading pleasure)

Download all attachments as: .zip

Change History (42)

#1 @ocean90
13 years ago

  • Component changed from General to HTTP

#2 @cogmios
13 years ago

  • Cc deleau@… added

#3 @dd32
13 years ago

  • Owner set to dd32
  • Status changed from new to reviewing

#4 @cogmios
13 years ago

In addition: this url: http://wallpapers.protonet.ru/section/foto/O/

gives me a "Undefined offset: 2 in \wp-includes\class-http.php on line <i>442</i>" with a wp_remote_get

#5 @cogmios
13 years ago

deleted

Last edited 13 years ago by cogmios (previous) (diff)

#6 @dd32
13 years ago

http://wallpapers.protonet.ru/section/foto/O/
doesn't set a message, that should still be handled by the http class.

HTTP/1.1 404
Server: nginx/0.6.32

http://news.zdnet.de/story/ seems to redirect twice to the final page, yet still works for me using wp_remote_get();

#7 @cogmios
13 years ago

maybe its my transport

#8 @cogmios
13 years ago

On the /foto/O/ one, something like this would be nicer:

$header_parts = count(explode(' ', $tempheader));
if ( false === strpos($tempheader, ':') ) {
 if($header_parts == 3) {
  list( , $response['code'],$response['message']) = explode(' ', $tempheader, 3);
 }                              
 continue;
} 

(when PHP error notices are on Xdbebug displays 2 A4's of error stack trace otherwise.)

#9 @cogmios
13 years ago

On ZDNet story: yes if redirection is not set to 0 you get the output.
If you set redirection to 0 ... i probably need to resend the cookies from the intermediate requests.

#10 @dd32
13 years ago

In order to keep this ticket on the topic explained in the description, I've opened a new ticket for the warnings: #16885

#11 @hakre
13 years ago

I can understand what you're looking for but I have problems to understand.

The meanin of a sentence like "When we don't want to follow redirects" is pretty broad. Do I want to prevent that the HTTP compontent is performing redirects? Then the error returning message is quite right.

If I only want to get the response headers of a request but not the HTTP component following any redirects, then this might need an additional switch, like do-not-redirect = true or get-response = true.

But settings the maximum redirects switch to 0 must not mean that the component genreally should not follow redirects at all.

The defnition of that redirection switch is inside the code:

'redirection' is used to track how many redirects were taken and used to sent the amount for other transports, but not all transports accept setting that value.

That's for the general WP_Http class. As this ticket is about cUrl, it's important to say that the cUrl imlementation lacks of any additional information.

It's known from other tickets that cUrl supports redirection and that the redirection parameter is to be supported but subject to interpretation. It was roughly fixed once. See #11305 .

The default redirection value is 5.

That means that a request will automatically perform a redirection up to 5 times before giving an error which means up to 6 HTTP requests, of which the last 5 ones are triggered by a 3xx redirect response and a related location header.

Becase an automatic redirect is by definition a response with a 3xx status code and a location header.

Some wordpress HTTP implementations will process any response as an automatic redirect only if they have a location header and regardless of the HTTP response code (see #16889) but this problem is out of the scope of this ticket so I would prefer for clarification purposes to stick to the HTTP RFC definition of an automatic redirect while writing about it.

So I think the discussion of this issue will benefit of some clarification what should be achieved and how.

As far as I can see, the redirection argument (flag, switch whatever it is called) does only limit the maximum number of redirects that will be performed before returning WP_Error.

It is totally inappropriate to prevent redirects and only perform the request w/o automatic handling.

Instead of setting that value to 0 (which is highly undefined, especially with the hotfix for that argument in [12745]) you must instead propery interfere with cUrl request context options and set your suggested value manually if you don't want to let WP_Http_Curl follow redirect locations:

curl_setopt( $handle, CURLOPT_FOLLOWLOCATION, false );

So this is different to the scope of the current redirection argument, at least it can be different due to different opinions and the missing specification.

However for what you ask for I would suggest that we add another argument, namely followlocation (like curl has) and implement that.

So we don't mix things with number of redirects and enable users of the API to do requests w/o performing automatic redirects.

Last edited 13 years ago by hakre (previous) (diff)

@hakre
13 years ago

introducting followlocation argument for WP_Http_Curl transport

#13 @hakre
13 years ago

Just for reference, making use of that component and setting it to true can give warnings: Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set ...

@hakre
13 years ago

default value reflecting safemode and openbase_dir directive

#14 @hakre
13 years ago

With the new patch you can just pass the 'followlocation' argument and setting it to false which will the cUrl transport prevent following any location headers. The according libcurl option is CURLOPT_FOLLOWLOCATION.

@hakre
13 years ago

plus 'followlocation' for WP HTTP Stream Transport

#15 @hakre
13 years ago

'followlocation' as well for WP HTTP Stream Transport

Version info: since PHP 5.3.4

@hakre
13 years ago

plus 'followlocation' for WP HTTP Stream Transport (for your reading pleasure)

#16 @hakre
13 years ago

Missed the filename extension last time.

#17 follow-up: @sivel
13 years ago

I'm not sure I like this idea. We should key off of the 'redirection' argument instead of introducing another argument. If redirection > 0 we should assume that we want to follow the location. We should be able to move the logic of determining whether or not we can follow out of an argument and use it closer to where we have the potential of running into issues.

#18 @cogmios
13 years ago

i will check the patches and test the uri http://www.informationweek.com/story/
update: setting it to false did give the same message lets see about that CURLOPT_FOLLOWLOCATION in a separate standalone file (http://pastebin.com/wj32yNrv) which does work so probably i patched wrongly GRIN

Last edited 13 years ago by cogmios (previous) (diff)

#19 @cogmios
13 years ago

p.s. I notice this "curl_setopt( $handle, CURLOPT_SSL_VERIFYHOST, $ssl_verify );"

According to specs (http://php.net/manual/en/function.curl-setopt.php) this should be an integer not a bool (# 1 to check the existence of a common name in the SSL peer certificate. 2 to check the existence of a common name and also verify that it matches the hostname provided. ) so <> the bool value that goes in SSL_VERIFYPEER. (as stated on http://codex.wordpress.org/HTTP_API), it are 2 different ones.

Last edited 13 years ago by cogmios (previous) (diff)

#20 @cogmios
13 years ago

I understand the difference now:

  1. you send a head request it returns "Response: HTTP/1.1 200 OK Date: Sun, 20 Mar 2011 20:49:41 GMT Server: Apache Cache-Control: no-cache Expires: Thu, 4 Jan 1990 10:00:01 GMT Last-Modified: Tue, Jan 27 2099 23:59:59 GMT Pragma: no-cache X-ATG-Version: ATGPlatform/7.2 [ DASLicense/0 DPSLicense/0 ] Set-Cookie: JSESSIONID=PNZORWDSL2JJHQE1GHPCKH4ATMY32JVN; path=/ Content-Length: 0 Connection: close Content-Type: text/html"
  1. so you think it is ok and send a get request: it now returns an empty string as request but the CURLINFO_HTTP_CODE contains a 301/302 So... it falls into:
    if ( in_array( curl_getinfo( $handle, CURLINFO_HTTP_CODE ), array(301, 302) ) )
    				echo 'http_request_failed';wp_die();
    				return new WP_Error('http_request_failed', __('Too many redirects.')); 
    
  1. However ... when I do a direct get request with my test script I get: HTTP/1.1 302 Moved Temporarily Date: Sun, 20 Mar 2011 21:14:36 GMT Server: Apache Cache-Control: no-cache Expires: Thu, 4 Jan 1990 10:00:01 GMT Last-Modified: Tue, Jan 27 2099 23:59:59 GMT Pragma: no-cache X-ATG-Version: ATGPlatform/7.2 [ DASLicense/0 DPSLicense/0 ] Set-Cookie: JSESSIONID=TK21PGAJSFZPRQE1GHRSKH4ATMY32JVN; path=/ Location: https://login.techweb.com/cas/login?service=http%3A//www.informationweek.com/nopagefound.jhtml%3Bjsessionid%3DTK21PGAJSFZPRQE1GHRSKH4ATMY32JVN&gateway=true Connection: close Content-Type: text/html Vary: Accept-Encoding, User-Agent

Which contains the correct forwarding URI (so will never jump in that part of not having no return value).

It is in:

// The option doesn't work with safe mode or when open_basedir is set.
		// Disable HEAD when making HEAD requests.
		if ( !ini_get('safe_mode') && !ini_get('open_basedir') && 'HEAD' != $r['method'] )
			{
			//curl_setopt( $handle, CURLOPT_FOLLOWLOCATION, true );
			echo "ah!";
			}

If I comment that out all = ok , f.y.i. my PHP Info:

safe_mode	Off	Off
open_basedir	no value	no value

I probably missed that line in the patching. Anyway the ticket is immortalized in http://plugins.trac.wordpress.org/browser/wp-favicons/trunk/includes/class-http.php (line 380) :) thanks.

Last edited 13 years ago by cogmios (previous) (diff)

#21 @cogmios
13 years ago

deleted

Last edited 13 years ago by cogmios (previous) (diff)

#22 in reply to: ↑ 17 @hakre
13 years ago

Replying to sivel:

I'm not sure I like this idea. We should key off of the 'redirection' argument instead of introducing another argument. If redirection > 0 we should assume that we want to follow the location. We should be able to move the logic of determining whether or not we can follow out of an argument and use it closer to where we have the potential of running into issues.

Give your thoughts a go and provide a patch so we can test how it feels.

Some info on the current 'redirection' use: -1 is no limit on redirects for the !cUrl transport. For streams below one means that no redirects are followed. No idea if streams give an error on that one, cUrl does.

#23 @hakre
13 years ago

The discussion here is related to #16888, the followlocation parameter/flag can help to keep things apart. The concept is borrowed from cUrl itself which has two settings as well (both sharing some scope): follow location and max redirects.

Probably it's useful to have a redirection parameter (int), then a follow-location parameter (bool) and a max-redirects (int) parameter.

The redirection paramter will set both as needed, and setting any of the others can override for fine-control. Transports internals can rely to the two more specific parameters if they need a more fine-grained control.

(Informative: Next to those are two more redirection related curl settings: post redir, redir protocols.)

#24 @hakre
13 years ago

I compiled some of the related HTTP status code / Location header field information and made an overview table: HTTP Redirect Codes (3xx) and the Location Field. This helped me to learn a little bit more about the subject.

#25 @cogmios
13 years ago

Can we have a new action (so a single line added) :

		do_action( 'http_api_curl_info', $handle);				

directly after calling the curl_exec ?

This would provide the url e.g. with http://su.pr/16IDwy:

Array
(
    [url] => http://www.eurogamer.net/articles/digitalfoundry-red-dead-redemption-time-lapse
    [content_type] => text/html; charset=ISO-8859-1
    [http_code] => 200
    [header_size] => 879
    [request_size] => 384
    [filetime] => -1
    [ssl_verify_result] => 0
    [redirect_count] => 1
    [total_time] => 0.811
    [namelookup_time] => 0.015
    [connect_time] => 0.031
    [pretransfer_time] => 0.031
    [size_upload] => 0
    [size_download] => 0
    [speed_download] => 0
    [speed_upload] => 0
    [download_content_length] => 37778
    [upload_content_length] => 0
    [starttransfer_time] => 0.483
    [redirect_time] => 0.328
    [certinfo] => Array
        (
        )
)

which is pretty handy.

Last edited 13 years ago by cogmios (previous) (diff)

#26 follow-up: @sivel
13 years ago

In general, we want to give every transport the same functionality. If we add a one off feature to the cURL transport, than we need to backport the functionality to all transports.

We should always strive to keep the functionality as similar as possible between all transports, because it is difficult to dictate which transport will be used from environment to environment, which is the whole point of the HTTP API anyway.

#27 in reply to: ↑ 26 ; follow-up: @cogmios
13 years ago

Ok. then add them to the other transports to?

Maybe my context first: how can i get the favicon of 'http://su.pr/16IDwy' ?

Answer: I need the URL where it forwards to, to determine the base url

1st possibility: put redirect to 0 and + write code to manually redirect and do manually all the stuff that among others curl does. Did not work in 1/3 of the cases. This is where this ticket came in.

So therefore my 2nd possibility: add additional to the existing "do_action_ref_array( 'http_api_curl', array(&$handle) ); " one extra call to provide the information. In that way I can grab the url and know where it forwards to. I think that is the purpose of that field that curl returns grin. Also in that way users of WordPress can write plugins to e.g. describe where a shortlink heads to before clicking it.

At this moment, since I can not overwrite existing classes and I dont want to include a complete new class including http.php and class-http.php in a new namespace, I am thinking to use http_api_curl, then set the curl parameters, do the requests and send an empty url back and forget about what it returns grin. But that seems not good but seems to be the only way with the current production release.

Last edited 13 years ago by cogmios (previous) (diff)

#28 in reply to: ↑ 27 @sivel
13 years ago

Replying to cogmios:

Ok. then add them to the other transports to?

Feel free to open another ticket. You are hijacking this ticket with something that is different that what we are trying to accomplish here.

#29 @cogmios
13 years ago

Ok... another ticket grin...

1) I jumped in here: http://groups.google.com/group/wp-hackers/browse_thread/thread/19dc7c7197f66754/9a60ad0b37ccaf1b?show_docid=9a60ad0b37ccaf1b&pli=1 based on that one (my other name is edward de leau)

2) Then I started the topic here: http://wordpress.stackexchange.com/questions/11951/wp-remote-get-with-manual-redirect/11964#11964 (edelwater is my other NIC)

3) Whereafter the deadmedic created this ticket.

Both with this case: "to be able to see the url where it forwards to" (which curl provides but is not passed along) <--- workaround: "set redirection to 0" (cant think of another reason why you would want this) <--- curl problem needs another parameter ---> my workaround: provide the url in another way

So line of the ticket "As highlighted in the following discussions, there's an issue when we don't want to follow redirects." ---> I think the only reason why someone does not want to follow redirects (as 1/3 is not possible in the current situation because it gives the 301/302 trap) --> because he needs to see where he is heading to see also the initial topic on http://groups.google.com/group/wp-hackers/browse_thread/thread/19dc7c7197f66754/9a60ad0b37ccaf1b?show_docid=9a60ad0b37ccaf1b&pli=1 , same reason.

But: https://core.trac.wordpress.org/ticket/16950 new ticket reference.

Last edited 13 years ago by cogmios (previous) (diff)

#30 @sivel
13 years ago

I have tested by specifying redirection => 0 and fsockopen seems to be the only one that throws a WP_Error. All other transports return the 302 found, with the headers specifying the location.

#31 @cogmios
13 years ago

update: here : http://ijskast.com/fail.sql are some uri's that fail on my server unfortunately i did not log the transport against it but probably Curl.

on my localhost/127.0.0.1 () more work with Curl.

(so all with redirect=0)

(the difference being that on the server there is something filled in for open basedir) (which is an additional other problem to tackle...)

Last edited 13 years ago by cogmios (previous) (diff)

#32 @sivel
13 years ago

It seems the difference here is between a 301 and a 302 redirect. With 302 redirects the only transport that throws a WP_Error is fsockopen. With 301, fsockopen and cURL will both throw a WP_Error.

#33 @cogmios
13 years ago

I changed the 11305 but to this:

if ( $r['redirection'] > 0 && !empty($theHeaders['headers']['location']) && (ini_get('safe_mode') || ini_get('open_basedir')) )
{
 if ( $r['redirection']-- > 0 )
 {
   return $this->request($theHeaders['headers']['location'], $r);
 }
 else
 {
 return new WP_Error('http_request_failed', __('Too many redirects.'));
 }
}

Otherwise [ ' redirection ' ] == 0 so it will always give this error.

This looks good on the server as the first 750 uri's do not show 'Too many redirects'

For the situation 'safe mode' / 'open base dir' + Curl + ssl verify off: my run on the first 750 (running) :

<url> malformed:	        11
Bad Request:	                5
couldn't connect to host:	4
Empty reply from server:	2
Forbidden:	                5
Not Found:	                62
OK:	                        702

the malformed ones are cause by a before-bug.

Last edited 13 years ago by cogmios (previous) (diff)

#34 @dd32
13 years ago

I've been working up a set of Unit tests for HTTP related to redirects the last few days, and the cases you're finding here are one of many where redirections are not handled a like between transports.

The current output looks like:

http Ext    Streams    cURL      fsockopen
SSSSSSSSS, ..F...F.F, ..FFF.FF., ..F...F..
S = skip; F = Failed test; . = Passed test
Version 0, edited 13 years ago by dd32 (next)

#35 @dd32
13 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

(In [17551]) Allow for array('redirection' => 0) to bypass WP_Error on redirects being encountered; Allows HEAD requests WITH 'redirection' > 0 specified at call time to follow redirections; Standardises on return values from all transports to act the same based on the Unit Tests. Fixes #16855

#36 @dd32
13 years ago

You can find the current unit tests for the WP_HTTP class here: http://unit-tests.svn.wordpress.org/wp-testcase/test_http.php

Unit test results before:

......F........FF..F...F........FFF......FF..F...F..

Unit test results after: (Tested with, and without safe_mode)

....................................................

#37 @dd32
13 years ago

  • Milestone changed from Awaiting Review to 3.2

#38 @cogmios
13 years ago

Here is an update with a testset of 31.000 uri's I did on the same config as above:

Error Messages or Blank:

(empty)	873
Empty reply from server	18
<url> malformed	321 (probably my fault)
couldn't connect to host	153
Couldn't resolve host (... with name) : estimate: 25

Error Codes:

0	8 ()
200	18509 (OK)
300	2 (Multiple Choices)
301	5 (Moved Permanently)
304	2 (Not Modified) (huh?)
400	64 (Bad Request)
401	7 (Unauthorized)
403	166 (Forbidden)
404	2106 (Not Found) (various)
410	27 (Gone)
423	2  (Locked) (HUH?)
500	52 (Internal Server Error)
501	2  (Not Implemented)
502	11 (Bad Gateway)
503	8  (Service Unavailable)
504	2  (Gateway Timeout)

So looks good. Have to check some of them for additional issues.

Last edited 13 years ago by cogmios (previous) (diff)
Note: See TracTickets for help on using tickets.