Make WordPress Core

Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#19922 closed defect (bug) (fixed)

Cookie urlencoding in getHeaderValue method of WP_Http_Cookie confuses servers

Reported by: pw201's profile pw201 Owned by: westi's profile westi
Milestone: 3.4 Priority: normal
Severity: normal Version: 2.8
Component: HTTP API Keywords: has-patch dev-feedback
Focuses: Cc:

Description

WP_Http_Cookie calls urlencode on cookie values before they're used in the Cookie header in the HTTP request. This produces interoperability problems with servers which don't perform the corresponding decode. This Stack Overflow article says that the RFC specifying that these values should be encoded is not well adopted, and that browsers don't follow it.

I found this while looking into why the LiveJournal importer now fails to import comments. I think LJ changed their cookie formats a while ago. The session cookies now contain colons and forward slashes. Both of these are encoded by the WP core code, resulting in the cookie not being recognised by LJ's server.

Removing the call to urlencode in getHeaderValue allows the import to complete. A better fix would probably be to only encode non-printable characters, I suppose.

Attachments (5)

19922.patch (1.2 KB) - added by kurtpayne 13 years ago.
Encoding only non-printable ASCII
19922.2.patch (466 bytes) - added by dllh 13 years ago.
apply_filters to let plugin authors handle implementation
19922.3.patch (1.7 KB) - added by kurtpayne 13 years ago.
Combining cookie spec and apply_filters approach from dllh
19922.without-filter.diff (368 bytes) - added by westi 12 years ago.
Simple patch to remove the encoding
19922.with-filter.diff (421 bytes) - added by westi 12 years ago.
Simple patch to remove the encoding but allow for a plugin to encode it if needed using a filter

Download all attachments as: .zip

Change History (23)

#2 @pw201
13 years ago

Clarification of the problem: LJ's protocol gives you an authentication code as a string over their XML-RPC protocol, and you then send it back as a cookie when you retrieve the comments using straightforward HTTP GETs on a special URL. This string contains ":" and "/". The PHP urlencode function decides to encode those. LJ's server doesn't choose to interpret cookies as urlencoded. Failure ensues.

@kurtpayne
13 years ago

Encoding only non-printable ASCII

#3 @kurtpayne
13 years ago

  • Cc kpayne@… added
  • Component changed from General to HTTP
  • Keywords has-patch dev-feedback added
  • Version set to 2.8

@dllh
13 years ago

apply_filters to let plugin authors handle implementation

#4 @dllh
13 years ago

  • Cc daryl@… added

I was just looking at this exact issue (LJ and all). The solution I arrived at was to just apply a filter so that plugin developers (or whoever) can adjust headers as needed to suit their particular purposes. It's a smaller change to core but may also just sidestep the underlying problem. As the cookie spec neither defines nor requires a particular encoding, I sort of hate to enforce a particular encoding within core, at least without allowing other code to modify it before using the cookie.

#5 follow-up: @dd32
13 years ago

The cookie spec indeed doesn't have any standard, other than only US-ASCII characters are permitted (And other sites warn non-ascii characters are prone to being mangled), URL Encoding non-ASCII characters seems to be the agree'd standard, encoding : and \ doesn't seem to make sense to me, as they're printable US-ASCII characters, so php's urlencode shouldn't be used here.

However, the Data contained within cookies is only useful to the origin server, We shouldn't try to interpret it, as we have no need to, we simply need to send back the same data that we received, Plugin authors can pass any kind of data in, and it should be sent as-is. Or at least, that's my understanding of how we should be handling it.

refs: rfc 2109 rfc 2965

#6 @nacin
13 years ago

However, the Data contained within cookies is only useful to the origin server, We shouldn't try to interpret it, as we have no need to, we simply need to send back the same data that we received, Plugin authors can pass any kind of data in, and it should be sent as-is. Or at least, that's my understanding of how we should be handling it.

I would agree with that.

#7 @westi
13 years ago

  • Owner set to westi
  • Status changed from new to accepted

#8 in reply to: ↑ 5 @kurtpayne
13 years ago

Replying to dd32:

The cookie spec indeed doesn't have any standard, other than only US-ASCII characters are permitted

RFC 6265 identifies a cookie value as:

 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       ; US-ASCII characters excluding CTLs,
                       ; whitespace DQUOTE, comma, semicolon,
                       ; and backslash

Perhaps these should be encoded, at a minimum? As a measure against double encoding, 19922.patch also encodes % and +. Not sure why = was in there.

@kurtpayne
13 years ago

Combining cookie spec and apply_filters approach from dllh

#9 @pw201
13 years ago

The right thing to do with current servers and browsers (which as the Stack Overflow guy says, don't follow the RFCs) is to leave the data alone as much as you can: that means the urldecode in the constructor for WP_Http_Cookie and the urlencode on the output side should go, to my mind. If someone manually constructs a cookie with illegal characters in, you get to decide what you do with those, but that's it (and in those cases, what some browsers do is use Unicode encodings: nobody does urlencoding).

Things which currently break in the typical situation where the server sets cookies with the Set-Cookie header:

  • if a server sets a cookie like "1234:/", the urldecode will leave that alone, and the encode will mess with the colons and slashes, so you'll end up sending the server something different from what it gave you. (This one also breaks when you build the cookie by hand with data from another source, which is what happens in the LJ API case).
  • if a server sets a cookie like "%41", the urldecode will turn that into "A", and the encode will leave it alone, again meaning you send the server something other than what it gave you.
Last edited 13 years ago by pw201 (previous) (diff)

#10 @yoavf
13 years ago

  • Cc yoav@… added

#11 @westi
12 years ago

I think the best solution here is the one dd32 outlined.

Just send what we are given and ignore any RFC because no one really follows it.

@westi
12 years ago

Simple patch to remove the encoding

@westi
12 years ago

Simple patch to remove the encoding but allow for a plugin to encode it if needed using a filter

#12 @westi
12 years ago

Two variants of the simple patch we need with/without a filter to allow a plugin to encode cookies if really needed as a fallback.

#13 @dd32
12 years ago

Both patches look right to me, Including a filter seems to be the best way forward to avoid breaking an implementation and providing no way to fix it.

The only thing I can really see that would break, is someone passing a \n in, which would've been the wrong thing to do in the first place.

#14 @dllh
12 years ago

I confirm that the patch with filter resolves the LJ import issue. Tested against trunk.

#15 @westi
12 years ago

  • Milestone changed from Awaiting Review to 3.4

#16 @SergeyBiryukov
12 years ago

There's a typo in 19922.with-filter.diff: $this-name.

#17 @westi
12 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In [20372]:

HTTP: Don't force cookie values to be urlencoded because it breaks usage of cookies in some scenarios like the LiveJournal Importer. Instead add a filter for a plugin to use if it really wants to have the cookie mangled. Fixes #19922 props pw201, dllh and kurtpayne.

#18 @westi
12 years ago

In [20373]:

Fix typo in [20372] - See #19922 props SergeyBiryukov.

Note: See TracTickets for help on using tickets.