#19922 closed defect (bug) (fixed)
Cookie urlencoding in getHeaderValue method of WP_Http_Cookie confuses servers
Reported by: | pw201 | Owned by: | westi |
---|---|---|---|
Milestone: | 3.4 | Priority: | normal |
Severity: | normal | Version: | 2.8 |
Component: | HTTP API | Keywords: | has-patch dev-feedback |
Focuses: | Cc: |
Description
WP_Http_Cookie calls urlencode on cookie values before they're used in the Cookie header in the HTTP request. This produces interoperability problems with servers which don't perform the corresponding decode. This Stack Overflow article says that the RFC specifying that these values should be encoded is not well adopted, and that browsers don't follow it.
I found this while looking into why the LiveJournal importer now fails to import comments. I think LJ changed their cookie formats a while ago. The session cookies now contain colons and forward slashes. Both of these are encoded by the WP core code, resulting in the cookie not being recognised by LJ's server.
Removing the call to urlencode in getHeaderValue allows the import to complete. A better fix would probably be to only encode non-printable characters, I suppose.
Attachments (5)
Change History (23)
#2
@
13 years ago
Clarification of the problem: LJ's protocol gives you an authentication code as a string over their XML-RPC protocol, and you then send it back as a cookie when you retrieve the comments using straightforward HTTP GETs on a special URL. This string contains ":" and "/". The PHP urlencode function decides to encode those. LJ's server doesn't choose to interpret cookies as urlencoded. Failure ensues.
#3
@
13 years ago
- Cc kpayne@… added
- Component changed from General to HTTP
- Keywords has-patch dev-feedback added
- Version set to 2.8
#4
@
13 years ago
- Cc daryl@… added
I was just looking at this exact issue (LJ and all). The solution I arrived at was to just apply a filter so that plugin developers (or whoever) can adjust headers as needed to suit their particular purposes. It's a smaller change to core but may also just sidestep the underlying problem. As the cookie spec neither defines nor requires a particular encoding, I sort of hate to enforce a particular encoding within core, at least without allowing other code to modify it before using the cookie.
#5
follow-up:
↓ 8
@
13 years ago
The cookie spec indeed doesn't have any standard, other than only US-ASCII characters are permitted (And other sites warn non-ascii characters are prone to being mangled), URL Encoding non-ASCII characters seems to be the agree'd standard, encoding : and \ doesn't seem to make sense to me, as they're printable US-ASCII characters, so php's urlencode shouldn't be used here.
However, the Data contained within cookies is only useful to the origin server, We shouldn't try to interpret it, as we have no need to, we simply need to send back the same data that we received, Plugin authors can pass any kind of data in, and it should be sent as-is. Or at least, that's my understanding of how we should be handling it.
#6
@
13 years ago
However, the Data contained within cookies is only useful to the origin server, We shouldn't try to interpret it, as we have no need to, we simply need to send back the same data that we received, Plugin authors can pass any kind of data in, and it should be sent as-is. Or at least, that's my understanding of how we should be handling it.
I would agree with that.
#8
in reply to:
↑ 5
@
13 years ago
Replying to dd32:
The cookie spec indeed doesn't have any standard, other than only US-ASCII characters are permitted
RFC 6265 identifies a cookie value as:
cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ) cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E ; US-ASCII characters excluding CTLs, ; whitespace DQUOTE, comma, semicolon, ; and backslash
Perhaps these should be encoded, at a minimum? As a measure against double encoding, 19922.patch also encodes % and +. Not sure why = was in there.
#9
@
13 years ago
The right thing to do with current servers and browsers (which as the Stack Overflow guy says, don't follow the RFCs) is to leave the data alone as much as you can: that means the urldecode in the constructor for WP_Http_Cookie and the urlencode on the output side should go, to my mind. If someone manually constructs a cookie with illegal characters in, you get to decide what you do with those, but that's it (and in those cases, what some browsers do is use Unicode encodings: nobody does urlencoding).
Things which currently break in the typical situation where the server sets cookies with the Set-Cookie header:
- if a server sets a cookie like "1234:/", the urldecode will leave that alone, and the encode will mess with the colons and slashes, so you'll end up sending the server something different from what it gave you. (This one also breaks when you build the cookie by hand with data from another source, which is what happens in the LJ API case).
- if a server sets a cookie like "%41", the urldecode will turn that into "A", and the encode will leave it alone, again meaning you send the server something other than what it gave you.
#11
@
12 years ago
I think the best solution here is the one dd32 outlined.
Just send what we are given and ignore any RFC because no one really follows it.
@
12 years ago
Simple patch to remove the encoding but allow for a plugin to encode it if needed using a filter
#12
@
12 years ago
Two variants of the simple patch we need with/without a filter to allow a plugin to encode cookies if really needed as a fallback.
#13
@
12 years ago
Both patches look right to me, Including a filter seems to be the best way forward to avoid breaking an implementation and providing no way to fix it.
The only thing I can really see that would break, is someone passing a \n in, which would've been the wrong thing to do in the first place.
#14
@
12 years ago
I confirm that the patch with filter resolves the LJ import issue. Tested against trunk.
#16
@
12 years ago
There's a typo in 19922.with-filter.diff: $this-name
.
Related: [10512], #9049