Opened 14 years ago
Closed 9 years ago
#16859 closed defect (bug) (fixed)
esc_url eats square brackets.
Reported by: | f00f | Owned by: | johnbillion |
---|---|---|---|
Milestone: | 4.4 | Priority: | normal |
Severity: | major | Version: | 3.1 |
Component: | Formatting | Keywords: | has-patch |
Focuses: | Cc: |
Description (last modified by )
When adding a link to the blogroll (using wp-admin/link-add.php), square brackets in the link are removed, breaking the link.
Example:
http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnews[pointer]=6&tx_ttnews[tt_news]=132583&tx_ttnews[backPid]=741&cHash=ee9c87874b
becomes
http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnewspointer=6&tx_ttnewstt_news=132583&tx_ttnewsbackPid=741&cHash=ee9c87874b
Workaround: Use URL-encoded links (%5B and %5D instead of [ and ]).
This also affects urls which are made clickable by {{{make_clickable}}
Attachments (7)
Change History (50)
#1
follow-up:
↓ 2
@
14 years ago
- Cc ldebrouwer added
- Keywords 2nd-opinion added; needs-patch removed
#2
in reply to:
↑ 1
@
14 years ago
Replying to ldebrouwer:
To achieve this we would need to alter esc_url which I'm pretty sure is not going to happen because URLs containing brackets shouldn't be floating around and should always be encoded!
Is it possible to wrap a urlencode() around the URL before escaping it? Didn't have a look at the code yet, so don't know if that makes sense.
If you take a look at the URL spec ( http://www.w3.org/Addressing/URL/url-spec.txt ) you can read:
[...]
True (wasn't aware of that, thanks).
However, the PHP FAQ suggests using square brackets for form fields to create arrays ( http://www.php.net/manual/en/faq.html.php#faq.html.arrays ). Also, a similar problem came up in ticket:12690, so I thought it might be interesting.
I don't know if you've copied the URL from somewhere but the brackets definitely shouldn't be there.
The URL is generated by the (IMHO popular) tt_news extension for Typo3.
#3
follow-up:
↓ 4
@
14 years ago
However, the PHP FAQ suggests using square brackets for form fields to create arrays
That's correct, however, it doesnt mean that the characters should be unescaped in a GET request.
#4
in reply to:
↑ 3
;
follow-up:
↓ 5
@
14 years ago
Replying to dd32:
However, the PHP FAQ suggests using square brackets for form fields to create arrays
That's correct, however, it doesnt mean that the characters should be unescaped in a GET request.
Yah, but apparently it happens when copy-pasting a URL from the address bar or using 'copy link location'.
#5
in reply to:
↑ 4
@
14 years ago
Replying to f00f:
Yah, but apparently it happens when copy-pasting a URL from the address bar or using 'copy link location'.
And that, to me, seems the root of the problem. You copy a, according to standards, malformed URL from a third-party. And WordPress, as a service, filters all the bad characters from the URL for security reasons. To me WordPress should not anticipate specific behaviour of third-party software ( browsers aside ), just the behaviour of the users.
#6
@
14 years ago
I hear you. Well, hopefully this ticket helps someone experiencing the same problem I did.
#7
@
14 years ago
Just looking at this, [] should definitely be allowed in the url's.. at least in the query segment
#8
@
14 years ago
No they shouldn't, at least not unencoded. And like I pointed out earlier it's not up to WordPress to fix the problems of third-parties.
#9
@
14 years ago
No they shouldn't, at least not unencoded.
In URL's produced, sure.
However, users should not be expected to encode the values themselves, many links/sites do not encode them themselves, which leads to this case of users encountering issues when pasting the links in.
[] should be allowed to be entered, they shouldn't be striped, perhaps they should be encoded or similar however.
#10
@
13 years ago
- Keywords has-patch dev-feedback added; 2nd-opinion removed
- Owner set to edwardw
- Status changed from new to accepted
Even though brackets should be encoded for use in third-party applications in theory, we should not blame the user who often may not even be aware of this. Often the URL is copied from the location bar or other source where they have not encoded this, and especially due to PHP's use of brackets to pass arrays. I have attached a patch which urlencode()s brackets when cleaning URLs.
#11
@
13 years ago
- Keywords needs-patch added; has-patch dev-feedback removed
I'm not sure we should be patching esc_url
for this issue but rather just fix the link add/editing work flow to correctly encode the square brackets.
#13
@
13 years ago
- Description modified (diff)
- Owner changed from edwardw to westi
- Summary changed from Square brackets are removed from links in blogroll to esc_url eats square brackets.
I've changed my mind about this as it affects a number of places and I think we do better to patch esc_url to fix this for [] by encoding them for you.
#14
@
13 years ago
Per RFC 1738, brackets are considered "unsafe", and are not reserved, and should therefore always be encoded. http://tools.ietf.org/html/rfc1738#section-2.2
But per RFC3986, they are now a reserved gen-delimiter character, which means it would not be in our best interest to blindly encode all of the ones we find: http://tools.ietf.org/html/rfc3986#section-2.2. It is designed for wrapping an IPv6+ address for the host, according to http://tools.ietf.org/html/rfc3986#section-3.2.2. This is actually okay, as we can pretty easily avoid targeting the host. That it is a reserved character with a specific meaning allows us to make this change in a more future-proof way.
#20
@
12 years ago
i would like to add my voice to calls for a solution here also please.
i want to be able to inject dynamic data such as bp_loggedin_user_domain() into conventional wp menu url's using conventional wp short-code syntax ie. square brackets.
unfortunately, wp_update_nav_menu_item() calls esc_url_raw($argsmenu-item-url? directly - no filters available and subsequently esc_url_raw() uses hard-coded pregmatch string, also without filter around it. thus, it is impossible to 'filter' this data at all, without hacking core - which i refuse to do.
please can we have some additional filter hooks defined/implemented? either around/inside call to esc_ulr, esc_url_raw or just around the pregmatch string(s) itself/themselves, inside esc_url.
this would help me and many others greatly.
#23
@
11 years ago
- Milestone changed from Awaiting Review to 3.6
- Severity changed from minor to major
Marking this as 3.6 and major due to [24480].
The patch seems fine but I think we should see if we can avoid hosts.
#24
@
11 years ago
- Milestone changed from 3.6 to Future Release
Never mind, punting this. Leaving #24663 open to fix this for wp_http_validate_url() for 3.6.
#27
@
11 years ago
I spent some time looking into this issue as well as #15936. When sanitizing, validating, and escaping URLs, it seems that the most robust solution is to break the url into its components, sanitize, and then rebuild. 16859-03.2.patch does this by leveraging parse_url
and then reconstructing the url after sanitizing by following the psuedo code in RFC3986. By breaking the url into its components, we can also easily add other rules. The solution addresses issues with IPv6 literals by allowing [
and ]
in the host component and encodes brackets in the path
, query
, and fragment
segments. It feels a little funny doing this encoding here because it seems to me that the url encoding is something that should be happening elsewhere, but since right now the brackets do not function as delimiters outside the host, i think it is okay. All existing tests pass with this solution.
#28
@
11 years ago
This looks like it could be quite a bit slower. Any data on that?
One thing we've thought about is having a "common situation" regex that only allows very basic characters and declares the URL as safe if it matches. Then do the more rigorous, slower checks if it doesn't. It'll slightly slow down the more complex URLs but will speed up the rest of them. The ticket for this is #22951. If the work here significantly slows down this function, it should probably be considered hand-in-hand with #22951.
#30
follow-up:
↓ 31
@
10 years ago
Looks like wp_sanitize_redirect()
also eats square brackets: https://core.trac.wordpress.org/browser/tags/3.9.1/src/wp-includes/pluggable.php#L1135
Input:
wp_sanitize_redirect( 'http://localhost/wordpress/wp-admin/admin.php?var[]=test&var[]=test2' )
:
Expected output:
http://localhost/wordpress/wp-admin/admin.php?var[]=test&var[]=test2
Actual output:
http://localhost/wordpress/wp-admin/admin.php?var=test&var=test2
#33
@
9 years ago
Just an additionnal information, pushing for a solution, this generates also problems with WooCommerce external _product_url url that uses esc_url for sanitization
Unfortunately, Zanox (one of the largest affiliation portal in Europe) uses opening and closing brackets in its affiliate urls.
#34
@
9 years ago
- Milestone changed from Future Release to 4.4
- Owner changed from westi to johnbillion
#37
@
9 years ago
- Keywords dev-feedback removed
I was concerned about the performance of all the preg_replace()
calls in 16859-03.2.patch so I ran a bunch of benchmarks on 10,000 iterations of esc_url()
. It increases the processing time by around 40%. Critically, it doesn't pass a bunch of unit tests that I've just added.
16859.diff tackles this slightly differently. Instead of parsing the URL into segments, individually escaping each segment, and then re-constructing the URL, it splits out the "front" of the URL (everything up to the domain and port), and only escapes square brackets in the remainder of the URL. It performs a fast strpos()
comparison to see if the URL contains a square bracket, which adds almost no overhead to URLs with no square brackets, and an overhead of around 15% to URLs that do.
I'm going to commit this and let the Travis tests run to see if anything else breaks.
#40
@
9 years ago
- Keywords needs-patch added; has-patch removed
- Milestone changed from 4.4 to Future Release
The issue we have here is that PHP <= 5.4.7 doesn't correctly handle IPv6 hosts in parse_url()
. Ref: https://3v4l.org/nG3QA
Not sure where to go from here.
#41
@
9 years ago
One method could be to finally move WP_HTTP::parse_url()
to wp_parse_url()
and use that instead.
https://3v4l.org/lWQ4p is a version of that which includes half of a fix for an edge-case (If the input path is protocolless and doesn't contain a path component, then WP_HTTP::parse_url()
will report /
as the path)
#42
@
9 years ago
- Keywords has-patch added; needs-patch removed
- Milestone changed from Future Release to 4.4
So here's a thing. Square brackets are stripped out when using esc_url()
. This means URLs with square brackets in the path or query are broken, and IPv6 literal URLs are broken.
[34675] attempted to fix both issues at once, but the tests failed due to parse_url()
not being able to handle IPv6 literal URLs in PHP < 5.4.7. However the original intention was never to fix IPv6 literals - it was a side effect.
To that end, the IPv6 tests can be removed, and we end up with:
- Square brackets correctly handled in paths and queries.
- IPv6 literal URLs consequentially fixed in PHP > 5.4.7.
There is no down-side to reimplementing [34674], minus its IPv6 tests.
To achieve this we would need to alter esc_url which I'm pretty sure is not going to happen because URLs containing brackets shouldn't be floating around and should always be encoded!
If you take a look at the URL spec ( http://www.w3.org/Addressing/URL/url-spec.txt ) you can read:
"The 'national' and 'punctuation' characters do not appear in any productions and therefore may not appear in URLs.
national { | } | vline | [ | ] | \ | | ~
punctuation < | >"
I don't know if you've copied the URL from somewhere but the brackets definitely shouldn't be there.