Make WordPress Core

Opened 14 years ago

Closed 9 years ago

#16859 closed defect (bug) (fixed)

esc_url eats square brackets.

Reported by: f00f's profile f00f Owned by: johnbillion's profile johnbillion
Milestone: 4.4 Priority: normal
Severity: major Version: 3.1
Component: Formatting Keywords: has-patch
Focuses: Cc:

Description (last modified by westi)

When adding a link to the blogroll (using wp-admin/link-add.php), square brackets in the link are removed, breaking the link.

Example:

http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnews[pointer]=6&tx_ttnews[tt_news]=132583&tx_ttnews[backPid]=741&cHash=ee9c87874b

becomes

http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnewspointer=6&tx_ttnewstt_news=132583&tx_ttnewsbackPid=741&cHash=ee9c87874b

Workaround: Use URL-encoded links (%5B and %5D instead of [ and ]).

This also affects urls which are made clickable by {{{make_clickable}}

Attachments (7)

patch16859.diff (577 bytes) - added by edwardw 13 years ago.
[PATCH] Urlencode brackets when cleaning
patch16859.v2.diff (672 bytes) - added by edwardw 13 years ago.
[PATCH] Urlencode brackets when cleaning, patching wp-admin/bookmark.php instead
square-bracket-esc_url.diff (548 bytes) - added by westi 13 years ago.
Simple fix to just encode square brackets in esc_url - breaks IPv6 urls though.
16859-03.patch (3.5 KB) - added by gcorne 11 years ago.
16859-03.2.patch (3.8 KB) - added by gcorne 11 years ago.
16859-unittests.diff (2.0 KB) - added by MikeHansenMe 10 years ago.
see #30284
16859.diff (7.3 KB) - added by johnbillion 9 years ago.

Download all attachments as: .zip

Change History (50)

#1 follow-up: @ldebrouwer
14 years ago

  • Cc ldebrouwer added
  • Keywords 2nd-opinion added; needs-patch removed

To achieve this we would need to alter esc_url which I'm pretty sure is not going to happen because URLs containing brackets shouldn't be floating around and should always be encoded!

If you take a look at the URL spec ( http://www.w3.org/Addressing/URL/url-spec.txt ) you can read:

"The 'national' and 'punctuation' characters do not appear in any productions and therefore may not appear in URLs.

national { | } | vline | [ | ] | \ | | ~

punctuation < | >"

I don't know if you've copied the URL from somewhere but the brackets definitely shouldn't be there.

#2 in reply to: ↑ 1 @f00f
14 years ago

Replying to ldebrouwer:

To achieve this we would need to alter esc_url which I'm pretty sure is not going to happen because URLs containing brackets shouldn't be floating around and should always be encoded!

Is it possible to wrap a urlencode() around the URL before escaping it? Didn't have a look at the code yet, so don't know if that makes sense.

If you take a look at the URL spec ( http://www.w3.org/Addressing/URL/url-spec.txt ) you can read:

[...]

True (wasn't aware of that, thanks).
However, the PHP FAQ suggests using square brackets for form fields to create arrays ( http://www.php.net/manual/en/faq.html.php#faq.html.arrays ). Also, a similar problem came up in ticket:12690, so I thought it might be interesting.

I don't know if you've copied the URL from somewhere but the brackets definitely shouldn't be there.

The URL is generated by the (IMHO popular) tt_news extension for Typo3.

#3 follow-up: @dd32
14 years ago

However, the PHP FAQ suggests using square brackets for form fields to create arrays

That's correct, however, it doesnt mean that the characters should be unescaped in a GET request.

#4 in reply to: ↑ 3 ; follow-up: @f00f
14 years ago

Replying to dd32:

However, the PHP FAQ suggests using square brackets for form fields to create arrays

That's correct, however, it doesnt mean that the characters should be unescaped in a GET request.

Yah, but apparently it happens when copy-pasting a URL from the address bar or using 'copy link location'.

#5 in reply to: ↑ 4 @ldebrouwer
14 years ago

Replying to f00f:

Yah, but apparently it happens when copy-pasting a URL from the address bar or using 'copy link location'.

And that, to me, seems the root of the problem. You copy a, according to standards, malformed URL from a third-party. And WordPress, as a service, filters all the bad characters from the URL for security reasons. To me WordPress should not anticipate specific behaviour of third-party software ( browsers aside ), just the behaviour of the users.

#6 @f00f
14 years ago

I hear you. Well, hopefully this ticket helps someone experiencing the same problem I did.

#7 @dd32
14 years ago

Just looking at this, [] should definitely be allowed in the url's.. at least in the query segment

#8 @ldebrouwer
14 years ago

No they shouldn't, at least not unencoded. And like I pointed out earlier it's not up to WordPress to fix the problems of third-parties.

#9 @dd32
14 years ago

No they shouldn't, at least not unencoded.

In URL's produced, sure.

However, users should not be expected to encode the values themselves, many links/sites do not encode them themselves, which leads to this case of users encountering issues when pasting the links in.

[] should be allowed to be entered, they shouldn't be striped, perhaps they should be encoded or similar however.

@edwardw
13 years ago

[PATCH] Urlencode brackets when cleaning

#10 @edwardw
13 years ago

  • Keywords has-patch dev-feedback added; 2nd-opinion removed
  • Owner set to edwardw
  • Status changed from new to accepted

Even though brackets should be encoded for use in third-party applications in theory, we should not blame the user who often may not even be aware of this. Often the URL is copied from the location bar or other source where they have not encoded this, and especially due to PHP's use of brackets to pass arrays. I have attached a patch which urlencode()s brackets when cleaning URLs.

#11 @westi
13 years ago

  • Keywords needs-patch added; has-patch dev-feedback removed

I'm not sure we should be patching esc_url for this issue but rather just fix the link add/editing work flow to correctly encode the square brackets.

@edwardw
13 years ago

[PATCH] Urlencode brackets when cleaning, patching wp-admin/bookmark.php instead

#12 @edwardw
13 years ago

  • Keywords has-patch dev-feedback added; needs-patch removed

Agreed.

#13 @westi
13 years ago

  • Description modified (diff)
  • Owner changed from edwardw to westi
  • Summary changed from Square brackets are removed from links in blogroll to esc_url eats square brackets.

I've changed my mind about this as it affects a number of places and I think we do better to patch esc_url to fix this for [] by encoding them for you.

#14 @nacin
13 years ago

Per RFC 1738, brackets are considered "unsafe", and are not reserved, and should therefore always be encoded. http://tools.ietf.org/html/rfc1738#section-2.2

But per RFC3986, they are now a reserved gen-delimiter character, which means it would not be in our best interest to blindly encode all of the ones we find: http://tools.ietf.org/html/rfc3986#section-2.2. It is designed for wrapping an IPv6+ address for the host, according to http://tools.ietf.org/html/rfc3986#section-3.2.2. This is actually okay, as we can pretty easily avoid targeting the host. That it is a reserved character with a specific meaning allows us to make this change in a more future-proof way.

#15 @westi
13 years ago

Simple make_clickable tests in [UT725]

#16 @westi
13 years ago

Simple esc_url tests in [UT726] which include IPv6 url tests.

@westi
13 years ago

Simple fix to just encode square brackets in esc_url - breaks IPv6 urls though.

#17 @nacin
12 years ago

esc_url() can also eat unencoded colons: #21974.

#18 @SergeyBiryukov
12 years ago

  • Component changed from General to Formatting

#19 @SergeyBiryukov
12 years ago

#23519 was marked as a duplicate.

#20 @rcain
12 years ago

i would like to add my voice to calls for a solution here also please.

i want to be able to inject dynamic data such as bp_loggedin_user_domain() into conventional wp menu url's using conventional wp short-code syntax ie. square brackets.

unfortunately, wp_update_nav_menu_item() calls esc_url_raw($argsmenu-item-url? directly - no filters available and subsequently esc_url_raw() uses hard-coded pregmatch string, also without filter around it. thus, it is impossible to 'filter' this data at all, without hacking core - which i refuse to do.

please can we have some additional filter hooks defined/implemented? either around/inside call to esc_ulr, esc_url_raw or just around the pregmatch string(s) itself/themselves, inside esc_url.

this would help me and many others greatly.

#21 @SergeyBiryukov
11 years ago

[24480] made this more prominent: #24663.

#22 @ocean90
11 years ago

#24663 was marked as a duplicate.

#23 @nacin
11 years ago

  • Milestone changed from Awaiting Review to 3.6
  • Severity changed from minor to major

Marking this as 3.6 and major due to [24480].

The patch seems fine but I think we should see if we can avoid hosts.

#24 @nacin
11 years ago

  • Milestone changed from 3.6 to Future Release

Never mind, punting this. Leaving #24663 open to fix this for wp_http_validate_url() for 3.6.

#25 @jeremyfelt
11 years ago

#25302 was marked as a duplicate.

@gcorne
11 years ago

@gcorne
11 years ago

#27 @gcorne
11 years ago

I spent some time looking into this issue as well as #15936. When sanitizing, validating, and escaping URLs, it seems that the most robust solution is to break the url into its components, sanitize, and then rebuild. 16859-03.2.patch does this by leveraging parse_url and then reconstructing the url after sanitizing by following the psuedo code in RFC3986. By breaking the url into its components, we can also easily add other rules. The solution addresses issues with IPv6 literals by allowing [ and ] in the host component and encodes brackets in the path, query, and fragment segments. It feels a little funny doing this encoding here because it seems to me that the url encoding is something that should be happening elsewhere, but since right now the brackets do not function as delimiters outside the host, i think it is okay. All existing tests pass with this solution.

#28 @nacin
11 years ago

This looks like it could be quite a bit slower. Any data on that?

One thing we've thought about is having a "common situation" regex that only allows very basic characters and declares the URL as safe if it matches. Then do the more rigorous, slower checks if it doesn't. It'll slightly slow down the more complex URLs but will speed up the rest of them. The ticket for this is #22951. If the work here significantly slows down this function, it should probably be considered hand-in-hand with #22951.

#29 @ocean90
11 years ago

#25567 was marked as a duplicate.

#30 follow-up: @mordauk
10 years ago

Looks like wp_sanitize_redirect() also eats square brackets: https://core.trac.wordpress.org/browser/tags/3.9.1/src/wp-includes/pluggable.php#L1135

Input:

wp_sanitize_redirect( 'http://localhost/wordpress/wp-admin/admin.php?var[]=test&var[]=test2' ):

Expected output:

http://localhost/wordpress/wp-admin/admin.php?var[]=test&var[]=test2

Actual output:

http://localhost/wordpress/wp-admin/admin.php?var=test&var=test2

#31 in reply to: ↑ 30 @SergeyBiryukov
10 years ago

Replying to mordauk:

Looks like wp_sanitize_redirect() also eats square brackets

See #17052.

#32 @dd32
10 years ago

#30869 was marked as a duplicate.

#33 @Marie-Aude
9 years ago

Just an additionnal information, pushing for a solution, this generates also problems with WooCommerce external _product_url url that uses esc_url for sanitization

Unfortunately, Zanox (one of the largest affiliation portal in Europe) uses opening and closing brackets in its affiliate urls.

#34 @johnbillion
9 years ago

  • Milestone changed from Future Release to 4.4
  • Owner changed from westi to johnbillion

#35 @johnbillion
9 years ago

In 33851:

Add some more passing unit tests for esc_url() in preparation for upcoming changes.

See #23605, #20771, #16859

#36 @johnbillion
9 years ago

In 33855:

More unit tests for esc_url() and esc_url_raw().

See #23605, #20771, #16859

@johnbillion
9 years ago

#37 @johnbillion
9 years ago

  • Keywords dev-feedback removed

I was concerned about the performance of all the preg_replace() calls in 16859-03.2.patch so I ran a bunch of benchmarks on 10,000 iterations of esc_url(). It increases the processing time by around 40%. Critically, it doesn't pass a bunch of unit tests that I've just added.

16859.diff tackles this slightly differently. Instead of parsing the URL into segments, individually escaping each segment, and then re-constructing the URL, it splits out the "front" of the URL (everything up to the domain and port), and only escapes square brackets in the remainder of the URL. It performs a fast strpos() comparison to see if the URL contains a square bracket, which adds almost no overhead to URLs with no square brackets, and an overhead of around 15% to URLs that do.

I'm going to commit this and let the Travis tests run to see if anything else breaks.

#38 @johnbillion
9 years ago

In 34674:

Avoid stripping square brackets from URLs, and instead correctly encode them.

Square brackets must be encoded in the path, path parameters, query parameters, and fragment, but must not be encoded in anything up to the domain and port.

Adds a bunch of tests, including square brackets in query parameters, IPv6 URLs, and several other permutations.

See #16859

#39 @johnbillion
9 years ago

In 34675:

Revert r34674 due to failures on PHP < 5.4.

See #16859

#40 @johnbillion
9 years ago

  • Keywords needs-patch added; has-patch removed
  • Milestone changed from 4.4 to Future Release

The issue we have here is that PHP <= 5.4.7 doesn't correctly handle IPv6 hosts in parse_url(). Ref: https://3v4l.org/nG3QA

Not sure where to go from here.

#41 @dd32
9 years ago

One method could be to finally move WP_HTTP::parse_url() to wp_parse_url() and use that instead.

https://3v4l.org/lWQ4p is a version of that which includes half of a fix for an edge-case (If the input path is protocolless and doesn't contain a path component, then WP_HTTP::parse_url() will report / as the path)

#42 @johnbillion
9 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 4.4

So here's a thing. Square brackets are stripped out when using esc_url(). This means URLs with square brackets in the path or query are broken, and IPv6 literal URLs are broken.

[34675] attempted to fix both issues at once, but the tests failed due to parse_url() not being able to handle IPv6 literal URLs in PHP < 5.4.7. However the original intention was never to fix IPv6 literals - it was a side effect.

To that end, the IPv6 tests can be removed, and we end up with:

  1. Square brackets correctly handled in paths and queries.
  2. IPv6 literal URLs consequentially fixed in PHP > 5.4.7.

There is no down-side to reimplementing [34674], minus its IPv6 tests.

#43 @johnbillion
9 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 34920:

Avoid stripping square brackets from URLs, and instead correctly encode them. Square brackets must be encoded in the path, path parameters, query parameters, and fragment, but must not be encoded in anything up to the domain and port.

Adds tests.

Fixes #16859

Note: See TracTickets for help on using tickets.