WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 8 weeks ago

#16859 accepted defect (bug)

esc_url eats square brackets.

Reported by: f00f
Owned by: westi
Milestone: Future Release
Priority: normal
Severity: major
Version: 3.1
Component: Formatting
Keywords: has-patch dev-feedback
Focuses:
Cc:

Description (last modified by westi)

When adding a link to the blogroll (using wp-admin/link-add.php), square brackets in the link are removed, breaking the link.

Example:

http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnews[pointer]=6&tx_ttnews[tt_news]=132583&tx_ttnews[backPid]=741&cHash=ee9c87874b

becomes

http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnewspointer=6&tx_ttnewstt_news=132583&tx_ttnewsbackPid=741&cHash=ee9c87874b

Workaround: Use URL-encoded links (%5B and %5D instead of [ and ]).

This also affects URLs that are made clickable by make_clickable().
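The Python sketch below makes the reported behaviour concrete with a strip-based sanitizer. The whitelist is an assumption: it approximates the character class esc_url() used around version 3.1, and the exact set is illustrative only. It shows two things. The characters [ and ] are not in the whitelist, so they are deleted. The % used by the workaround is in the whitelist, so a pre-encoded link passes through unchanged.

```python
import re

# Assumed approximation of esc_url()'s whitelist at the time: any character
# outside this class is deleted. '[' and ']' are not in it, but '%' is.
DISALLOWED = re.compile(r"[^a-z0-9\-~+_.?#=!&;,/:%@$|*'()\x80-\xff]", re.IGNORECASE)


def strip_disallowed(url: str) -> str:
    """Delete every character outside the whitelist, as a strip-based sanitizer does."""
    return DISALLOWED.sub("", url)


reported = ("http://lokale-wochenzeitungen.de/index.php?id=485&tx_ttnews[pointer]=6"
            "&tx_ttnews[tt_news]=132583&tx_ttnews[backPid]=741&cHash=ee9c87874b")
# The workaround from the description: percent-encode the brackets up front.
workaround = reported.replace("[", "%5B").replace("]", "%5D")

print(strip_disallowed(reported))    # brackets silently removed, as in the report
print(strip_disallowed(workaround))  # unchanged: %5B and %5D pass the whitelist
```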

Attachments (5)

patch16859.diff (577 bytes) - added by edwardw 3 years ago.
[PATCH] Urlencode brackets when cleaning
patch16859.v2.diff (672 bytes) - added by edwardw 3 years ago.
[PATCH] Urlencode brackets when cleaning, patching wp-admin/bookmark.php instead
square-bracket-esc_url.diff (548 bytes) - added by westi 23 months ago.
Simple fix to just encode square brackets in esc_url - breaks IPv6 urls though.
16859-03.patch (3.5 KB) - added by gcorne 2 months ago.
16859-03.2.patch (3.8 KB) - added by gcorne 2 months ago.


Change History (34)

comment:1 follow-up: ldebrouwer, 3 years ago

  • Cc ldebrouwer added
  • Keywords 2nd-opinion added; needs-patch removed

To achieve this we would need to alter esc_url which I'm pretty sure is not going to happen because URLs containing brackets shouldn't be floating around and should always be encoded!

If you take a look at the URL spec ( http://www.w3.org/Addressing/URL/url-spec.txt ) you can read:

"The 'national' and 'punctuation' characters do not appear in any productions and therefore may not appear in URLs.

national { | } | vline | [ | ] | \ | ^ | ~

punctuation < | >"

I don't know if you've copied the URL from somewhere but the brackets definitely shouldn't be there.

comment:2 in reply to: ↑ 1 f00f, 3 years ago

Replying to ldebrouwer:

To achieve this we would need to alter esc_url which I'm pretty sure is not going to happen because URLs containing brackets shouldn't be floating around and should always be encoded!

Is it possible to wrap a urlencode() around the URL before escaping it? Didn't have a look at the code yet, so don't know if that makes sense.

If you take a look at the URL spec ( http://www.w3.org/Addressing/URL/url-spec.txt ) you can read:

[...]

True (wasn't aware of that, thanks).
However, the PHP FAQ suggests using square brackets for form fields to create arrays ( http://www.php.net/manual/en/faq.html.php#faq.html.arrays ). Also, a similar problem came up in ticket:12690, so I thought it might be interesting.

I don't know if you've copied the URL from somewhere but the brackets definitely shouldn't be there.

The URL is generated by the (IMHO popular) tt_news extension for Typo3.
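Wrapping urlencode() around the whole URL, as f00f suggests, would not work as hoped. urlencode() also encodes the delimiters that give a URL its structure. The Python illustration below uses urllib.parse.quote_plus() as a stand-in for PHP's urlencode(); the two are close but not identical. The example.com URL and both helper names are hypothetical.

```python
from urllib.parse import quote, quote_plus


def encode_whole(url: str) -> str:
    # Stand-in for PHP urlencode() on the entire URL: ':', '/', '?', '=' and
    # '&' are percent-encoded too, so the result is no longer a usable link.
    return quote_plus(url)


def encode_outside_delimiters(url: str) -> str:
    # Keep structural delimiters and existing '%' escapes as they are;
    # only characters outside this safe set, such as '[' and ']', are encoded.
    return quote(url, safe=":/?#&=@!$'()*+,;%")


url = "http://example.com/index.php?tx_ttnews[pointer]=6"
print(encode_whole(url))               # http%3A%2F%2Fexample.com%2Findex.php%3Ftx_ttnews%5Bpointer%5D%3D6
print(encode_outside_delimiters(url))  # http://example.com/index.php?tx_ttnews%5Bpointer%5D=6
```

The targeted version would still encode brackets around an IPv6 host literal. That is the same concern later comments in this ticket raise about a blanket bracket-encoding fix.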

comment:3 follow-up: dd32, 3 years ago

However, the PHP FAQ suggests using square brackets for form fields to create arrays

That's correct; however, it doesn't mean that the characters should be unescaped in a GET request.

comment:4 in reply to: ↑ 3; follow-up: f00f, 3 years ago

Replying to dd32:

However, the PHP FAQ suggests using square brackets for form fields to create arrays

That's correct; however, it doesn't mean that the characters should be unescaped in a GET request.

Yah, but apparently it happens when copy-pasting a URL from the address bar or using 'copy link location'.

comment:5 in reply to: ↑ 4 ldebrouwer, 3 years ago

Replying to f00f:

Yah, but apparently it happens when copy-pasting a URL from the address bar or using 'copy link location'.

And that, to me, seems to be the root of the problem. You copy a URL from a third party that is malformed according to the standards, and WordPress, as a service, filters the bad characters out of the URL for security reasons. To me, WordPress should not anticipate the specific behaviour of third-party software (browsers aside), just the behaviour of its users.

comment:6 f00f, 3 years ago

I hear you. Well, hopefully this ticket helps someone experiencing the same problem I did.

comment:7 dd32, 3 years ago

Just looking at this, [] should definitely be allowed in URLs, at least in the query segment.

comment:8 ldebrouwer, 3 years ago

No, they shouldn't, at least not unencoded. And as I pointed out earlier, it's not up to WordPress to fix the problems of third parties.

comment:9 dd32, 3 years ago

No, they shouldn't, at least not unencoded.

In the URLs we produce, sure.

However, users should not be expected to encode the values themselves. Many links and sites do not encode them either, which is how users end up hitting this issue when they paste the links in.

[] should be allowed to be entered and shouldn't be stripped, though perhaps they should be encoded or similar.

edwardw, 3 years ago

[PATCH] Urlencode brackets when cleaning

comment:10 edwardw, 3 years ago

  • Keywords has-patch dev-feedback added; 2nd-opinion removed
  • Owner set to edwardw
  • Status changed from new to accepted

Even though brackets should in theory be encoded for use in third-party applications, we should not blame the user, who often may not even be aware of this. Often the URL is copied from the location bar or another source where the brackets have not been encoded, especially given PHP's use of brackets to pass arrays. I have attached a patch which urlencode()s brackets when cleaning URLs.
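The Python sketch below shows the encode-instead-of-strip idea. It is not the attached PHP patch. The function name is hypothetical, and the whitelist is the same illustrative approximation of esc_url()'s character class. The last call shows a drawback later noted in this ticket for a simple bracket-encoding fix: an IPv6 host literal gets encoded as well.

```python
import re

# Hypothetical whitelist approximating esc_url()'s character class (assumption).
DISALLOWED = re.compile(r"[^a-z0-9\-~+_.?#=!&;,/:%@$|*'()\x80-\xff]", re.IGNORECASE)


def encode_then_strip(url: str) -> str:
    """Percent-encode square brackets first, then delete remaining disallowed characters.

    Because encoding happens before the whitelist is applied, '[' and ']'
    survive as %5B and %5D instead of being removed.
    """
    url = url.replace("[", "%5B").replace("]", "%5D")
    return DISALLOWED.sub("", url)


print(encode_then_strip("http://example.com/?a[b]=1"))  # http://example.com/?a%5Bb%5D=1
print(encode_then_strip("http://[::1]/"))               # http://%5B::1%5D/ (IPv6 host broken)
```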

comment:11 westi, 3 years ago

  • Keywords needs-patch added; has-patch dev-feedback removed

I'm not sure we should be patching esc_url for this issue; rather, we should just fix the link add/edit workflow to correctly encode the square brackets.

edwardw, 3 years ago

[PATCH] Urlencode brackets when cleaning, patching wp-admin/bookmark.php instead

comment:12 edwardw, 3 years ago

  • Keywords has-patch dev-feedback added; needs-patch removed

Agreed.

comment:13 westi, 23 months ago

  • Description modified (diff)
  • Owner changed from edwardw to westi
  • Summary changed from Square brackets are removed from links in blogroll to esc_url eats square brackets.

I've changed my mind about this as it affects a number of places and I think we do better to patch esc_url to fix this for [] by encoding them for you.

comment:14 nacin, 23 months ago

Per RFC 1738, brackets are considered "unsafe", and are not reserved, and should therefore always be encoded. http://tools.ietf.org/html/rfc1738#section-2.2

But per RFC3986, they are now a reserved gen-delimiter character, which means it would not be in our best interest to blindly encode all of the ones we find: http://tools.ietf.org/html/rfc3986#section-2.2. It is designed for wrapping an IPv6+ address for the host, according to http://tools.ietf.org/html/rfc3986#section-3.2.2. This is actually okay, as we can pretty easily avoid targeting the host. That it is a reserved character with a specific meaning allows us to make this change in a more future-proof way.
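nacin's point is that brackets are legitimate only around an IP literal in the host, and that can be checked mechanically. The minimal Python sketch below makes two assumptions. It finds the authority with the URI-splitting regular expression from RFC 3986, Appendix B, and it validates the literal with Python's ipaddress module. host_brackets_ok() is a hypothetical helper. IPvFuture literals, which RFC 3986 also allows inside brackets, are deliberately not handled.

```python
import ipaddress
import re

# The URI-splitting regular expression from RFC 3986, Appendix B.
# Group 4 is the authority component (userinfo@host:port), if present.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def host_brackets_ok(url: str) -> bool:
    """True if the host has no brackets, or is a bracketed IPv6 literal with an optional port."""
    authority = URI_RE.match(url).group(4) or ""
    host_port = authority.rpartition("@")[2]      # drop any userinfo
    if "[" not in host_port and "]" not in host_port:
        return True                               # ordinary registered name or IPv4 host
    m = re.fullmatch(r"\[([^\]]*)\](:[0-9]*)?", host_port)
    if m is None:
        return False                              # brackets somewhere other than around the host
    try:
        ipaddress.IPv6Address(m.group(1))         # raises ValueError if not valid IPv6
    except ValueError:
        return False
    return True


print(host_brackets_ok("http://[::1]:8080/path?x[y]=1"))  # brackets in the query are ignored here
print(host_brackets_ok("http://[not-an-ip]/"))
```

A check like this lets the host keep its brackets while the remaining components are free to have theirs encoded.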

comment:15 westi, 23 months ago

Simple make_clickable tests in [UT725]

comment:16 westi, 23 months ago

Simple esc_url tests in [UT726] which include IPv6 url tests.

westi, 23 months ago

Simple fix to just encode square brackets in esc_url - breaks IPv6 urls though.

comment:17 nacin, 18 months ago

esc_url() can also eat unencoded colons: #21974.

comment:18 SergeyBiryukov, 15 months ago

  • Component changed from General to Formatting

comment:19 SergeyBiryukov, 14 months ago

#23519 was marked as a duplicate.

comment:20 rcain, 11 months ago

I would like to add my voice to the calls for a solution here, please.

I want to be able to inject dynamic data such as bp_loggedin_user_domain() into ordinary WP menu URLs using the conventional WP shortcode syntax, i.e. square brackets.

Unfortunately, wp_update_nav_menu_item() calls esc_url_raw( $args['menu-item-url'] ) directly, with no filter available, and esc_url_raw() then uses a hard-coded preg_replace() pattern, also without a filter around it. Thus it is impossible to filter this data at all without hacking core, which I refuse to do.

Please can we have some additional filter hooks, either around or inside the calls to esc_url and esc_url_raw, or just around the regex pattern(s) inside esc_url?

This would help me and many others greatly.

comment:21 SergeyBiryukov, 10 months ago

[24480] made this more prominent: #24663.

comment:22 ocean90, 10 months ago

#24663 was marked as a duplicate.

comment:23 nacin, 10 months ago

  • Milestone changed from Awaiting Review to 3.6
  • Severity changed from minor to major

Marking this as 3.6 and major due to [24480].

The patch seems fine but I think we should see if we can avoid hosts.

comment:24 nacin, 9 months ago

  • Milestone changed from 3.6 to Future Release

Never mind, punting this. Leaving #24663 open to fix this for wp_http_validate_url() for 3.6.

comment:25 jeremyfelt, 7 months ago

#25302 was marked as a duplicate.

gcorne, 2 months ago

gcorne, 2 months ago

comment:27 gcorne, 2 months ago

I spent some time looking into this issue as well as #15936. When sanitizing, validating, and escaping URLs, the most robust solution seems to be to break the URL into its components, sanitize them, and then rebuild it. 16859-03.2.patch does this by leveraging parse_url() and then reconstructing the URL after sanitizing, following the pseudocode in RFC 3986. Breaking the URL into its components also makes it easy to add other rules. The solution addresses the issues with IPv6 literals by allowing [ and ] in the host component, and it encodes brackets in the path, query, and fragment. It feels a little funny doing this encoding here, because URL encoding seems to me like something that should happen elsewhere. But since brackets do not currently function as delimiters outside the host, I think it is okay. All existing tests pass with this solution.
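The decompose, sanitize, and recompose approach can be sketched in Python. This is not gcorne's PHP patch. Instead of parse_url(), it splits the URL with the regular expression from RFC 3986, Appendix B. Its only sanitizing rule is bracket encoding outside the authority. sanitize_url() is a hypothetical name. The recomposition step follows the pseudocode in RFC 3986, section 5.3.

```python
import re

# RFC 3986, Appendix B: split a URI reference into scheme, authority, path,
# query and fragment. A group is None when that component is absent.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def _encode_brackets(part: str) -> str:
    return part.replace("[", "%5B").replace("]", "%5D")


def sanitize_url(url: str) -> str:
    m = URI_RE.match(url)  # always matches: every group is optional
    scheme, authority = m.group(2), m.group(4)
    path, query, fragment = m.group(5), m.group(7), m.group(9)

    # Brackets are delimiters only in the host (IPv6 literals), so the
    # authority is left untouched; in other components they are encoded.
    path = _encode_brackets(path)
    if query is not None:
        query = _encode_brackets(query)
    if fragment is not None:
        fragment = _encode_brackets(fragment)

    # Recomposition, following the pseudocode in RFC 3986, section 5.3.
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


print(sanitize_url("http://[::1]/a[b]?x[y]=1#f[g]"))
```

Distinguishing an absent query from an empty one (None versus "") is what lets a URL ending in a bare "?" round-trip unchanged. Further per-component rules could be added at the marked step.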

comment:28 nacin, 2 months ago

This looks like it could be quite a bit slower. Any data on that?

One thing we've thought about is having a "common situation" regex that only allows very basic characters and declares the URL as safe if it matches. Then do the more rigorous, slower checks if it doesn't. It'll slightly slow down the more complex URLs but will speed up the rest of them. The ticket for this is #22951. If the work here significantly slows down this function, it should probably be considered hand-in-hand with #22951.
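The "common situation" fast path could look roughly like the Python sketch below. The fast-path pattern and the stand-in slow path are hypothetical and are not what #22951 proposes. The point is only the control flow: one cheap fullmatch() for URLs made of plain characters, with the rigorous checks reserved for everything else.

```python
import re

# Hypothetical fast-path pattern: http(s), a plain host, and a path/query made
# only of characters that never need rewriting. Anything else goes slow.
FAST_PATH = re.compile(r"https?://[A-Za-z0-9.\-]+(/[A-Za-z0-9._~\-/]*)?(\?[A-Za-z0-9._~\-=&]*)?")


def slow_sanitize(url: str) -> str:
    # Placeholder for the rigorous, component-by-component checks; this
    # stand-in only percent-encodes square brackets.
    return url.replace("[", "%5B").replace("]", "%5D")


def sanitize(url: str) -> tuple[str, bool]:
    """Return (sanitized URL, whether the fast path was taken)."""
    if FAST_PATH.fullmatch(url):
        return url, True
    return slow_sanitize(url), False


print(sanitize("https://example.com/path/to?a=1&b=2"))
print(sanitize("http://example.com/?a[b]=1"))
```

The fast path never rewrites anything, so it is safe only if its character set is a strict subset of what the slow path would leave untouched.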

comment:29 ocean90, 8 weeks ago

#25567 was marked as a duplicate.
