WordPress.org

Make WordPress Core

Opened 7 years ago

Closed 4 years ago

Last modified 4 years ago

#23605 closed defect (bug) (fixed)

esc_url() strips spaces instead of encoding them

Reported by: johnbillion Owned by: johnbillion
Milestone: 4.4 Priority: normal
Severity: normal Version: 2.8
Component: Formatting Keywords: has-patch
Focuses: Cc:
PR Number:

Description

If I pass a URL into esc_url() that contains a space, the space is stripped instead of encoded.

To reproduce:

$url = 'http://example.com/foo bar/';

echo '<pre>';
var_dump( $url );
var_dump( esc_url( $url ) );
echo '</pre>';

The resulting URL ends up as http://example.com/foobar/ instead of the expected http://example.com/foo%20bar/

Attachments (2)

23605.diff (1.8 KB) - added by enshrined 4 years ago.
Escape spaces in esc_url rather than remove them
23605_geturlincontent_tests.diff (536 bytes) - added by gitlost 4 years ago.
Unit test needs updating to reflect kept space.


Change History (30)

#2 @bananastalktome
7 years ago

  • Cc bananastalktome@… added

The space-stripping behavior is actually reflected in the unit tests: see the test at source:trunk/tests/formatting/EscUrl.php@1219#L8, which seems to have been originally added in [UT331]. That seems unusual, and I wonder if the test should be changed to reflect the desired behavior (encoding spaces) instead?

#3 @SergeyBiryukov
7 years ago

test_spaces() was originally added in [226/tests], modified in [229/tests] and [273/tests].

#4 @jscampbell.05
7 years ago

I would quite like this fixed, as it is driving me mad. For the moment I have resorted to using str_replace() on the string in a custom function I wrote.

My function goes like this:

function jc_encode_spaces($string){
    return str_replace(' ', '%20', $string);
}

Not ideal, but it does the job.

#5 @rmccue
7 years ago

Space is an invalid character in URLs, so it should be escaped just like any other invalid character. Stripping them is absolutely the wrong thing to do.

#6 @jscampbell.05
7 years ago

Agreed. Perhaps it should be a parameter? You could set how it encodes the URL, i.e. strip spaces or encode spaces.

PHP built in functions use a bit mask to do this.


#7 @helen
7 years ago

Aren't we mixing up escaping for display and actual encoding here? PHPDoc for esc_url() does indicate that it removes characters, not encodes them. Seems like there are any number of characters that are stripped rather than encoded, not just spaces, for what it's worth.

#8 @jscampbell.05
7 years ago

Is there actually a way to encode spaces, i.e. so that " " becomes %20? I must say I expect most of the URI-based functions to do this, not to simply strip the spaces so that the URL no longer points to the correct resource.

#9 @DrandLomB
6 years ago

I'm admittedly not a seasoned developer, but I too want to register my dismay with the esc_url() function. If WordPress wants us to use this type of function on a religious basis as suggested by the documentation, then it should help, not hinder.

Regardless of how accurately the Codex may document that it "escapes" versus "encodes" or whatever other technical nuances one may wish to apply, the fact is that what we want is a function that reliably returns what we give it, ready to output safely. Discarding perfectly valid spaces is hardly accomplishing that!

As is, I'll have to wrap this in my own function to first encode characters before I can give it to esc_url(). Unless I'm really missing the point, it seems very unhelpful as is.

#10 @ScottSmith
6 years ago

This has been driving me mad as well. I'd love to see a fix.

#11 follow-up: @nacin
6 years ago

  • Milestone changed from Awaiting Review to Future Release

If we're going to encode spaces, should we be encoding other characters too? I'm not necessarily against this fix — if a space is passed to esc_url(), we know exactly what it should be and it shouldn't be a big deal to encode it — but I'm wondering where that stops.

#12 in reply to: ↑ 11 ; follow-up: @betzster
6 years ago

Replying to nacin:

If we're going to encode spaces, should we be encoding other characters too? I'm not necessarily against this fix — if a space is passed to esc_url(), we know exactly what it should be and it shouldn't be a big deal to encode it — but I'm wondering where that stops.

Can't we just lean on rawurlencode() to encode everything properly?

#13 @bfintal
4 years ago

Since esc_url() is also used when printing out links and href attributes, my opinion is that it should not make the input URL wrong, so I would prefer that spaces were encoded, perhaps to %20 or +.

#14 @johnbillion
4 years ago

  • Keywords needs-patch needs-unit-tests added
  • Milestone changed from Future Release to 4.4
  • Version set to 2.8

#15 in reply to: ↑ 12 @miqrogroove
4 years ago

Replying to betzster:

Can't we just lean on rawurlencode() to encode everything properly?

Nope. The purpose of esc_url appears to be removal of non-printing special characters. If we make it an alias of rawurlencode, it will also encode colons and slashes, making most URLs invalid.
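
To illustrate the point about colons and slashes, here is a minimal sketch that uses only PHP's built-in rawurlencode(). The URL is the one from the ticket description; the variable names are illustrative.

```php
<?php
// rawurlencode() percent-encodes every character except letters, digits and "-_.~",
// so the colon and slashes are encoded along with the space.
$url = 'http://example.com/foo bar/';

$encoded = rawurlencode( $url );

echo $encoded, "\n"; // http%3A%2F%2Fexample.com%2Ffoo%20bar%2F
```

The result is no longer usable as a link target, which is why encoding the whole URL this way is not an option here.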

@enshrined
4 years ago

Escape spaces in esc_url rather than remove them

#16 follow-up: @enshrined
4 years ago

The above patch adds a str_replace call as the first thing in esc_url. This makes sure that spaces are not stripped but rather replaced by %20. It also updates the tests to reflect these changes.

It may be worth deciding what happens for multiple spaces, should these stay in the form of %20%20 or be stripped down to a single %20?
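
The approach described in this comment can be sketched roughly as follows. This is not the code from 23605.diff or from esc_url() itself: the function name sketch_escape_url() and the simplified allow-list of characters are assumptions made purely for illustration.

```php
<?php
/**
 * Hypothetical stand-in for the approach described above, NOT the real esc_url().
 * The character allow-list is a simplified assumption.
 */
function sketch_escape_url( $url ) {
    // Encode spaces first, so the stripping step below no longer removes them.
    $url = str_replace( ' ', '%20', $url );

    // Strip any character outside the (assumed, simplified) allow-list.
    return preg_replace( '|[^a-z0-9\-~+_.?#=!&;,/:%@$*()]|i', '', $url );
}

echo sketch_escape_url( 'http://example.com/foo bar/' ), "\n";  // http://example.com/foo%20bar/
echo sketch_escape_url( 'http://example.com/foo  bar/' ), "\n"; // http://example.com/foo%20%20bar/
```

In this sketch each space is encoded individually, so consecutive spaces come out as %20%20 rather than being collapsed.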

#17 @johnbillion
4 years ago

In 33851:

Add some more passing unit tests for esc_url() in preparation for upcoming changes.

See #23605, #20771, #16859

#18 @johnbillion
4 years ago

In 33855:

More unit tests for esc_url() and esc_url_raw().

See #23605, #20771, #16859

#19 @johnbillion
4 years ago

  • Owner set to johnbillion
  • Resolution set to fixed
  • Status changed from new to closed

In 33858:

Correctly encode spaces in URLs passed to esc_url() instead of removing them.

Fixes #23605
Props enshrined, johnbillion

#20 in reply to: ↑ 16 @johnbillion
4 years ago

Replying to enshrined:

It may be worth deciding what happens for multiple spaces, should these stay in the form of %20%20 or be stripped down to a single %20?

Thanks for the patch! Multiple consecutive spaces are fine; it's just the encoding of them (or rather, non-removal of them) that esc_url() is concerned with.

#21 @johnbillion
4 years ago

  • Keywords needs-patch needs-unit-tests removed

@gitlost
4 years ago

Unit test needs updating to reflect kept space.

#22 @gitlost
4 years ago

Noticed this when testing something else.

#23 @enshrined
4 years ago

Ahh yes, sorry. Completely missed this one. I don't know why it didn't fail for me.

#24 @miqrogroove
4 years ago

  • Keywords has-patch added
  • Resolution fixed deleted
  • Status changed from closed to reopened

This ticket was mentioned in Slack in #core by miqrogroove.


4 years ago

#26 @SergeyBiryukov
4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 33905:

Update Tests_Formatting_GetUrlInContent::get_input_output() after [33858].

Props gitlost.
Fixes #23605.

#27 @SergeyBiryukov
4 years ago

In 33906:

Update Tests_Admin_includesPlugin::test_menu_page_url() after [33858].

See #23605.

#28 @SergeyBiryukov
4 years ago

In 33907:

Update Tests_Sanitize_Option::test_sanitize_option() after [33858].

See #23605.
