WordPress.org

Make WordPress Core

Opened 14 months ago

Last modified 3 months ago

#23605 new defect (bug)

esc_url() strips spaces instead of encoding them

Reported by: johnbillion Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Formatting Keywords:
Focuses: Cc:

Description

If I pass a URL into esc_url() that contains a space, the space is stripped instead of encoded.

To reproduce:

$url = 'http://example.com/foo bar/';

echo '<pre>';
var_dump( $url );
var_dump( esc_url( $url ) );
echo '</pre>';

The resulting URL ends up as http://example.com/foobar/ instead of the expected http://example.com/foo%20bar/

Change History (12)

comment:2 bananastalktome14 months ago

  • Cc bananastalktome@… added

The stripping spaces behavior is actually reflected in the unit tests, according to the test at source:trunk/tests/formatting/EscUrl.php@1219#L8, it seems originally added in [UT331]. Seems unusual, and I wonder if the test should be changed to reflect the desired behavior instead (encoding spaces)?

comment:3 SergeyBiryukov14 months ago

test_spaces() was originally added in [226/tests], modified in [229/tests] and [273/tests].

comment:4 jscampbell.0514 months ago

I would quite like this fixed as it is driving me mad, I have resorted for the moment using a str_replace on the string in a custom function I wrote.

My function goes like this:

function jc_encode_spaces($string){
    return str_replace(' ', '%20', $string);
}

Not Idea but it does the job

comment:5 rmccue14 months ago

Space is an invalid character in URLs, so it should be escaped just like any other invalid character. Stripping them is absolutely the wrong thing to do.

comment:6 jscampbell.0514 months ago

Agreed, perhaps it should be a parmeter ? You can set how it encodes the URL i.e Strip Space / Encode Spaces.

PHP built in functions use a bit mask to do this.

Last edited 14 months ago by jscampbell.05 (previous) (diff)

comment:7 helen14 months ago

Aren't we mixing up escaping for display and actual encoding here? PHPDoc for esc_url() does indicate that it removes characters, not encodes them. Seems like there are any number of characters that are stripped rather than encoded, not just spaces, for what it's worth.

comment:8 jscampbell.0514 months ago

Is there actually a away to encode spaces ? i.e " " becomes %20. I must say I expect most of the URI based functions to do this and not to simply strip away the spaces so they don't point to the correct resource.

comment:9 DrandLomB8 months ago

I'm admittedly not a seasoned developer, but I too want to register my dismay with the esc_url() function. If WordPress wants us to use this type of function on a religious basis as suggested by the documentation, then it should help, not hinder.

Regardless how accurately the Codex may document that it "escapes" versus "encodes" or whatever other technical nuances one may wish to apply, the fact is that what we want is a function that reliably returns what we give it ready to output safely. Discarding perfectly valid spaces is hardly accomplishing that!

As is, I'll have to wrap this in my own function to first encode characters before I can give it to esc_url(). Unless I'm really missing the point, it seems very unhelpful as is.

comment:10 ScottSmith4 months ago

This has been driving me mad as well. I'd love to see a fix.

comment:11 follow-up: nacin3 months ago

  • Milestone changed from Awaiting Review to Future Release

If we're going to encode spaces, should we be encoding other characters too? I'm not necessarily against this fix — if a space is passed to esc_url(), we know exactly what it should be and it shouldn't be a big deal to encode it — but I'm wondering where that stops.

comment:12 in reply to: ↑ 11 betzster3 months ago

Replying to nacin:

If we're going to encode spaces, should we be encoding other characters too? I'm not necessarily against this fix — if a space is passed to esc_url(), we know exactly what it should be and it shouldn't be a big deal to encode it — but I'm wondering where that stops.

Can't we just lean on rawurlencode() to encode everything properly?

Note: See TracTickets for help on using tickets.