Make WordPress Core

Opened 4 years ago

Last modified 2 years ago

#50855 accepted defect (bug)

sanitize_file_name function not working as expected if there is '%20' in filename

Reported by: dishitpala's profile dishitpala Owned by: audrasjb's profile audrasjb
Milestone: Future Release Priority: normal
Severity: normal Version: 5.4.2
Component: Formatting Keywords: has-patch has-screenshots has-unit-tests dev-feedback
Focuses: Cc:

Description

We have added '%' as a special character in $special_char variable
We are also replacing '%20' with '-'

Here the sequence of str_replace was not appropriate

We need to replace '%20' with '-' before all the special character are replaced

Current behavior:

  • Filename Before: this%20is%20example.png
  • Filename after sanitization: this20is20example.png

Expected behavior:

  • Filename Before: this%20is%20example.png
  • Filename after sanitization: this-is-example.png

File reference: https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php

Function name: sanitize_file_name

Attachments (6)

sanitize-file-name.patch (841 bytes) - added by dishitpala 4 years ago.
Capture d’écran 2021-04-26 à 21.17.56.png (296.2 KB) - added by audrasjb 3 years ago.
Before 50855.diff
Capture d’écran 2021-04-26 à 21.18.08.png (323.0 KB) - added by audrasjb 3 years ago.
After 50855.diff
50855.diff (646 bytes) - added by audrasjb 3 years ago.
Formatting: Reprioritization of the characters to remove from filenames in sanitize_file_name()
50855.2.diff (1.6 KB) - added by boblinthorst 3 years ago.
50855-before-and-after.png (246.1 KB) - added by hellofromTonya 3 years ago.
Test: before and after applying 50855.2.diff patch. Works as expected ✅

Download all attachments as: .zip

Change History (23)

#1 @audrasjb
3 years ago

  • Keywords needs-refresh added

It would be better to keep this line after the sanitize_file_name_chars hook.

@audrasjb
3 years ago

Formatting: Reprioritization of the characters to remove from filenames in sanitize_file_name()

#2 @audrasjb
3 years ago

  • Keywords has-screenshots added; needs-refresh removed
  • Milestone changed from Awaiting Review to 5.8
  • Owner set to audrasjb
  • Status changed from new to accepted

Moving for 5.8 consideration.
@SergeyBiryukov since you committed the last few changes we introduced in WP 5.5, any thoughts on this change?

#3 @audrasjb
3 years ago

Just re-tested my patch and it still applies cleanly against trunk. Marking this for commit.

#4 @audrasjb
3 years ago

  • Keywords commit added

#5 @desrosj
3 years ago

  • Keywords needs-unit-tests added; commit removed

I'd love to see a unit test or two added here to demonstrate the incorrect behavior before, and confirm the correct behavior after, and to prevent this from happening again going forward.

#6 @desrosj
3 years ago

  • Milestone changed from 5.8 to 5.9

I'm going to punt this one as 5.8 beta 1 is being packaged within the hour. I think with proper unit tests, this could potentially be considered for a 5.8.x minor.

@boblinthorst
3 years ago

#7 @boblinthorst
3 years ago

Fixed the existing unit tests. Worked on this with @rolfsiebers and @sresok.

@hellofromTonya
3 years ago

Test: before and after applying 50855.2.diff patch. Works as expected ✅

#8 @hellofromTonya
3 years ago

  • Keywords has-unit-tests added; needs-unit-tests removed

While it worked in testing that specific use case, the tests for wp_unique_filename() failed. Investigating.

#10 @hellofromTonya
3 years ago

  • Keywords dev-feedback added

[29290] #16630 added the logic to search for both %20 and + and then replace them with +.

But then [35122] #16226 added the % and + as special characters which replaces each with an empty string (strips them out).

Notice the timing of these changes: [29290] happened first and then [35122].

sanitize-file-name.patch 50855.diff 50855.2.diff move both the %20 and + to - replacers above the special characters replacers. That could be a breaking change. Why?

  • The + character: since [35122] (in 10/13/2015), the + character has been removed (i.e. replaced with an empty string). Moving it above the special characters means it no longer is stripped out but instead is replaced with a -.
  • The % character: since [29290] (in 07/24/2014), the % character was stripped leaving the 20. Notice Pippin identified that behavior here.

Given the history, I'm seeking more input on:

  • Should the + character be stripped (as special character) or replaced with a -? If special character, then is the + in this code needed?
    $filename = str_replace( array( '%20', '+' ), '-', $filename );
    
  • Is leaving 20 (stripping the %) by design? If no, do you see any issues replacing %20 with -?

#12 @hellofromTonya
3 years ago

  • Milestone changed from 5.9 to 6.0

With 5.9 Beta 1 tomorrow and this ticket needing more discussion and consensus, moving it to 6.0.

This ticket was mentioned in Slack in #core by costdev. View the logs.


2 years ago

#14 @craigfrancis
2 years ago

I'm wondering if 50855.2.diff is the correct approach, by simply moving the current array( '%20', '+' ) up 1 line, so both ways of URL encoding a space are handled before $special_chars.

As to the failing wp_unique_filename() test, that is:

wp_unique_filename( $testdir, '12%af34567890#~!@#$..%^&*()|_+qwerty  fgh`jkl zx<>?:"{}[]="\'/?.png' )

-'12af34567890@..^_qwerty-fghjkl-zx.png'
+'12af34567890@..^_-qwerty-fghjkl-zx.png'

The extra '-' is coming from the '+' before qwerty, so maybe just update the expected test result?

Tonya, I agree this could be a breaking change (e.g. while this usually has other issues as well, a developer could save files using sanitize_file_name() and store the original file names in their database)... but there has kinda been a breaking change already, where the feature that replaces '%20' and '+' so they became '-' was broken (I presume WordPress wanted to visually separate words in a file name, even when the name was URL encoded).

An alternative would be to simply remove that line, and don't handle the URL encoding of spaces.

---

Tangent stuff...

Tonya, thanks for finding Changeset 35122, the addition of '%' and '+' in $special_chars via Changeset 35122 did break this feature (oddly test_replace_spaces() explicitly tests 'multi %20 +space.png' => 'multi-20-space.png').

For the patch from Tonya, the '%20' replacement happens before $special_chars on line 2020; but keeps the '+' replacement on line 2022, I where don't think that will do anything (Peter has come to the same conclusion).

I'm going to trust the $special_chars is complete (I'm not sure if there are other problematic characters)... but either way, they can be replaced by a hyphen or be removed. Both replacement methods are safe, and I'm pretty sure it's just personal taste on which one to use (e.g. "MyFile(1).jpg", should that be "MyFile1.jpg" or "MyFile-1.jpg"?). For now I'd prefer to stay with '%20' and '+', with a separate discussion on what other characters could be replaced by a hyphen.

To answer your questions, I think the '+' should be replaced with a '-', because that seems to be the intended and original behaviour (why the array( '%20', '+' ) was added in the first place). Secondly, I don't think leaving '20' was by design (a half encoded value), and I cannot think of any issue with replacing '%20' with a '-' in a function that intentionally removes bad characters from a file name.

With PHP, the urlencode() function follows RFC 1738 and its use of URI defined in RFC 1630 (ref page 7, which says "Within the query string, the plus sign is reserved as shorthand notation for a space."), this is why a space is often encoded a space to '+' (source), whereas rawurlencode() follows RFC 3986 and encodes a space to '%20' (i.e. it drops the special case). In both cases we are assuming the file name has been URL encoded.

#15 @costdev
2 years ago

  • Milestone changed from 6.0 to 6.1

With 6.0 RC1 tomorrow, I'm moving this ticket to the 6.1 milestone as discussion is still ongoing.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

#17 @audrasjb
2 years ago

  • Milestone changed from 6.1 to Future Release

As per today's bug scrub:

Unfortunately, this ticket wasn’t updated since it was moved from milestone 6.0 to 6.1. Since WP 6.1 RC1 is coming next Tuesday, let’s move it to Future Release.

Note: See TracTickets for help on using tickets.