Make WordPress Core

Opened 4 years ago

Last modified 2 years ago

#50855 accepted defect (bug)

sanitize_file_name function not working as expected if there is '%20' in filename

Reported by: dishitpala's profile dishitpala Owned by: audrasjb's profile audrasjb
Milestone: Future Release Priority: normal
Severity: normal Version: 5.4.2
Component: Formatting Keywords: has-patch has-screenshots has-unit-tests dev-feedback
Focuses: Cc:

Description

We have added '%' as a special character in $special_char variable
We are also replacing '%20' with '-'

Here the sequence of str_replace was not appropriate

We need to replace '%20' with '-' before all the special character are replaced

Current behavior:

  • Filename Before: this%20is%20example.png
  • Filename after sanitization: this20is20example.png

Expected behavior:

  • Filename Before: this%20is%20example.png
  • Filename after sanitization: this-is-example.png

File reference: ​https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php

Function name: sanitize_file_name

Attachments (6)

sanitize-file-name.patch​ (841 bytes) - added by dishitpala 4 years ago.
Capture d’écran 2021-04-26 Γ  21.17.56.png​ (296.2 KB) - added by audrasjb 4 years ago.
Before 50855.diff
Capture d’écran 2021-04-26 Γ  21.18.08.png​ (323.0 KB) - added by audrasjb 4 years ago.
After 50855.diff
50855.diff​ (646 bytes) - added by audrasjb 4 years ago.
Formatting: Reprioritization of the characters to remove from filenames in sanitize_file_name()
50855.2.diff​ (1.6 KB) - added by boblinthorst 3 years ago.
50855-before-and-after.png​ (246.1 KB) - added by hellofromTonya 3 years ago.
Test: before and after applying 50855.2.diff patch. Works as expected βœ…

Download all attachments as: .zip

Change History (23)

#1 @audrasjb
4 years ago

  • Keywords needs-refresh added

It would be better to keep this line after the sanitize_file_name_chars hook.

@audrasjb
4 years ago

Formatting: Reprioritization of the characters to remove from filenames in sanitize_file_name()

#2 @audrasjb
4 years ago

  • Keywords has-screenshots added; needs-refresh removed
  • Milestone changed from Awaiting Review to 5.8
  • Owner set to audrasjb
  • Status changed from new to accepted

Moving for 5.8 consideration.
@SergeyBiryukov since you committed the last few changes we introduced in WP 5.5, any thoughts on this change?

#3 @audrasjb
4 years ago

Just re-tested my patch and it still applies cleanly against trunk. Marking this for commit.

#4 @audrasjb
4 years ago

  • Keywords commit added

#5 @desrosj
4 years ago

  • Keywords needs-unit-tests added; commit removed

I'd love to see a unit test or two added here to demonstrate the incorrect behavior before, and confirm the correct behavior after, and to prevent this from happening again going forward.

#6 @desrosj
4 years ago

  • Milestone changed from 5.8 to 5.9

I'm going to punt this one as 5.8 beta 1 is being packaged within the hour. I think with proper unit tests, this could potentially be considered for a 5.8.x minor.

#7 @boblinthorst
3 years ago

Fixed the existing unit tests. Worked on this with @rolfsiebers and @sresok.

@hellofromTonya
3 years ago

Test: before and after applying 50855.2.diff patch. Works as expected βœ…

#8 @hellofromTonya
3 years ago

  • Keywords has-unit-tests added; needs-unit-tests removed

While it worked in testing that specific use case, the tests for wp_unique_filename() failed. Investigating.

#10 @hellofromTonya
3 years ago

  • Keywords dev-feedback added

[29290] #16630 added the logic to search for both %20 and + and then replace them with +.

But then [35122] #16226 added the % and + as special characters which replaces each with an empty string (strips them out).

Notice the timing of these changes: [29290] happened first and then [35122].

sanitize-file-name.patch​ 50855.diff​ 50855.2.diff​ move both the %20 and + to - replacers above the special characters replacers. That could be a breaking change. Why?

  • The + character: since [35122] (in 10/13/2015), the + character has been removed (i.e. replaced with an empty string). Moving it above the special characters means it no longer is stripped out but instead is replaced with a -.
  • The % character: since [29290] (in 07/24/2014), the % character was stripped leaving the 20. Notice Pippin identified that behavior here.

Given the history, I'm seeking more input on:

  • Should the + character be stripped (as special character) or replaced with a -? If special character, then is the + in this code needed?
    $filename = str_replace( array( '%20', '+' ), '-', $filename );
    
  • Is leaving 20 (stripping the %) by design? If no, do you see any issues replacing %20 with -?

#12 @hellofromTonya
3 years ago

  • Milestone changed from 5.9 to 6.0

With 5.9 Beta 1 tomorrow and this ticket needing more discussion and consensus, moving it to 6.0.

This ticket was mentioned in ​Slack in #core by costdev. ​View the logs.


3 years ago

#14 @craigfrancis
3 years ago

I'm wondering if 50855.2.diff is the correct approach, by simply moving the current array( '%20', '+' ) up 1 line, so both ways of URL encoding a space are handled before $special_chars.

As to the failing wp_unique_filename() test, that is:

wp_unique_filename( $testdir, '12%af34567890#~!@#$..%^&*()|_+qwerty  fgh`jkl zx<>?:"{}[]="\'/?.png' )

-'12af34567890@..^_qwerty-fghjkl-zx.png'
+'12af34567890@..^_-qwerty-fghjkl-zx.png'

The extra '-' is coming from the '+' before qwerty, so maybe just update the expected test result?

Tonya, I agree this could be a breaking change (e.g. while this usually has other issues as well, a developer could save files using sanitize_file_name() and store the original file names in their database)... but there has kinda been a breaking change already, where the feature that replaces '%20' and '+' so they became '-' was broken (I presume WordPress wanted to visually separate words in a file name, even when the name was URL encoded).

An alternative would be to simply remove that line, and don't handle the URL encoding of spaces.

---

Tangent stuff...

Tonya, thanks for finding Changeset 35122, the addition of '%' and '+' in $special_chars via Changeset 35122 did break this feature (oddly test_replace_spaces() explicitly tests 'multi %20 +space.png' => 'multi-20-space.png').

For the ​patch from Tonya, the '%20' replacement happens before $special_chars on ​line 2020; but keeps the '+' replacement on ​line 2022, I where don't think that will do anything (Peter has come to the same conclusion).

I'm going to trust the $special_chars is complete (I'm not sure if there are other problematic characters)... but either way, they can be replaced by a hyphen or be removed. Both replacement methods are safe, and I'm pretty sure it's just personal taste on which one to use (e.g. "MyFile(1).jpg", should that be "MyFile1.jpg" or "MyFile-1.jpg"?). For now I'd prefer to stay with '%20' and '+', with a separate discussion on what other characters could be replaced by a hyphen.

To answer your questions, I think the '+' should be replaced with a '-', because that seems to be the intended and original behaviour (why the array( '%20', '+' ) was added in the first place). Secondly, I don't think leaving '20' was by design (a half encoded value), and I cannot think of any issue with replacing '%20' with a '-' in a function that intentionally removes bad characters from a file name.

With PHP, the urlencode() function follows RFC 1738 and its use of URI defined in RFC 1630 (ref ​page 7, which says "Within the query string, the plus sign is reserved as shorthand notation for a space."), this is why a space is often encoded a space to '+' (​source), whereas rawurlencode() follows RFC 3986 and encodes a space to '%20' (i.e. it drops the special case). In both cases we are assuming the file name has been URL encoded.

#15 @costdev
3 years ago

  • Milestone changed from 6.0 to 6.1

With 6.0 RC1 tomorrow, I'm moving this ticket to the 6.1 milestone as discussion is still ongoing.

This ticket was mentioned in ​Slack in #core by audrasjb. ​View the logs.


2 years ago

#17 @audrasjb
2 years ago

  • Milestone changed from 6.1 to Future Release

As per today's bug scrub:

Unfortunately, this ticket wasn’t updated since it was moved from milestone 6.0 to 6.1. Since WP 6.1 RC1 is coming next Tuesday, let’s move it to Future Release.

Note: See TracTickets for help on using tickets.