Opened 4 years ago
Last modified 2 years ago
#50855 accepted defect (bug)
sanitize_file_name function not working as expected if there is '%20' in filename
Reported by: | dishitpala | Owned by: | audrasjb |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | 5.4.2 |
Component: | Formatting | Keywords: | has-patch has-screenshots has-unit-tests dev-feedback |
Focuses: | Cc: |
Description
We have added '%' as a special character in $special_char
variable
We are also replacing '%20' with '-'
Here the sequence of str_replace was not appropriate
We need to replace '%20' with '-' before all the special character are replaced
Current behavior:
- Filename Before:
this%20is%20example.png
- Filename after sanitization:
this20is20example.png
Expected behavior:
- Filename Before:
this%20is%20example.png
- Filename after sanitization:
this-is-example.png
File reference: https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php
Function name: sanitize_file_name
Attachments (6)
Change History (23)
#1
@
3 years ago
- Keywords needs-refresh added
It would be better to keep this line after the sanitize_file_name_chars
hook.
@
3 years ago
Formatting: Reprioritization of the characters to remove from filenames in sanitize_file_name()
#2
@
3 years ago
- Keywords has-screenshots added; needs-refresh removed
- Milestone changed from Awaiting Review to 5.8
- Owner set to audrasjb
- Status changed from new to accepted
Moving for 5.8 consideration.
@SergeyBiryukov since you committed the last few changes we introduced in WP 5.5, any thoughts on this change?
#3
@
3 years ago
Just re-tested my patch and it still applies cleanly against trunk. Marking this for commit
.
#5
@
3 years ago
- Keywords needs-unit-tests added; commit removed
I'd love to see a unit test or two added here to demonstrate the incorrect behavior before, and confirm the correct behavior after, and to prevent this from happening again going forward.
#6
@
3 years ago
- Milestone changed from 5.8 to 5.9
I'm going to punt this one as 5.8 beta 1 is being packaged within the hour. I think with proper unit tests, this could potentially be considered for a 5.8.x minor.
#8
@
3 years ago
- Keywords has-unit-tests added; needs-unit-tests removed
While it worked in testing that specific use case, the tests for wp_unique_filename()
failed. Investigating.
This ticket was mentioned in PR #1873 on WordPress/wordpress-develop by hellofromtonya.
3 years ago
#9
Trac ticket: https://core.trac.wordpress.org/ticket/50855
#10
@
3 years ago
- Keywords dev-feedback added
[29290] #16630 added the logic to search for both %20
and +
and then replace them with +
.
But then [35122] #16226 added the %
and +
as special characters which replaces each with an empty string (strips them out).
Notice the timing of these changes: [29290] happened first and then [35122].
sanitize-file-name.patch 50855.diff 50855.2.diff move both the %20
and +
to -
replacers above the special characters replacers. That could be a breaking change. Why?
- The
+
character: since [35122] (in 10/13/2015), the+
character has been removed (i.e. replaced with an empty string). Moving it above the special characters means it no longer is stripped out but instead is replaced with a-
. - The
%
character: since [29290] (in 07/24/2014), the%
character was stripped leaving the20
. Notice Pippin identified that behavior here.
Given the history, I'm seeking more input on:
- Should the
+
character be stripped (as special character) or replaced with a-
? If special character, then is the+
in this code needed?$filename = str_replace( array( '%20', '+' ), '-', $filename );
- Is leaving
20
(stripping the%
) by design? If no, do you see any issues replacing%20
with-
?
#12
@
3 years ago
- Milestone changed from 5.9 to 6.0
With 5.9 Beta 1 tomorrow and this ticket needing more discussion and consensus, moving it to 6.0.
This ticket was mentioned in Slack in #core by costdev. View the logs.
2 years ago
#14
@
2 years ago
I'm wondering if 50855.2.diff is the correct approach, by simply moving the current array( '%20', '+' )
up 1 line, so both ways of URL encoding a space are handled before $special_chars
.
As to the failing wp_unique_filename()
test, that is:
wp_unique_filename( $testdir, '12%af34567890#~!@#$..%^&*()|_+qwerty fgh`jkl zx<>?:"{}[]="\'/?.png' ) -'12af34567890@..^_qwerty-fghjkl-zx.png' +'12af34567890@..^_-qwerty-fghjkl-zx.png'
The extra '-'
is coming from the '+'
before qwerty
, so maybe just update the expected test result?
Tonya, I agree this could be a breaking change (e.g. while this usually has other issues as well, a developer could save files using sanitize_file_name()
and store the original file names in their database)... but there has kinda been a breaking change already, where the feature that replaces '%20'
and '+'
so they became '-'
was broken (I presume WordPress wanted to visually separate words in a file name, even when the name was URL encoded).
An alternative would be to simply remove that line, and don't handle the URL encoding of spaces.
---
Tangent stuff...
Tonya, thanks for finding Changeset 35122, the addition of '%'
and '+'
in $special_chars
via Changeset 35122 did break this feature (oddly test_replace_spaces()
explicitly tests 'multi %20 +space.png' => 'multi-20-space.png'
).
For the patch from Tonya, the '%20'
replacement happens before $special_chars
on line 2020; but keeps the '+'
replacement on line 2022, I where don't think that will do anything (Peter has come to the same conclusion).
I'm going to trust the $special_chars
is complete (I'm not sure if there are other problematic characters)... but either way, they can be replaced by a hyphen or be removed. Both replacement methods are safe, and I'm pretty sure it's just personal taste on which one to use (e.g. "MyFile(1).jpg", should that be "MyFile1.jpg" or "MyFile-1.jpg"?). For now I'd prefer to stay with '%20'
and '+'
, with a separate discussion on what other characters could be replaced by a hyphen.
To answer your questions, I think the '+'
should be replaced with a '-'
, because that seems to be the intended and original behaviour (why the array( '%20', '+' )
was added in the first place). Secondly, I don't think leaving '20'
was by design (a half encoded value), and I cannot think of any issue with replacing '%20'
with a '-'
in a function that intentionally removes bad characters from a file name.
With PHP, the urlencode()
function follows RFC 1738 and its use of URI defined in RFC 1630 (ref page 7, which says "Within the query string, the plus sign is reserved as shorthand notation for a space."), this is why a space is often encoded a space to '+'
(source), whereas rawurlencode()
follows RFC 3986 and encodes a space to '%20'
(i.e. it drops the special case). In both cases we are assuming the file name has been URL encoded.
#15
@
2 years ago
- Milestone changed from 6.0 to 6.1
With 6.0 RC1 tomorrow, I'm moving this ticket to the 6.1 milestone as discussion is still ongoing.
Before 50855.diff