#55807 closed defect (bug) (duplicate)
sanitize_file_name not sanitizing decomposed unicode when file is uploaded using Chrome and Firefox
| Reported by: | christerolsson | Owned by: | |
|---|---|---|---|
| Milestone: | | Priority: | normal |
| Severity: | normal | Version: | 5.9.3 |
| Component: | Charset | Keywords: | |
| Focuses: | | Cc: | |
Description
sanitize_file_name does not sanitize decomposed Unicode characters in a file name when the file is uploaded using Chrome or Firefox. It works fine when the file is uploaded using Safari.
The problem probably only appears for macOS users since macOS seems to be the only OS storing filenames in decomposed unicode.
Uploading räksmörgås.jpg with Safari correctly creates the file raksmorgas.jpg while uploading with Chrome or Firefox creates the file räksmörgås.jpg.
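To illustrate what "decomposed" means here, the following is a minimal Python sketch (not WordPress code) that builds both Unicode forms of the example file name with the standard `unicodedata` module. The variable names are only illustrative.

```python
import unicodedata

# The example file name, written with escapes so the source form is unambiguous.
name = "r\u00e4ksm\u00f6rg\u00e5s.jpg"

# NFC: each accented letter is a single precomposed code point.
composed = unicodedata.normalize("NFC", name)
# NFD: each accented letter is a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", name)

# Both strings render identically, but they differ in length and content.
print(len(composed), len(decomposed))
print(composed == decomposed)
# Normalizing the decomposed form back to NFC recovers the composed form.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

The two strings look the same on screen, which is why the difference only shows up once code compares or translates individual characters.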
Change History (5)
This ticket was mentioned in Slack in #core-test by ironprogrammer. View the logs.
2 years ago
#2
@
2 years ago
#3
@
2 years ago
I could replicate the same result as mentioned by @poena
Environment:
OS: macOS 12.4
Web Server: Nginx
PHP: 8
WordPress: 6.0 and 5.9.3
Safari Browser Version 15.5 (17613.2.7.1.8) - åäö are removed: https://ibb.co/hdZFGjk
Other Browsers - åäö are not removed: https://ibb.co/QkGy4FG
#4
@
2 years ago
- Component changed from Upload to Charset
- Resolution set to duplicate
- Status changed from new to closed
Thanks for your report, @christerolsson!
After some digging into the underlying remove_accents() function, this appears to be a duplicate of #24661.
Rationale
In the case of this report, the filename provided included the following 3-byte sequences, each a 1-byte base letter followed by a 2-byte combining mark ("combining character" sequences):
äöå # or (hex) \x61\xcc\x88\x6f\xcc\x88\x61\xcc\x8a
Safari (v15.5) normalizes the uploaded filename to precomposed 2-byte characters. Because remove_accents() contains an array of 2-byte characters for translation, the substitutions then work as intended.
äöå # or \xc3\xa4\xc3\xb6\xc3\xa5 => aoa
However, Chrome and Firefox do not normalize the filename, so the string passed to the function retains the original 3-byte sequences, which the function does not currently match.
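The mismatch described above can be reproduced outside WordPress. The sketch below is written in Python rather than PHP and uses a hypothetical three-entry table, `COMPOSED_MAP`, as a stand-in for a translation array keyed on precomposed characters. It is not the actual remove_accents() implementation. Normalizing to NFC before the lookup is shown as one possible way to make both inputs match, not necessarily the approach taken in #24661.

```python
import unicodedata

# Hypothetical, heavily reduced stand-in for a translation table that only
# knows precomposed (2-byte in UTF-8) characters.
COMPOSED_MAP = {"\u00e4": "a", "\u00f6": "o", "\u00e5": "a"}


def strip_accents_composed_only(text: str) -> str:
    """Replace characters found in COMPOSED_MAP; leave everything else as is."""
    return "".join(COMPOSED_MAP.get(ch, ch) for ch in text)


def strip_accents_normalized(text: str) -> str:
    """Normalize to NFC first so decomposed input matches the table too."""
    return strip_accents_composed_only(unicodedata.normalize("NFC", text))


# Decomposed input: base letter followed by a combining mark.
decomposed = "a\u0308o\u0308a\u030a"
# Composed form of the same visible characters.
composed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8"))  # the 3-byte sequences from the report
print(composed.encode("utf-8"))    # the 2-byte precomposed characters

print(strip_accents_composed_only(composed))                 # table matches
print(strip_accents_composed_only(decomposed) == decomposed)  # nothing matched
print(strip_accents_normalized(decomposed))                  # matches after NFC
```

In the decomposed case the lookup sees a plain `a` or `o` followed by a separate combining mark, so no table key ever matches and the string passes through unchanged.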
Please follow #24661 for continuation of this effort.
Reproduction Report
This report validates that the issue can be reproduced.
Environment
OS: macOS 12.4
Web Server: Nginx
PHP: 7.4.1
WordPress: 6.0, 5.9.3
Browsers:
Safari: Version 15.5 (17613.2.7.1.8) - åäö are removed
Chrome: Version 102.0.5005.61 (Official Build) (arm64) - åäö are not removed
Firefox: Version 101. - åäö are not removed