Make WordPress Core

Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#55807 closed defect (bug) (duplicate)

sanitize_file_name not sanitizing decomposed unicode when file is uploaded using Chrome and Firefox

Reported by: christerolsson Owned by:
Milestone: Priority: normal
Severity: normal Version: 5.9.3
Component: Charset Keywords:
Focuses: Cc:

Description

sanitize_file_name does not sanitize decomposed Unicode characters in a file name when the file is uploaded using Chrome or Firefox; it works as expected when the file is uploaded using Safari.

The problem probably only affects macOS users, since macOS appears to be the only OS that stores file names in decomposed Unicode.

Uploading räksmörgås.jpg with Safari correctly creates the file raksmorgas.jpg, while uploading it with Chrome or Firefox creates the file räksmörgås.jpg (accents retained).
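To make the distinction concrete, here is a minimal Python sketch using the standard-library unicodedata module (Python is used only for illustration; WordPress itself is PHP). It assumes that the decomposed name macOS stores corresponds to Unicode NFD and the precomposed name to NFC. Both forms render identically but are different strings:

```python
import unicodedata

# Decomposed (NFD): each accented letter is a base letter plus a combining mark,
# which is how the file name reaches the server from Chrome/Firefox on macOS.
decomposed = unicodedata.normalize("NFD", "räksmörgås.jpg")

# Precomposed (NFC): one code point per accented letter.
precomposed = unicodedata.normalize("NFC", "räksmörgås.jpg")

print(decomposed == precomposed)          # False
print(len(decomposed), len(precomposed))  # 17 14

# Composing the decomposed form yields the precomposed string again.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```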

Change History (5)

This ticket was mentioned in Slack in #core-test by ironprogrammer.


2 years ago

#2 @poena
2 years ago

Reproduction Report

This report validates that the issue can be reproduced.

Environment

OS: macOS 12.4
Web Server: Nginx
PHP: 7.4.1
WordPress: 6.0, 5.9.3

Browsers:
Safari: Version 15.5 (17613.2.7.1.8), accents on åäö are removed
Chrome: Version 102.0.5005.61 (Official Build) (arm64), accents on åäö are not removed
Firefox: Version 101, accents on åäö are not removed

Version 0, edited 2 years ago by poena

#3 @ugyensupport
2 years ago

I was able to reproduce the same result reported by @poena.
Environment:
OS: macOS 12.4
Web Server: Nginx
PHP: 8
WordPress: 6.0 and 5.9.3
Safari: Version 15.5 (17613.2.7.1.8), accents on åäö are removed (image: https://ibb.co/hdZFGjk)
Other browsers: accents on åäö are not removed (image: https://ibb.co/QkGy4FG)

Last edited 2 years ago by ugyensupport

#4 @ironprogrammer
2 years ago

  • Component changed from Upload to Charset
  • Resolution set to duplicate
  • Status changed from new to closed

Thanks for your report, @christerolsson!

After some digging into the underlying remove_accents() function, this appears to be a duplicate of #24661.

Rationale

In this report, the uploaded file name contained the following characters as 3-byte "combining character sequences" (an ASCII base letter followed by a 2-byte combining mark):

äöå
# or (hex)
\x61\xcc\x88\x6f\xcc\x88\x61\xcc\x8a
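The byte sequence above can be checked with a short Python sketch: decomposing "äöå" (NFD) and encoding it as UTF-8 gives three bytes per letter, one for the base letter plus two for U+0308 COMBINING DIAERESIS (CC 88) or U+030A COMBINING RING ABOVE (CC 8A).

```python
import unicodedata

# Decompose "äöå": six code points (base letter + combining mark, three times).
nfd = unicodedata.normalize("NFD", "äöå")

# List each code point with its Unicode name.
for ch in nfd:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# UTF-8 encoding matches the hex sequence quoted in the ticket.
print(nfd.encode("utf-8") == bytes.fromhex("61cc886fcc8861cc8a"))  # True
```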

Safari (v15.5) normalizes the uploaded file name to precomposed 2-byte sequences (the NFC form of these characters). remove_accents(), whose translation array contains the 2-byte precomposed characters, then performs the substitutions as intended.

äöå
# or (hex)
\xc3\xa4\xc3\xb6\xc3\xa5

=> aoa
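As a rough model of why the Safari case works, the Python sketch below uses a hypothetical, heavily reduced stand-in for the remove_accents() translation table (TOY_ACCENT_MAP and toy_remove_accents are illustrative names, not WordPress code; the real function is PHP and its table is far larger). Its keys are the precomposed characters, so precomposed input is translated:

```python
import unicodedata

# Hypothetical toy table: precomposed ä, ö, å (2 bytes each in UTF-8) -> ASCII.
TOY_ACCENT_MAP = {"\u00e4": "a", "\u00f6": "o", "\u00e5": "a"}

def toy_remove_accents(text: str) -> str:
    # Replace each character found in the table; leave all others untouched.
    return "".join(TOY_ACCENT_MAP.get(ch, ch) for ch in text)

nfc = unicodedata.normalize("NFC", "äöå")
print(nfc.encode("utf-8") == bytes.fromhex("c3a4c3b6c3a5"))  # True
print(toy_remove_accents(nfc))                                # aoa
```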

However, Chrome and Firefox do not normalize the file name, so the string passed to the function retains the original 3-byte combining sequences, which the function does not currently match.
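The Python sketch below reuses the same hypothetical toy table to show the Chrome/Firefox case: decomposed input passes through unchanged because neither the base letters nor the combining marks are table keys. It also shows one possible workaround, composing the string to NFC before translating. This is an illustration only, not necessarily the approach pursued in #24661; in PHP, the intl extension's Normalizer::normalize() provides the equivalent composition step where that extension is available.

```python
import unicodedata

# Same hypothetical toy table as a stand-in for remove_accents().
TOY_ACCENT_MAP = {"\u00e4": "a", "\u00f6": "o", "\u00e5": "a"}

def toy_remove_accents(text: str) -> str:
    return "".join(TOY_ACCENT_MAP.get(ch, ch) for ch in text)

def toy_remove_accents_normalized(text: str) -> str:
    # Hypothetical workaround: compose combining sequences (NFC) first, so
    # decomposed input hits the same table entries as precomposed input.
    return toy_remove_accents(unicodedata.normalize("NFC", text))

nfd_name = unicodedata.normalize("NFD", "räksmörgås.jpg")
print(toy_remove_accents(nfd_name) == nfd_name)  # True: nothing was matched
print(toy_remove_accents_normalized(nfd_name))   # raksmorgas.jpg
```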

Please follow #24661 for continuation of this effort.

#5 @desrosj
2 years ago

  • Milestone Awaiting Review deleted