Make WordPress Core

Opened 4 weeks ago

Closed 3 weeks ago

Last modified 8 days ago

#63585 closed defect (bug) (duplicate)

Common macOS Unicode characters in filenames break

Reported by: matt Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Media Keywords: needs-patch
Focuses: Cc:

Description

When uploading "Screenshot 2025-06-17 at 4.43.59 AM.png", the default naming for a screenshot in MacOS, I got this error both times. When I renamed it to "spotlight-with-shortcut.png" it worked fine.

https://cldup.com/KSbUmSJdky-2000x2000.png

The URL in the media library was: https://ma.tt/files/2025/06/Screenshot-2025-06-17-at-5.20.16?PM.png

That obviously breaks because of the "?", which a URL parser treats as the start of the query string.

@zieladam debugged it:

That last space is a Unicode ‘NARROW NO-BREAK SPACE’ (U+202F) – see the 8239 towards the end of the list of codepoints:

> [..."Screenshot 2025-06-17 at 4.43.59 AM.png"].map(s=>s.codePointAt(0))

(39) [83, 99, 114, 101, 101, 110, 115, 104, 111, 116, 32, 50, 48, 50, 53, 45, 48, 54, 45, 49, 55, 32, 97, 116, 32, 52, 46, 52, 51, 46, 53, 57, 8239, 65, 77, 46, 112, 110, 103]
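To make the same check repeatable on other filenames, here is a minimal sketch that reports only the non-ASCII code points and their positions. The helper name findNonAscii is hypothetical, and the sample string is written with an explicit \u202F escape so the character is visible in source; this is an illustration of the debugging step above, not code from core.

```javascript
// Sample name with the narrow no-break space written as an explicit escape.
const filename = "Screenshot 2025-06-17 at 4.43.59\u202FAM.png";

// Hypothetical helper: list every code point above U+007F with its position.
function findNonAscii(name) {
  const hits = [];
  let index = 0;
  // for...of iterates by code point, like the spread in the snippet above.
  for (const ch of name) {
    const codePoint = ch.codePointAt(0);
    if (codePoint > 0x7f) {
      const hex = "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint, hex });
    }
    index += 1;
  }
  return hits;
}

console.log(findNonAscii(filename));
// → [ { index: 32, codePoint: 8239, hex: 'U+202F' } ]
```

A plain ASCII name such as "spotlight-with-shortcut.png" yields an empty list, which matches the observation that the renamed file uploaded fine.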

I personally would prefer we returned to something along the lines of:

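<?php
// Transliterate accented characters to ASCII.
$filename = remove_accents( $filename );
// Keep only ASCII letters, digits, dots, spaces, and hyphens (PCRE has no /g flag).
$filename = preg_replace( '/[^a-z0-9. -]/i', '', $filename );
// Collapse whitespace runs into a single hyphen.
$filename = preg_replace( '/\s+/', '-', $filename );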

I don't think there is any huge downside to being very conservative with uploaded filenames, and it also probably helps with portability if a site is migrating between different filesystems or OSes. I'm a fan of being ultra-paranoid any time we write user-supplied data to the filesystem.

Change History (8)

#1 @audrasjb
4 weeks ago

Hello and thanks for the report,

It looks like &#8239; (Narrow No-Break Space) is handled in sanitize_title_with_dashes(): https://github.com/WordPress/wordpress-develop/blob/6.8/src/wp-includes/formatting.php#L2360
Since this function is used by sanitize_file_name(), I would assume this character is handled correctly when uploading media, but that does not seem to be the case. It may be related to the conditional that uses seems_utf8() to decide whether sanitize_title_with_dashes() is applied.

#2 @TobiasBg
4 weeks ago

Probably a duplicate of #62995.

#3 @matt
4 weeks ago

This ticket is a duplicate of #62995, and it's probably a good opportunity to garden a bit and close out #39791 and #30495. I also want to reference #15955, which has a lot of prior art and seems to be one of the places where we decided to try to support Unicode in filenames. The simple regex above isn't going to cut it! I think there is some PHP version and library weirdness.

The compatibility issue is well raised by @compute: "This is really a problem in terms of sharing images on Facebook as they do not accept æøå in their filename."

#4 @matt
3 weeks ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #62995.

#5 @adamsilverstein
9 days ago

Noting that I was only able to reproduce this with the media experiments plugin or client side media processing enabled in the Gutenberg plugin. Core seems to correctly handle these images now - on a vanilla install of trunk with no plugins, I was able to upload my screenshot to both the editor and the media library without issue.

#7 @TobiasBg
9 days ago

@adamsilverstein: Trunk received a fix recently, in [60399] for #62995.

#8 @adamsilverstein
8 days ago

Ah, great! Thanks for pointing that out, @TobiasBg! I am still able to reproduce the issue with the media-experiments plugin (running trunk), so that report is still valid.
