Opened 11 months ago
Closed 7 months ago
#62995 closed defect (bug) (fixed)
Uploading Mac screenshots results in broken images, due to question marks inserted in filenames
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Milestone: | 6.8.2 | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Media | Keywords: | reporter-feedback has-patch has-unit-tests dev-reviewed fixed-major |
| Focuses: | administration | Cc: |
Description
macOS 15 Sequoia (and possibly earlier versions) uses a special character, rather than a regular ASCII space, in screenshot filenames, between the time and the AM/PM. For instance:
Screenshot 2025-02-19 at 2.17.33 PM.png
It may get lost here, but the character before PM in that filename is not a regular space. I ran the filename text string through a hex converter, and it identified it as the Unicode [the narrow no-break space](https://www.fileformat.info/info/unicode/char/202f/index.htm) (U+202F).
The problem is, when uploading a file with that type of filename (tested in both Safari and Firefox), the WordPress function that standardizes Media Library filenames converts that character into a question mark, rather than a hyphen.
For obvious reasons, there should never be a question mark in filenames in the Media Library. The result is of course that the browser interprets the part after the question mark as a query string, and now the URL is requesting a non-existent file, causing a broken image on the page.
The attached screenshot shows the filename with a question mark after uploading the image to the Media Library.
Attachments (2)
Change History (31)
#1
follow-up:
↓ 2
@
11 months ago
- Focuses ui removed
- Keywords reporter-feedback added
- Version 6.7.2 deleted
Hello and thanks for the ticket,
Could you please attach the exact file you can't upload so we can try to reproduce the issue?
#2
in reply to:
↑ 1
@
11 months ago
Replying to audrasjb:
Hello and thanks for the ticket,
Could you please attach the exact file you can't upload so we can try to reproduce the issue?
The file I uploaded is a screenshot itself with the same type of filename, so it could be used for testing. However I suspect its filename has been converted by Trac in a different way and the issue would not happen. Testing can be done easily by anyone on a Mac by just creating a screenshot and trying to upload it.
#3
@
11 months ago
@room34 I can confirm this file can't be used to reproduce the issue. Maybe compress your initial file to a zip file and try to attach it?
#4
follow-up:
↓ 5
@
11 months ago
@audrasjb I think it’s possible/likely that you need to be on a Mac file system to properly reproduce the issue, so you or whoever is trying to reproduce could just take a screenshot on that system and use it for testing.
#5
in reply to:
↑ 4
@
11 months ago
Replying to room34:
@audrasjb I think it’s possible/likely that you need to be on a Mac file system to properly reproduce the issue, so you or whoever is trying to reproduce could just take a screenshot on that system and use it for testing.
I am on a Mac system. But as my computer's locale is French, I don't have the same default file names than yours.
#6
@
11 months ago
@audrasjb Of course… I apologize for my US-centric mindset. I've zipped the file and am adding it as another attachment.
#7
@
11 months ago
Thanks! By the way I can't reproduce the issue on my side: the media is successfully uploaded. Maybe it's an environment issue on your side?
Could you please try on a fresh instance of WP? Or on this Playground instance?
#8
@
11 months ago
Interesting...
Tested in Playground: Uploads successfully.
Tested with clean WordPress install of a new site on the same server: Uploads successfully.
Tested on original site with all plugins deactivated, Twenty Twenty-Five theme, and WP core reinstalled: Error still occurs.
It looks like it has to do with the data tables. The original site where the error is happening is a very old installation. It's running 6.7.2, but it was originally created with WordPress 2.1 in 2007.
I noticed the installations where this was successful did not replace the narrow no-break space character in the filename. I checked the database, and the core WP tables have "latin1_swedish_ci" collation, so they're not UTF-8. That would explain the question marks.
Anyway… I do think this still points to an issue that should be addressed. Since WordPress is already modifying Media Library filenames to be maximally compatible for URLs, I would think ideally it would strip out most non-ASCII characters, at least non-essential ones, like the narrow no-break space.
#10
follow-up:
↓ 11
@
7 months ago
For obvious reasons, there should never be a question mark in filenames in the Media Library.
As a side mention, I would pose that it’s not obvious that questions marks should be disallowed, or any character other than the ASCII “/”, since that is used across various filesystems as the path separator.
HTML has always supported URLs with arbitrary characters in them, so I think there are two issues at play.
The media library is not properly forming encoded URLs.
Even if we resolve the name transformation in sanitize_file_name() we would want to ensure that we build the proper URLs when rendering HTML to the browser. The proper way to create a URL for a filename with a question mark in it is to percent-escape it.
u = new URL( 'https://wordpress.org' ); u.pathname = '/wp-content/uploads/Screenshot 2025-02-19 at 2.17.33 PM.png'; u.href === 'https://wordpress.org/wp-content/uploads/Screenshot%202025-02-19%20at%202.17.33%E2%80%AFPM.png'
We can see how the browser is implementing the WHATWG URL spec for URL-path-segment string and "convert(ing) to percent-encoded bytes" the "Code points greater than U+007F."
There is some good news here. There have been some recent developments to add spec-compliant URL handling into PHP itself, and many of WordPress’ needs were incorporated into that work. For now we will start with a URL parser, but hopefully in the coming releases we will see added functionality for building URLs (because currently and previous to this new work, PHP has never had tools in its standard library to properly handle URLs).
This is its own bug and we could fix it today.
Filenames could be more conservatively renamed to avoid problems in buggy code elsewhere.
That being said, like most things I’ve encountered in WordPress where text or HTML encoding comes into play, there are likely a number of places that do not properly handle these valid filenames or URLs.
One thing I like about the wording in some of the linked tickets is the intentionality of the transformations. Something like shareable_filename() could communicate the intent to produce a filename which is more resilient to systems which apply their own non-standard rules. Each transformation inside that function can be linked to a particular known system or rule of thumb…
- Facebook doesn’t allow…
- AppEngine rejects control points…
- Many HTML systems will trip up on question mark or hash sign/pound/octothorpe
Maybe I’ll doodle on some of this, but I think it would be helpful for us to consider both parts of why this is broken, as one is uncontroversially broken and the fix is clear (produce spec-compliant URLs to avoid sending browsers the wrong URL) while the other is more subjective and includes WordPress extending known specifications with its own rules.
#11
in reply to:
↑ 10
@
7 months ago
Replying to dmsnell:
For obvious reasons, there should never be a question mark in filenames in the Media Library.
As a side mention, I would pose that it’s not obvious that questions marks should be disallowed, or any character other than the ASCII “/”, since that is used across various filesystems as the path separator.
HTML has always supported URLs with arbitrary characters in them, so I think there are two issues at play.
I would respectfully disagree with this, since like the /, the ? also plays a specific function in URLs. You can't have a question mark in a filename in a URL unless you escape it. Better just to not support it at all, I think.
#12
follow-up:
↓ 13
@
7 months ago
Briefly confirmed:
sanitize_file_name()does in fact already remove the?character from a filename.- The U+202F is replaced on save in the database for @room34 because the table’s collation does not have a way to represent that character and thus uses the replacement character
?instead.- This particular ticket is highlighting a symptom of a much broader problem, which is that WordPress is agnostic to text encodings. (see #62172)
- This is not solved via
blog_charsetand solutions interacting withblog_charsetrisk introducing bigger issues
- Many places perform HTML-escaping on the URL but do not attempt to perform URL-escaping on the URL.
It’s possible we could circumvent all sorts of issues simply by creating proper URLs for these attachments. The filenames are not causing the problems, and if we percent-escaped non-ASCII characters in a URL then we should avoid all sorts of issues for database tables with non-UTF-8 collations as well, because all more-or-less supported character encodings for WordPress are ASCII-compatible (with exceptions, of course).
In fact if we performed this computation we could even relax the transformation on the filename since we only need to try and avoid problems when sharing the files with other services which apply their own restrictions (see the second issue in my comment above).
sanitize_file_name() currently replaces a percent sign with a dash, but I don’t know if any system has problems with filenames including a percent sign. perhaps we could drop all of the complex logic in there and use something like this instead?
<?php function url_encoded_filename( $raw_filename_bytes ) { $encoded = ''; $length = strlen( $raw_filename_bytes ); for ( $i = 0; $i < $length; $i++ ) { $allowable_length = strspn( $raw_filename_bytes, '-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~', $i ); if ( $allowable_length > 0 ) { $encoded .= substr( $raw_filename_bytes, $i, $allowable_length ); $i += $allowable_length; continue; } $hex = dechex( $b ); $encoded .= "%{$hex}"; } return $encoded; }
of course this is just some doodling. a real proposal should at a minimum adhere more closely to the WHATWG URL spec on what needs to be percent-encoded. but still, the output of this function will be:
- ASCII safe
- fully visible (no hidden characters)
- URL safe (though the URLs, if properly encoded, will double-escape the
%signs)
more thought is warranted here but I hope that this communicates we may have an opportunity to use a comprehensive solution that eliminates the problem altogether and doesn’t have to rely on repeated bug reports and individual appearances of failures from the same root problem.
#13
in reply to:
↑ 12
@
7 months ago
Replying to dmsnell:
- URL safe (though the URLs, if properly encoded, will double-escape the
%signs)
Wouldn't that end up mangling all non-ASCII characters in filenames (and a number of ASCII characters too), to the point of unrecognizability?
For example, suppose I have a file named alpha-beta-gamma-αβγ.png.
Currently, if I upload this to WordPress, the filename is stored unchanged. The URL would be something like https://example.test/wp-content/uploads/2025/06/alpha-beta-gamma-αβγ.png.
If I understand your proposal correctly, the filename would become alpha-beta-gamma-%CE%B1%CE%B2%CE%B3.png? And the URL would be https://example.test/wp-content/uploads/2025/06/alpha-beta-gamma-%25CE%25B1%25CE%25B2%25CE%25B3.png?
Another example: consider a file named foo bar baz.png (with spaces).
Currently, if I upload this to WordPress, the filename becomes foo-bar-baz.png. The URL becomes https://example.test/wp-content/uploads/2025/06/foo-bar-baz.png.
Under your proposal, the filename would be foo%20bar%20baz.png? And the URL would be https://example.test/wp-content/uploads/2025/06/foo%2520bar%2520baz.png?
#14
@
7 months ago
That’s right, @siliconforks.
I’d like to know if what happened to @matt in his report is the same thing that happened to @room34 — the database table collation doesn’t support the characters.
The URL would be something like
This is not always going to be the case, as highlighted in this ticket. If you attempt to save that and the database table is set to a limited character set such as one in the ISO-8859 family (latin1 for example), then the postmeta value will be transcoded using the replacement character for unsupported characters. Here your URL would end up like this…
alpha-beta-gamma-???.png
This is true regardless of blog_charset, which I try to stress to everyone because I think its inception and eventual use ended up conflicting.
Since the ultimate problem is that WordPress isn’t properly generating the URLs and that’s split among a thousand different places (proverbially speaking, I have not searched or counted), perhaps we could compromise and specifically percent-encode only the URL-relevant characters.
A bigger issue will exist that I think isn’t called out here, that if we don’t go all the way so to speak, we may introduce problems with wp_unique_filename() because what that function returns isn’t ultimately what we store in the database. If we want to resolve it, we need to encode the filename.
We could always decode when providing download links so that the process reverses, but hopefully the specific name of the file matters less and the encoding would be fine as is. WordPress already renames the file for a few related reasons.
This ticket was mentioned in PR #9103 on WordPress/wordpress-develop by @dmsnell.
7 months ago
#15
- Keywords has-patch added
Trac ticket: Core-62995
The sanitize_file_name() function normalizes the no-break space to a normal space (U+0020) in order to prevent issues saving files with the no-break space in it.
This patch expands the replacement to all space characters, since it’s known that macOS stores a NARROW NO-BREAK SPACE (U+202F) in screenshot filenames between the time and the am/pm indicator.
There are deeper issues with the way this function works, but this patch resolves a known and common problem without raising any of the deeper refactoring questions.
#16
@
7 months ago
In the linked patch #9103 I have proposed a minor update to replace all Unicode space characters with the normal space (U+0020)
This circumvents the deeper issue but resolves the common case of macOS screenshots uploading with a ?.
7 months ago
#17
@dmsnell Would you be able to add some unit tests for this? Changing anything related to formatting without tests makes me pretty nervous.
7 months ago
#18
Concerning tests I will attempt to add a few but I started a meetup yesterday and don’t have guarantees on my timing.
I welcome anyone wanting to submit some test cases — since this is a PR, feel free to fork my fork and propose a PR whose target branch is this one.
git remote add dmsnell git@github.com:dmsnell/wordpress-develop.git git fetch dmsnell git checkout my-tests git reset --hard dmsnell/filenames/avoid-collation-corruption # make edits git push -u origin my-tests
This will create a PR against this branch in my repo, which I can merge into this branch/PR.
7 months ago
#19
@desrosj what do you think? On this one I would like to have agreement of at least two separate committers.
@jonsurrell commented on PR #9103:
7 months ago
#20
This does include tests now:
phpunit --testdox --group=16330
Difference with trunk (+3 assertions):
-OK (1 test, 4 assertions) +OK (1 test, 7 assertions)
If wp-includes/formatting.php is reverted to trunk's version, the tests fail:
_Formatting_Sanitize File Name ✘ Replaces spaces ┐ ├ Failed asserting that two strings are identical. ┊ ---·Expected ┊ +++·Actual ┊ @@ @@ ┊ -'Screenshot-2025-02-19-at-2.17.33-PM.png' ┊ +'Screenshot-2025-02-19-at-2.17.33 PM.png' │ ╵ …/tests/phpunit/tests/formatting/sanitizeFileName.php:52 ┴
#23
@
7 months ago
- Owner set to dmsnell
- Resolution set to fixed
- Status changed from new to closed
In 60399:
#24
@
7 months ago
Given the limited scope of this changeset and especially if it fixes known issues with recent OSX default filenames, I think we may consider backporting it, for an earlier release. What do you think?
#25
@
7 months ago
I agree on backporting for 6.8.2. Especially without a known timeline for 6.9, I'd like to see this fixed sooner rather than later.
#26
@
7 months ago
Ok to on 6.8.2 for me as well given the potential widespread of the issue and the simplicity of the patch.
Screenshot showing filename with question mark in Media Library