Opened 6 years ago
Closed 3 years ago
#47763 closed defect (bug) (fixed)
Uploaded files that meet certain conditions do not hit in media search
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | 6.1 | Priority: | normal |
Severity: | normal | Version: | 5.2.2 |
Component: | Media | Keywords: | has-patch |
Focuses: | administration | Cc: |
Description
When I upload a media file, it does not appear on the media page when I search for it by filename.
In my observation, this only happens when all of the following conditions are met:
- The file was created on macOS
- The file has a Japanese name such as “ワードプレス.pdf”
- The name includes a voiced sound mark (dakuten) and/or a semi-voiced sound mark (handakuten)
- The file is uploaded from a web browser other than Safari
I think this is caused by Unicode normalization in APFS and/or HFS+ (both are macOS file systems).
The file system uses Normalization Form D (decomposition) for file names, but when I type the name into the search box in a browser other than Safari, the input is in Normalization Form C (composition), so the characters do not match.
プ - The character with P‐sound consonant mark added in a filename
KATAKANA LETTER HU + COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK Unicode: U+30D5 U+309A, UTF-8: E3 83 95 E3 82 9A
プ - The character with P‐sound consonant mark typed in the search window from a browser (Chrome)
KATAKANA LETTER PU Unicode: U+30D7, UTF-8: E3 83 97
These characters look the same but are encoded differently.
You can check this easily by copying and pasting the characters above into the macOS Character Viewer.
Right-click the character and copy it to get detailed information.
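The same difference can be confirmed from PHP. The following is a minimal sketch, assuming the intl extension (which provides the Normalizer class) is loaded; the variable names are illustrative only and do not come from WordPress.

```php
<?php
// Decomposed form (NFD), as found in the uploaded file name:
// U+30D5 KATAKANA LETTER HU + U+309A COMBINING SEMI-VOICED SOUND MARK.
$from_filename = "\u{30D5}\u{309A}";

// Precomposed form (NFC), as typed into the search box in Chrome:
// U+30D7 KATAKANA LETTER PU.
$from_search = "\u{30D7}";

// The two strings render identically but are different byte sequences.
var_dump( $from_filename === $from_search ); // bool(false)
echo bin2hex( $from_filename ), "\n";        // e38395e3829a
echo bin2hex( $from_search ), "\n";          // e38397

// After normalizing the file name to Form C, the strings match.
$normalized = Normalizer::normalize( $from_filename, Normalizer::FORM_C );
var_dump( $normalized === $from_search );    // bool(true)
```

A byte-wise comparison of the stored name against the typed search term therefore fails unless one side is normalized first.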
Fortunately, PHP has a Normalizer class (https://www.php.net/manual/en/class.normalizer.php).
I tried using this class in the function wp_unique_filename (wp-includes/functions.php), and the results are good.
I added this code in wp-includes/functions.php at line 2257:
<?php
// Unicode Normalization: Normalize Form D (decomposition) to Form C (composition).
if ( Normalizer::isNormalized( $filename, Normalizer::FORM_D ) ) {
	$filename = Normalizer::normalize( $filename, Normalizer::FORM_C );
}
With this change, the file appears in search results on the media page. A page that has the file attached in its content area is also found by a text search from the front-end search box.
Although we can deal with this problem using the “wp_unique_filename” filter and the class above, I think it’s better to handle it in core.
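For reference, the filter-based workaround could look roughly like the sketch below. It is an assumption-laden example, not the patch attached to this ticket: the function name myprefix_normalize_filename_to_nfc is hypothetical, it normalizes every file name to Form C without first checking for Form D, and it leaves the name unchanged when the intl extension is missing.

```php
<?php
/**
 * Hypothetical callback for the wp_unique_filename filter:
 * converts the file name to Unicode Normalization Form C.
 */
function myprefix_normalize_filename_to_nfc( $filename ) {
	// Normalizer is provided by the intl extension; without it, do nothing.
	if ( ! class_exists( 'Normalizer' ) ) {
		return $filename;
	}

	// normalize() does not return a string on failure (e.g. invalid UTF-8),
	// so fall back to the original name in that case.
	$normalized = Normalizer::normalize( $filename, Normalizer::FORM_C );

	return is_string( $normalized ) ? $normalized : $filename;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_unique_filename', 'myprefix_normalize_filename_to_nfc' );
}
```

This could live in a small plugin or a theme's functions.php. File names that are already in Form C, including plain ASCII names, pass through unchanged.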
—
Test Environment:
WordPress 5.2.2
PHP 7.2.17
MySQL 5.7.16
macOS X 10.14.5 (MacBook Air)
File system: Apple File System (APFS)
Chrome 75.0.3770.142
Safari 12.1.1 (14607.2.6.1.1)
Firefox 68.0.1
Attachments (2)
Change History (7)
#3
@
3 years ago
- Milestone changed from Awaiting Review to 6.1
Moving the milestone to 6.1, as the PR on https://core.trac.wordpress.org/ticket/24661 will (most likely) fix this too.
Please test it!
Added Unicode Normalization code to the wp_unique_filename function.