WordPress.org

Make WordPress Core

Opened 3 months ago

Last modified 3 months ago

#47763 new defect (bug)

Uploaded files that meet certain conditions do not show up in media search

Reported by: dxd5001
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 5.2.2
Component: Media
Keywords:
Focuses: administration
Cc:
PR Number:

Description

I uploaded a media file, but it does not appear on the Media page when I search for it by file name.

In my observation, it happens only when all of these conditions are met:

  1. The file was created on macOS
  2. The file has a Japanese name, such as “ワードプレス.pdf”
  3. The name includes a voiced sound mark (dakuten) and/or a semi-voiced "P-sound" mark (handakuten)
  4. The file is uploaded from a web browser other than Safari

I think this is caused by Unicode normalization in APFS and/or HFS+ (both are macOS file systems).

The file system uses Normalization Form D (NFD, decomposition) for file names, but the text I type into the search box in a browser other than Safari is in Normalization Form C (NFC, composition), so these characters don't match.

プ - The character with P‐sound consonant mark added in a filename

KATAKANA LETTER HU + COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
Unicode: U+30D5 U+309A, UTF-8: E3 83 95 E3 82 9A

プ - The character with P‐sound consonant mark typed in the search window from a browser (Chrome)

KATAKANA LETTER PU
Unicode: U+30D7, UTF-8: E3 83 97

These characters look identical, but they are not the same.
You can easily check this by copying and pasting the characters above into the macOS Character Viewer.
Right-click a character and copy its character information to see the details.
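To check the difference without the Character Viewer, a short PHP sketch can print the UTF-8 bytes of both forms of "プ" and show that a byte-wise substring search (strpos() here) does not find one form inside the other. It assumes PHP 7.0 or later for the `\u{...}` string escapes; the variable names are only illustrative.

```php
<?php
// Decomposed form (NFD), as found in the macOS file name: U+30D5 + U+309A.
$fromFilename = "\u{30D5}\u{309A}";
// Precomposed form (NFC), as typed into the browser search box: U+30D7.
$fromSearchBox = "\u{30D7}";

// Prints "e38395e3829a (6 bytes)" and "e38397 (3 bytes)".
echo bin2hex( $fromFilename ), ' (', strlen( $fromFilename ), " bytes)\n";
echo bin2hex( $fromSearchBox ), ' (', strlen( $fromSearchBox ), " bytes)\n";

// The byte sequences differ, so a byte-wise search for the typed
// character inside the file name finds nothing: prints bool(false).
var_dump( strpos( $fromFilename, $fromSearchBox ) );
```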

Fortunately, PHP provides a Normalizer class as part of its intl extension (https://www.php.net/manual/en/class.normalizer.php).

So I tried using this class in the wp_unique_filename() function (wp-includes/functions.php), and the results are good.

I added this code to wp-includes/functions.php at line 2257:

<?php
// Unicode Normalization: convert Normalization Form D (decomposition) to Form C (composition).
// The Normalizer class is provided by PHP's intl extension.
if ( Normalizer::isNormalized( $filename, Normalizer::FORM_D ) ) {
        $filename = Normalizer::normalize( $filename, Normalizer::FORM_C );
}

With this change, the file appears in search results on the Media page. In addition, a page with the file attached in its content area is found by a text search from the front-end search box.

Although this problem can be worked around with the “wp_unique_filename” filter and the class above, I think it’s better to handle it in core.
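For reference, the filter-based workaround mentioned above might look roughly like the following plugin snippet. This is a minimal sketch, not a tested patch: the function name `my_normalize_filename_nfc` is hypothetical, it assumes PHP 7.0+ and that the intl extension is loaded, and it deliberately returns the name unchanged when the Normalizer class is missing so that hosts without intl do not hit a fatal error.

```php
<?php
/**
 * Hypothetical helper: return $filename in Unicode Normalization Form C (NFC).
 * Leaves the name unchanged if the intl extension (Normalizer) is unavailable.
 */
function my_normalize_filename_nfc( $filename ) {
	if ( ! class_exists( 'Normalizer' ) ) {
		return $filename;
	}
	// Convert anything not already in NFC, including names that mix
	// composed and decomposed characters (which are in neither form).
	if ( ! Normalizer::isNormalized( $filename, Normalizer::FORM_C ) ) {
		$normalized = Normalizer::normalize( $filename, Normalizer::FORM_C );
		// normalize() returns false on error; keep the original name then.
		if ( false !== $normalized ) {
			$filename = $normalized;
		}
	}
	return $filename;
}

// Register the callback only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_unique_filename', 'my_normalize_filename_nfc' );
}

// Standalone demo: "ワードプレス.pdf" with decomposed ド and プ (NFD).
$nfd = "\u{30EF}\u{30FC}\u{30C8}\u{3099}\u{30D5}\u{309A}\u{30EC}\u{30B9}.pdf";
$nfc = my_normalize_filename_nfc( $nfd );
// Prints "28 bytes -> 22 bytes" when intl is loaded.
printf( "%d bytes -> %d bytes\n", strlen( $nfd ), strlen( $nfc ) );
```

Checking `! isNormalized( FORM_C )` instead of `isNormalized( FORM_D )` is a small design choice: it also converts names that are in neither form. Either way, only files uploaded after the filter is added are affected; existing attachments keep their decomposed names.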


Test Environment:

WordPress 5.2.2
PHP 7.2.17
MySQL 5.7.16
macOS 10.14.5 (MacBook Air)
File system: Apple File System (APFS)
Chrome 75.0.3770.142
Safari 12.1.1 (14607.2.6.1.1)
Firefox 68.0.1

Attachments (2)

functions.php (214.5 KB) - added by dxd5001 3 months ago.
Added Unicode Normalization code to wp_unique_filename function.
ワードプレス.pdf (12.8 KB) - added by dxd5001 3 months ago.
A sample of a media file.


Change History (4)

@dxd5001
3 months ago

Added Unicode Normalization code to wp_unique_filename function.

@dxd5001
3 months ago

A sample of a media file.
