Opened 5 months ago

Last modified 4 months ago

#26094 new defect (bug)

sanitize_file_name() breaks some UTF-8 strings

Reported by: p_enrique
Owned by:
Milestone: Future Release
Priority: normal
Severity: normal
Version: 2.1
Component: Formatting
Keywords: has-patch commit 3.9-early
Focuses:
Cc:

Description

I've been testing sanitize_file_name( 'X.jpg' ) where X is a Unicode character that is a number or a letter (matching the regex /[\p{L}\p{N}]/u). Alarmingly, many rather common characters cause a malformed, broken string to be returned:

(U+00E0) : à Latin small letter a with grave
(U+0160) : Š Latin capital letter s with caron
(U+03A0) : Π Greek capital letter pi
(U+0420) : Р Cyrillic capital letter er

The problem seems to be caused by a preg_replace() call that lacks the Unicode (/u) pattern modifier.
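
The four characters listed above have one thing in common: each of their UTF-8 encodings ends in the byte 0xA0, which is also the non-breaking space in ISO-8859-1. The following is a minimal sketch of the suspected mechanism, not the actual WordPress code. It assumes the affected call is a whitespace-collapsing pattern like '/[\s-]+/' (the pattern named in the third patch) and that, on affected systems, \s in byte mode also matches 0xA0. To keep the result independent of the local PCRE build and locale, the sketch writes that byte into the character class explicitly. The function name is hypothetical.

```php
<?php
// Hypothetical helper: mimics a byte-mode '/[\s-]+/' on a system where
// \s also matches 0xA0. The explicit \xA0 is an assumption made so the
// demonstration does not depend on the local PCRE build or locale.
function collapse_whitespace_bytewise( $name ) {
    return preg_replace( '/[\s\xA0-]+/', '-', $name );
}

// UTF-8 byte sequences of the characters from the report.
$samples = array(
    'U+00E0' => "\xC3\xA0", // Latin small letter a with grave
    'U+0160' => "\xC5\xA0", // Latin capital letter s with caron
    'U+03A0' => "\xCE\xA0", // Greek capital letter pi
    'U+0420' => "\xD0\xA0", // Cyrillic capital letter er
);

foreach ( $samples as $codepoint => $char ) {
    $result = collapse_whitespace_bytewise( $char . '.jpg' );
    // preg_match() with /u only returns 1 when the subject is valid UTF-8.
    $valid = ( 1 === preg_match( '//u', $result ) );
    printf(
        "%s  bytes: %s  result is valid UTF-8: %s\n",
        $codepoint,
        strtoupper( bin2hex( $char ) ),
        $valid ? 'yes' : 'no'
    );
}
```

In every case the trailing 0xA0 is replaced with '-', leaving a lone lead byte behind, so the returned file name is no longer valid UTF-8.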

Attachments (3)

26094.patch (1.1 KB) - added by p_enrique 5 months ago.
Add /u pattern modifier to preg_replace
26094.2.patch (1.1 KB) - added by p_enrique 5 months ago.
Add /u pattern modifier to preg_replace with error suppression
26094.3.patch (602 bytes) - added by p_enrique 4 months ago.
use '/[\r\n\t -]+/' instead of '/[\s-]+/'


Change History (9)

p_enrique 5 months ago

Add /u pattern modifier to preg_replace

comment:1 p_enrique 5 months ago

  • Keywords has-patch added
  • Version set to trunk

comment:3 SergeyBiryukov 5 months ago

I think \p{Z} might cause a warning if PCRE was compiled without Unicode property support:

preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X.

#22692 has a related discussion: ticket:22692:39.
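
The warning quoted above is raised when the pattern is compiled, so support for property escapes can be probed once before a pattern containing \p{...} is used. The sketch below is one possible way to do that, not code from any of the attached patches. The function name is hypothetical and is not a WordPress API, and both patterns are illustrative assumptions.

```php
<?php
// Hypothetical helper, not part of WordPress: reports whether the local
// PCRE build accepts Unicode property escapes such as \p{Z}.
function pcre_supports_unicode_properties() {
    static $supported = null;
    if ( null === $supported ) {
        // preg_match() returns false when the pattern fails to compile;
        // @ hides the "Compilation failed" warning on such builds.
        $supported = ( false !== @preg_match( '/\p{Z}/u', ' ' ) );
    }
    return $supported;
}

// Fall back to an explicit ASCII class when \p{Z} is unavailable.
$pattern = pcre_supports_unicode_properties()
    ? '/[\p{Z}\r\n\t-]+/u'
    : '/[\r\n\t -]+/';

echo preg_replace( $pattern, '-', "my  file\tname.jpg" ), "\n";
```

For this ASCII-only input both branches print my-file-name.jpg; they differ only in whether non-ASCII separators are also collapsed.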

p_enrique 5 months ago

Add /u pattern modifier to preg_replace with error suppression

p_enrique 4 months ago

use '/[\r\n\t -]+/' instead of '/[\s-]+/'

comment:4 p_enrique 4 months ago

  • Cc heikki@… added
  • Keywords dev-feedback added

The latest patch uses the same solution as #24001: avoid '\s' in the regexp.
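
A short sketch of why an explicit class is safe without the /u modifier: every byte it can match is ASCII (below 0x80), while all bytes of a multibyte UTF-8 sequence are 0x80 or above, so the pattern can never split a character. The pattern is the one named in 26094.3.patch; the wrapper function and the sample file name are hypothetical.

```php
<?php
// Pattern taken from 26094.3.patch; the wrapper function is hypothetical.
function collapse_whitespace_ascii( $name ) {
    return preg_replace( '/[\r\n\t -]+/', '-', $name );
}

// "à la Р<tab>- carte.jpg", written as raw UTF-8 bytes.
$name   = "\xC3\xA0 la \xD0\xA0\t- carte.jpg";
$result = collapse_whitespace_ascii( $name );

echo $result, "\n";
// preg_match() with /u only returns 1 when the subject is valid UTF-8.
echo ( 1 === preg_match( '//u', $result ) ) ? "valid UTF-8\n" : "broken UTF-8\n";
```

The 0xA0 bytes inside à and Р are left alone, and only the ASCII spaces, tab and hyphen are collapsed.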


comment:5 SergeyBiryukov 4 months ago

  • Keywords commit 3.9-early added; dev-feedback removed
  • Milestone changed from Awaiting Review to Future Release
  • Version changed from trunk to 2.7

Introduced in [4710].

comment:6 SergeyBiryukov 4 months ago

  • Version changed from 2.7 to 2.1