Opened 11 years ago
Closed 10 years ago
#26094 closed defect (bug) (fixed)
sanitize_file_name() breaks some UTF-8 strings
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 4.1 | Priority: | normal |
| Severity: | normal | Version: | 2.1 |
| Component: | Formatting | Keywords: | has-patch commit 4.1-early |
| Focuses: | | Cc: | |
Description
I've been testing sanitize_file_name( 'X.jpg' ) where X is a Unicode character that is a number or a letter (matching the regex /[\p{L}\p{N}]/u). Alarmingly, there are many rather common characters that result in a malformed, broken string being returned:
(U+00E0) : à Latin small letter a with grave
(U+0160) : Š Latin capital letter s with caron
(U+03A0) : Π Greek capital letter pi
(U+0420) : Р Cyrillic capital letter er
The problem seems to be caused by the preg_replace function being called without a Unicode pattern modifier.
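One way to see what the four characters listed above have in common is to look at their UTF-8 bytes: each one ends in the byte 0xA0. A byte-oriented pattern (one without the /u modifier) can, depending on the PCRE character tables and locale in use, treat 0xA0 as whitespace and replace it on its own. The sketch below is only an illustration under that assumption. The helper names are hypothetical, and the str_replace() call merely simulates a byte-level replacement; it does not reproduce the exact code path in core.

```php
<?php
// Hypothetical helper: hex value of the final byte of a string.
function last_byte_hex( $char ) {
    return bin2hex( substr( $char, -1 ) );
}

// Hypothetical helper: true if $s is well-formed UTF-8.
// An empty pattern with /u fails (returns false) on an invalid UTF-8 subject.
function is_valid_utf8( $s ) {
    return 1 === @preg_match( '//u', $s );
}

// The four characters from the report, written as code point escapes (PHP 7+).
$chars = array( "\u{00E0}", "\u{0160}", "\u{03A0}", "\u{0420}" );

foreach ( $chars as $char ) {
    // Print the full UTF-8 byte sequence of each character in hex.
    echo bin2hex( $char ), "\n";
}

// Simulation only: replace the lone 0xA0 byte the way a byte-oriented
// whitespace match could, leaving the lead byte 0xCE dangling.
$broken = str_replace( "\xA0", '-', "\u{03A0}.jpg" );
```

After the simulated replacement, $broken is no longer valid UTF-8, which matches the kind of malformed result described in the report.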
Attachments (3)
Change History (13)
#3
@
11 years ago
I think \p{Z} might cause a warning if PCRE was compiled without Unicode property support:
preg_match() [function.preg-match]: Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X.
#22692 has a related discussion: ticket:22692:39.
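A possible way to guard against that warning is to probe for Unicode property support once before using \p{...} in a pattern. The following is a minimal sketch, not code from any of the attached patches; the function name is hypothetical, and it assumes that a failed pattern compilation makes preg_match() return false.

```php
<?php
// Hypothetical helper: detect whether this PCRE build can compile
// Unicode property escapes such as \p{Z} in UTF-8 mode.
function pcre_supports_unicode_properties() {
    // The @ operator suppresses the "Compilation failed" warning on
    // builds without property support; preg_match() then returns false.
    return false !== @preg_match( '/\p{Z}/u', ' ' );
}

if ( pcre_supports_unicode_properties() ) {
    echo "Unicode properties available\n";
} else {
    echo "Fall back to an ASCII-only pattern\n";
}
```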
#4
@
11 years ago
- Cc heikki@… added
- Keywords dev-feedback added
The latest patch has the same solution as in #24001: Avoid the '\s' in the regexp.
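To illustrate what avoiding '\s' can look like, the sketch below spells out the ASCII whitespace characters explicitly, so that no byte above 0x7F can ever match. This is an assumption about the general approach, not a copy of the attached patch, and the function name is hypothetical.

```php
<?php
// Hypothetical helper: collapse runs of ASCII whitespace and dashes
// into a single dash without relying on \s.
function collapse_ascii_separators( $filename ) {
    // Only CR, LF, tab, space and "-" are listed, so the 0xA0 trailing
    // byte of characters such as U+03A0 is left untouched.
    return preg_replace( '/[\r\n\t -]+/', '-', $filename );
}

echo collapse_ascii_separators( "\u{03A0} photo.jpg" ), "\n";
```

Because the character class contains only ASCII bytes, the result does not depend on the locale or on whether the /u modifier is available.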
#5
@
11 years ago
- Keywords commit 3.9-early added; dev-feedback removed
- Milestone changed from Awaiting Review to Future Release
- Version changed from trunk to 2.7
Introduced in [4710].
#7
@
10 years ago
Hi.
I just want to point out that the problem in sanitize_file_name() for
(U+03A0) : Π Greek capital letter pi
still exists in WordPress version 3.9.2.
It can be solved by replacing the
$filename = preg_replace( '/[\s-]+/' , '-' , $filename ) ;
with
$filename = preg_replace( '/[\s-]+/u' , '-' , $filename ) ;
but I can't be sure what side effects it has for other special cases.
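One side effect of the /u modifier that is worth knowing about: the subject string must itself be valid UTF-8, otherwise preg_replace() returns NULL instead of a string. The sketch below shows both cases; the wrapper function name is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical wrapper around the replacement proposed above.
function collapse_separators_utf8( $filename ) {
    // With /u, pattern and subject are both interpreted as UTF-8.
    return preg_replace( '/[\s-]+/u', '-', $filename );
}

// Valid UTF-8: U+03A0 is kept intact and the space becomes a dash.
$ok = collapse_separators_utf8( "\u{03A0} photo.jpg" );

// Invalid UTF-8 (a lone 0xCE lead byte): the call fails and returns NULL.
$bad        = collapse_separators_utf8( "\xCE photo.jpg" );
$bad_reason = preg_last_error();

var_dump( $ok, $bad, PREG_BAD_UTF8_ERROR === $bad_reason );
```

A caller that adopts the /u variant would therefore need to decide what to do with a NULL result, for example by falling back to the original file name.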
Add /u pattern modifier to preg_replace