#35951 closed defect (bug) (fixed)
remove_accents() doesn't escape Unicode NFD characters
Reported by: | onnimonni | Owned by: | audrasjb |
---|---|---|---|
Milestone: | 6.1 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Charset | Keywords: | has-patch add-to-field-guide |
Focuses: | Cc: |
Description
OS X filesystem HFS uses unicode NFD instead of NFC. This causes all sorts of problems when uploaded files with accents are moved between environments or if the site is developed in OS X machine and then pushed to production.
I'm trying to solve this problem using remove_accents() function and sanitizing all uploaded files. But in my test machine remove_accents()
doesn't do anything for NFD characters.
It should use something like Normalizer::normalize()
to avoid this. But sadly Normalizer isn't available in all systems.
I included zip file which contains nfd characters. If you open it in linux machine you can see a small difference between the characters and "normal" utf-8 accented characters like: öäå.
Try to copy the contents and run it through remove_accents('content')
and you can see that nothing is changed.
If you have Normalizer available you can test that remove_accent()
if characters are first filtered by running Normalizer for example: remove_accents(Normalizer::normalize('content'))
I realize this doesn't concern native english speaking countries but it's really big annoyance for the rest of us.
Attachments (2)
Change History (24)
#1
follow-up:
↓ 4
@
9 years ago
- Focuses accessibility removed
- Keywords reporter-feedback added
- Version 4.4.2 deleted
Out of interest, how are you interacting with an HFS filesystem? Are you copying files from a volume that uses HFS? AFAIK, you cannot run OS X on HFS.
I would hazard a guess that this is a limited edge case (not saying it's not a bug, just that it's likely a very uncommon situation to find yourself in).
#4
in reply to:
↑ 1
@
8 years ago
Replying to johnbillion:
Out of interest, how are you interacting with an HFS filesystem? Are you copying files from a volume that uses HFS? AFAIK, you cannot run OS X on HFS.
I would hazard a guess that this is a limited edge case (not saying it's not a bug, just that it's likely a very uncommon situation to find yourself in).
The problem with NFD is true for HFS and HFS+, see:
https://en.wikipedia.org/wiki/HFS_Plus#Design
And HFS+ is the default filesystem for all Mac OS X computers.
This patch would fix it, as it adds the mentioned normalizer
function from PHP (if it exists) to the sanitize_filename
filter:
https://core.trac.wordpress.org/attachment/ticket/30130/30130.2.diff
#5
@
8 years ago
- Keywords needs-patch added; reporter-feedback removed
- Milestone changed from Awaiting Review to 4.7
- Owner set to johnbillion
- Status changed from new to assigned
#6
@
8 years ago
- Keywords needs-unit-tests added
In addition to a patch, this will need some unit tests if it is going to be fixed in 4.7
This ticket was mentioned in Slack in #core by jeffpaul. View the logs.
8 years ago
#12
@
6 years ago
- Status changed from assigned to reopened
Like all the other tickets mentioned in this ticket this is still true and needs to be fixed.
I've submitted a talk for WordCamp Europe where I will try to summarize all these tickets, possible patches and why this still is a problem for most if not all European languages.
#15
@
3 years ago
Well, we are at 5.7.2 and it seems that this issue haven't been solved yet.
Either the solution proposed here or the solution proposed at https://core.trac.wordpress.org/ticket/24661 seems to work.
How can I help move this issue forward?
@
3 years ago
Patch to solve expanded characters not being substituted using PHP's own normalizer_normalize function.
#16
@
2 years ago
- Keywords has-patch added; needs-patch needs-unit-tests removed
- Milestone changed from Future Release to 6.1
Milestone to 6.1 as the PR on https://core.trac.wordpress.org/ticket/24661 will (most likely) fix this too.
Please test it!
#17
@
2 years ago
- Owner set to audrasjb
- Resolution set to fixed
- Status changed from assigned to closed
In 53754:
#19
follow-up:
↓ 20
@
2 years ago
Hi, I just wanted to leave a comment here that it seems some people are having problems with not having Normalizer
class available in their environment, which is problematic for some plugins that call remove_accents
more often (like WooCommerce, ref issue https://github.com/woocommerce/woocommerce/issues/35471).
Besides, there are some core workflows that may end up with a fatal error as well. E.g. when the admin tries to create a new user with accented characters, sanitize_user
calls remove_accents
and if they don't have Normalizer
, it will throw a fatal.
I suppose it should be possible to fail more gracefully, or only use Normalizer
if get_loaded_extensions
includes intl
?
#20
in reply to:
↑ 19
@
2 years ago
Replying to peterfabian1000:
I suppose it should be possible to fail more gracefully, or only use
Normalizer
ifget_loaded_extensions
includesintl
?
There is a function exist check there:
https://github.com/WordPress/WordPress/blob/823517e1deabef111a8a1b49bce466eada1162c9/wp-includes/formatting.php#L1604
<?php if ( function_exists( 'normalizer_normalize' ) ) {
If this check is true
but the function call produces a fatal error, we maybe need to switch to check with get_loaded_extensions
and intl
instead.
I would be interested in the environment (which PHP version and how does the check can be true without a working normalize function).
#22
in reply to:
↑ 21
;
follow-up:
↓ 23
@
2 years ago
Replying to peterfabian1000:
I agree it's weird. Maybe some polyfill is adding the function?
The reporter has Updraft Plus installed and this is indeed using a polyfill.
See: https://wpdirectory.net/search/01GGYGZXMMWWG4FDD3781EKW0W
Could the polyfill be not accessible and still producing a true
on a function exists check!?
#23
in reply to:
↑ 22
@
2 years ago
Replying to zodiac1978:
The reporter has Updraft Plus installed and this is indeed using a polyfill.
See: https://wpdirectory.net/search/01GGYGZXMMWWG4FDD3781EKW0W
Could the polyfill be not accessible and still producing a
true
on a function exists check!?
It appears that the plugin polyfills the normalizer_normalize()
function, but the Normalizer
class has a different namespace there, p\Normalizer
, so Normalizer::FORM_C
produces a fatal error.
Since this ticket was closed on a completed milestone, I've created a new one for 6.1.1: #56980, let's continue the discussion there.
ZIP which contains txt file with NFD encoded characters