Opened 2 years ago

Last modified 19 months ago

#35951 assigned defect (bug)

remove_accents() doesn't escape Unicode NFD characters

Reported by: onnimonni Owned by: johnbillion
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Charset Keywords: needs-patch needs-unit-tests
Focuses: Cc:

Description

The OS X filesystem HFS uses Unicode NFD instead of NFC. This causes all sorts of problems when uploaded files with accented names are moved between environments, or when a site is developed on an OS X machine and then pushed to production.

I'm trying to solve this problem by sanitizing all uploaded files with the remove_accents() function. But on my test machine remove_accents() doesn't do anything to NFD characters.

It should use something like Normalizer::normalize() to avoid this. Sadly, Normalizer isn't available on all systems.

I included a zip file which contains NFD characters. If you open it on a Linux machine you can see a small difference between these characters and "normal" (precomposed) UTF-8 accented characters like: öäå.
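To make the difference concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of WordPress) that builds both forms of "ö" with the standard `unicodedata` module and prints their code points and UTF-8 bytes. The helper name `code_points` is hypothetical.

```python
import unicodedata

# The same visible character in both normalization forms.
nfc = unicodedata.normalize("NFC", "\u00f6")  # single precomposed code point
nfd = unicodedata.normalize("NFD", "\u00f6")  # base letter "o" + combining diaeresis


def code_points(text):
    """Return the code points of text in U+XXXX notation (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(nfc), nfc.encode("utf-8").hex())  # ['U+00F6'] c3b6
print(code_points(nfd), nfd.encode("utf-8").hex())  # ['U+006F', 'U+0308'] 6fcc88
print(nfc == nfd)                                   # False
```

The two strings render the same but compare as different, which is why a file name written in one form may not match the same name stored in the other.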

Try copying the contents and running them through remove_accents('content'); you can see that nothing is changed.

If you have Normalizer available, you can verify that remove_accents() works when the characters are first run through Normalizer, for example: remove_accents(Normalizer::normalize('content'))
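The behaviour described above can be reproduced outside WordPress. The following Python sketch assumes an accent-stripping routine that works from a lookup table keyed on precomposed characters. The table `ACCENT_MAP` and the function `strip_accents_by_table` are hypothetical stand-ins, not the actual remove_accents() implementation. The NFC normalization step plays the role that Normalizer::normalize() plays in the PHP example.

```python
import unicodedata

# Hypothetical, deliberately tiny replacement table keyed on precomposed characters.
ACCENT_MAP = {"\u00f6": "o", "\u00e4": "a", "\u00e5": "a"}


def strip_accents_by_table(text):
    """Replace each character found in ACCENT_MAP; leave everything else as-is."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in text)


# A file name as an NFD-based filesystem would hand it over.
nfd_name = unicodedata.normalize("NFD", "\u00f6\u00e4\u00e5.txt")

# No decomposed sequence matches a table key, so nothing changes.
untouched = strip_accents_by_table(nfd_name)

# Normalizing to NFC first recombines the characters, so the table applies.
fixed = strip_accents_by_table(unicodedata.normalize("NFC", nfd_name))

print(untouched == nfd_name)  # True
print(fixed)                  # oaa.txt
```

Under these assumptions the table lookup only helps once the input has been recomposed, which matches what the report observes.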

I realize this doesn't concern native English-speaking countries, but it's a really big annoyance for the rest of us.

Attachments (1)

nfd-characters.zip (199 bytes) - added by onnimonni 2 years ago.
ZIP which contains txt file with NFD encoded characters

Change History (11)

@onnimonni
2 years ago

ZIP which contains txt file with NFD encoded characters

#1 follow-up: @johnbillion
2 years ago

  • Focuses accessibility removed
  • Keywords reporter-feedback added
  • Version 4.4.2 deleted

Out of interest, how are you interacting with an HFS filesystem? Are you copying files from a volume that uses HFS? AFAIK, you cannot run OS X on HFS.

I would hazard a guess that this is a limited edge case (not saying it's not a bug, just that it's likely a very uncommon situation to find yourself in).

#4 in reply to: ↑ 1 @zodiac1978
22 months ago

Replying to johnbillion:

> Out of interest, how are you interacting with an HFS filesystem? Are you copying files from a volume that uses HFS? AFAIK, you cannot run OS X on HFS.
>
> I would hazard a guess that this is a limited edge case (not saying it's not a bug, just that it's likely a very uncommon situation to find yourself in).

The NFD problem applies to both HFS and HFS+, see: https://en.wikipedia.org/wiki/HFS_Plus#Design

And HFS+ is the default filesystem for all Mac OS X computers.

This patch would fix it, as it adds the mentioned normalizer function from PHP (if it exists) to the sanitize_filename filter: https://core.trac.wordpress.org/attachment/ticket/30130/30130.2.diff

#5 @johnbillion
21 months ago

  • Keywords needs-patch added; reporter-feedback removed
  • Milestone changed from Awaiting Review to 4.7
  • Owner set to johnbillion
  • Status changed from new to assigned

#6 @jorbin
21 months ago

  • Keywords needs-unit-tests added

In addition to a patch, this will need some unit tests if it is going to be fixed in 4.7

#7 @gitlost
21 months ago

This is really a duplicate of #24661.

This ticket was mentioned in Slack in #core by jeffpaul.
20 months ago

#9 @johnbillion
19 months ago

  • Milestone changed from 4.7 to Future Release

#10 @gitlost
19 months ago

I think it would be more apt to mark this as a duplicate of #24661.