Opened 3 years ago

Last modified 3 months ago

#35951 reopened defect (bug)

remove_accents() doesn't escape Unicode NFD characters

Reported by: onnimonni
Owned by: johnbillion
Milestone: Future Release
Priority: normal
Severity: normal
Version:
Component: Charset
Keywords: needs-patch needs-unit-tests
Focuses:
Cc:

Description

The OS X filesystem HFS uses Unicode NFD (decomposed form) instead of NFC (precomposed form). This causes all sorts of problems when uploaded files with accented names are moved between environments, or when a site is developed on an OS X machine and then pushed to production.

I'm trying to solve this problem by running the names of all uploaded files through the remove_accents() function. But on my test machine remove_accents() doesn't do anything for NFD characters.

It should use something like Normalizer::normalize() to avoid this. Sadly, Normalizer isn't available on all systems.

I've attached a ZIP file which contains NFD characters. If you open it on a Linux machine you can see a small difference between these characters and "normal" (NFC) UTF-8 accented characters like öäå.
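To make the difference concrete, here is a minimal sketch in Python (chosen only because its standard library ships a Unicode normalizer; it is not WordPress code). It assumes the attached file holds the decomposed forms of öäå, which the ticket implies but which is not verified here.

```python
import unicodedata

# Precomposed (NFC) form: one code point per accented letter.
nfc = "\u00f6\u00e4\u00e5"  # öäå

# Decomposed (NFD) form: base letter followed by a combining mark.
nfd = unicodedata.normalize("NFD", nfc)

# The two strings render the same but are not equal as code point sequences.
print(nfc == nfd)
print(len(nfc), len(nfd))
print([f"U+{ord(c):04X}" for c in nfd])

# Normalizing back to NFC restores the precomposed string.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

Each NFD letter is stored as two code points (for example `o` plus U+0308 COMBINING DIAERESIS), so any comparison or lookup that expects the single precomposed code point will not match.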

Copy the contents and run them through remove_accents('content'), and you will see that nothing is changed.

If you have Normalizer available, you can verify that remove_accents() does work when the characters are first run through Normalizer, for example: remove_accents(Normalizer::normalize('content'))
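The normalize-first idea can be sketched as follows. This is a Python illustration, not the WordPress implementation: `ACCENT_MAP`, `strip_accents_table` and `strip_accents_normalized` are hypothetical names, and the sketch assumes the accent remover is a lookup keyed on precomposed characters, which would explain the behavior reported above.

```python
import unicodedata

# Hypothetical stand-in for a table-driven accent remover.
# Assumption: the table only contains precomposed (NFC) characters.
ACCENT_MAP = str.maketrans({"\u00f6": "o", "\u00e4": "a", "\u00e5": "a"})


def strip_accents_table(text: str) -> str:
    """Replace known precomposed accented characters with ASCII letters."""
    return text.translate(ACCENT_MAP)


def strip_accents_normalized(text: str) -> str:
    """Normalize to NFC first so decomposed input matches the table."""
    return strip_accents_table(unicodedata.normalize("NFC", text))


# Simulate a file name as an NFD-normalizing filesystem would hand it over.
nfd_name = unicodedata.normalize("NFD", "\u00f6\u00e4\u00e5.txt")

# Without normalization the table lookup finds nothing to replace.
print(strip_accents_table(nfd_name) == nfd_name)

# With normalization the accents are removed as expected.
print(strip_accents_normalized(nfd_name))
```

Normalizing to NFC before the lookup keeps the existing table unchanged; the alternative would be to add decomposed sequences to the table itself.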

I realize this doesn't concern native English-speaking countries, but it's a really big annoyance for the rest of us.

Attachments (1)

nfd-characters.zip (199 bytes) - added by onnimonni 3 years ago.
ZIP which contains txt file with NFD encoded characters


Change History (13)

@onnimonni
3 years ago

ZIP which contains txt file with NFD encoded characters

#1 follow-up: @johnbillion
3 years ago

  • Focuses accessibility removed
  • Keywords reporter-feedback added
  • Version 4.4.2 deleted

Out of interest, how are you interacting with an HFS filesystem? Are you copying files from a volume that uses HFS? AFAIK, you cannot run OS X on HFS.

I would hazard a guess that this is a limited edge case (not saying it's not a bug, just that it's likely a very uncommon situation to find yourself in).

#2 @ocean90
3 years ago

Related: #30130

#4 in reply to: ↑ 1 @zodiac1978
3 years ago

Replying to johnbillion:

Out of interest, how are you interacting with an HFS filesystem? Are you copying files from a volume that uses HFS? AFAIK, you cannot run OS X on HFS.

I would hazard a guess that this is a limited edge case (not saying it's not a bug, just that it's likely a very uncommon situation to find yourself in).

The NFD problem applies to both HFS and HFS+, see:
https://en.wikipedia.org/wiki/HFS_Plus#Design

And HFS+ is the default filesystem for all Mac OS X computers.

This patch would fix it, as it adds the mentioned normalizer function from PHP (if it exists) to the sanitize_filename filter:
https://core.trac.wordpress.org/attachment/ticket/30130/30130.2.diff

#5 @johnbillion
3 years ago

  • Keywords needs-patch added; reporter-feedback removed
  • Milestone changed from Awaiting Review to 4.7
  • Owner set to johnbillion
  • Status changed from new to assigned

#6 @jorbin
3 years ago

  • Keywords needs-unit-tests added

In addition to a patch, this will need some unit tests if it is going to be fixed in 4.7

#7 @gitlost
3 years ago

This is really a duplicate of #24661.

This ticket was mentioned in Slack in #core by jeffpaul.


3 years ago

#9 @johnbillion
3 years ago

  • Milestone changed from 4.7 to Future Release

#10 @gitlost
3 years ago

I think it would be more apt to mark this as a duplicate of #24661.

#12 @zodiac1978
7 months ago

  • Status changed from assigned to reopened

Like all the other tickets mentioned here, this is still a valid problem and needs to be fixed.

I've submitted a talk for WordCamp Europe where I will try to summarize all these tickets, the possible patches, and why this is still a problem for most if not all European languages.

#13 @SergeyBiryukov
5 months ago

  • Milestone set to Future Release