Opened 7 years ago
Last modified 5 years ago
#44793 reviewing defect (bug)
remove_accents() doesnt escape all versions of "i"
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-patch dev-feedback 2nd-opinion |
Focuses: | Cc: |
Description
The version with both dieresis and accent is missing. Suggested addition is the following
plus '΅Ι' => 'I', 'ΐ' => 'i',
Attachments (5)
Change History (26)
#3
@
7 years ago
- Keywords needs-refresh added
- Milestone changed from Awaiting Review to 4.9.9
- Owner set to SergeyBiryukov
- Status changed from new to reviewing
Hi @bagosm, welcome to WordPress Trac! Thanks for the report.
@sebastienserre, thanks for the patch :) Could you please submit a patch without all the formatting changes?
#4
@
7 years ago
Hello @SergeyBiryukov
HUuummm I don't know hos to do :-( I asked to PHPStorm to create a patch after my changes and it gave me this :-(
I will check
#6
@
6 years ago
Uploaded a patch that adds tests for these letters. Also relocated them to a separate category under Greek Coptic letters.
#8
@
6 years ago
It looks like the ticket is about the Greek letter iota with dialytika and tonos: https://en.wiktionary.org/wiki/ΐ
44793.4.diff is an attempt to cover other similar symbols for consistency, including letter upsilon and standalone dialytika and tonos.
@dyrer, @ifrountas, @kosvrouvas, could you take a look at these changes to see if they are valid and remove_accents()
should indeed handle these letters? Are there any other symbols that should be added?
#13
@
6 years ago
- Keywords needs-testing added
Hi,
This ticket is triaged in milestone 5.0.3. Now it will needs commit and backport to the related branch. Adding dev-feedback needs-testing
keywords to see if it can land in 5.0.3 at the very beginning of January.
This ticket was mentioned in Slack in #core by desrosj. View the logs.
6 years ago
#17
@
6 years ago
- Milestone changed from 5.2 to Future Release
Still needs feedback from the Greek translation team (comment:8).
#19
@
6 years ago
Hey there!
I just noticed this ticket as it was turned into a Future Release (good call imho as it might need a lot more discussion as well!). I can handle the Greek letters but I need some clarifications first please because I'm not sure where exactly remove_accents()
is needed & used for. I saw in core it's within sanitize_title()
and sanitize_user()
but it might be on various other places or used in more broader scopes as well that I can't know at the moment.
Please bear with me and let me explain my thinking process and "issues" before creating a patch and if there's an outcome on what's actually needed I'll be more than happy to provide a patch for Greek letters.
The function itself is called "remove_accents()" that literally means removing accents. The description though in our Handbook says Converts all accent characters to ASCII characters.
and this is something totally different, in many languages removing an accent and converting to ASCII (Latin) is whole different story and changes everything.
For example:
Removing accent ( literally so both are Greek letters ): ί = ι
Converting to ASCII ( so altering the locale ): ί = i
Questions:
Where would remove_accents()
be actually used and for what purposes?
As for example if it's used to create "slugs" or titles if we only add accented characters people will end up with a mixed slug or title by having Greek/Latin letters. So most likely in this case a "full" exchange of Greek -> Latin would be needed.
What is actually needed here?
If it's simply changing from an accented Greek letter to a non accented one it would be ok (I believe). If it's to remove accents for another reason and it's used to try and convert everything into Latin maybe we need to split things up into remove_accents()
and convert_to_latin()
for example by introducing something new (if something equivalent doesn't already exist that I'm not aware of).
For me this isn't something as simple as it looks as for various Languages removing accents (if used to alter actual readable text) ends up on changing the meaning of the word as well.
Thanks!
#21
@
5 years ago
- Milestone changed from 5.3 to Future Release
As far as I understand remove_accents()
is meant to replace accented letters from Latin based languages with their non-accented equivalent (as much as possible) only for the Latin-1 Supplement, Latin Extended-A, and Latin Extended-B Unicode blocks.
It is not meant to "reduce" a string containing chars from the above Unicode blocks to an "ASCII equivalent", however in practice this is often the case. Looking back, seems it was added to both sanitize and make permalinks and user names easy to url-encode and look better (SEO related?), although there may have been other reasons.
Looking at the questions @xkon raises, remove_accents()
would probably need better documentation, and in the longer run, a re-evaluation and refactoring.
In that terms moving to "future release" for now. Please feel free to add back to 5.3 if feasible.
Hello @bagosm
I've created a Patch with your modification.
Many thanks to opened this ticket and improve WordPress.