Opened 3 years ago
Closed 20 months ago
#54088 closed defect (bug) (worksforme)
Uploading media containing Norwegian letter å does not automatically readjust it to become aa.
Reported by: | paaljoachim | Owned by: | audrasjb |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | |
Component: | Media | Keywords: | has-patch has-unit-tests dev-feedback |
Focuses: | | Cc: | |
Description
I did a test yesterday and noticed when uploading an image containing Norwegian letters æ ø å that the å did not convert to aa.
It looked like this:
æ -> ae (converted)
ø -> o (converted)
å -> å (did not convert)
Attachments (4)
Change History (61)
This ticket was mentioned in Slack in #core by paaljoachim. View the logs.
3 years ago
#3
@
3 years ago
I focused on this topic because I am redoing tutorials on my WordPress tutorial site. This is an old tutorial that I believe is likely no longer needed: https://www.easywebdesigntutorials.com/cleaning-up-filenames-that-have-non-utf8-characters-in-them/ (I am adding it here just in case there are aspects of the tutorial that are still needed.) Thanks.
This ticket was mentioned in Slack in #core-media by antpb. View the logs.
3 years ago
#5
@
3 years ago
- Milestone changed from Awaiting Review to 5.9
For anyone digging into this, the solution will likely be within the remove_accents() function used by sanitize_file_name():
https://developer.wordpress.org/reference/functions/remove_accents/
#6
follow-up:
↓ 8
@
3 years ago
I need to do some more digging, but at an initial glance the logic behind converting å only turns it into a, and it seemingly is not even doing that in the video provided.
#8
in reply to:
↑ 6
@
3 years ago
Replying to antpb:
I need to do some more digging but an initial glance at the logic behind converting å is only turning it into a
Some background:
"å" does not stem from a ligature. The letter stems from Old Norse "á", a longer and darker form of the sound written as "a". Swedish has had it since the 16th century, Norwegian since 1917 and Danish since 1948. Danish still uses "aa" in many geographical names (an alternative, official spelling), but this is not the case in Norway and Sweden (only in old family names).
A few years ago there was a suggestion here on Trac to transliterate "å" to "aa" in slugs, instead of just "a" (as initially in WP). There was some opposition to this in Scandinavia, at least in Norway (advocated by me). Generally, but especially in the Norwegian variant Nynorsk, there has been stronger opposition to using "aa" for "å". This is because in some words the next letter is also an "a", giving "aaa" in words like "Tåa" and "Åa". But also because it doesn't add readability and just makes the word longer. So I say, at least as my personal opinion, keep it like that. We are used to it and don't complain. Keep the special case for Danish.
The main thing here and now is of course to make it work properly for filenames.
#9
follow-up:
↓ 11
@
3 years ago
Hei Knut. Thank you for adding the additional information!
My name is Paal (American/English spelling would likely be Paul). Same spelling as my father. In modern Norway Paal is spelled Pål. So the alternative to using å is usually aa. If I write Norwegian with an English keyboard I would use the aa instead of å.
I agree the main thing here is making it work properly for filenames.
I would prefer a conversion of å to aa, but if there is "a lot" of resistance to aa then a single a would also be totally fine.
#10
@
3 years ago
There's already a test for this but only for the remove_accents() function, not that it actually applies those transformations to the name of an uploaded file. https://github.com/WordPress/wordpress-develop/blob/16b04903feec8216bdd2e6230f4ad511a9238db1/tests/phpunit/tests/formatting/removeAccents.php#L15
#11
in reply to:
↑ 9
;
follow-up:
↓ 27
@
3 years ago
Replying to paaljoachim:
If I write Norwegian with an English keyboard I would use the aa instead of å.
Good point, and this was mentioned back then. Also international standards on the field. However, writing is slightly different than slugs, as distinguishing between "a" and "å" might feel needed.
So, it was argued from a conservative point of view, don't change what works just fine. That a change was made just for Danish surprised me a bit, but that effectively silenced the discussion.
Small thing. If there is a need on WP for standardization across our relatively small Scandinavian languages, "aa" will be just fine by me.
I have linked to this in the Norwegian Slack.
#12
@
3 years ago
Oh, no! Please don’t transliterate å to aa, nor ø to oe. Æ is (originally) a ligature, so it’s fine to use ae. Visually, ae is close to an æ, so it’s easy to read. Texts where å is transliterated to aa (or ø to oe) are really hard to read, as it breaks the “look at the full word to recognize and read it” feature in the brain.
It also looks like it was written by Henrik Ibsen 150 years ago. As Paal mentions, in modern Norway Paal is spelled Pål.
Surnames, which are rarely changed/updated, became common in the period when e.g. aa was still used. They became mandatory in 1923, when å had recently been introduced to Norwegian and had not yet been introduced into Danish (on which Norwegian was very heavily based). Over the last 100 years, a lot of family names have been updated to use å, but this is not something that people change lightly, so it’s still common to see aa there. First names using aa are rare.
In the WP context this is only done for normalizing slugs and filenames. Using the longer versions makes them … well … longer. As Knut also mentioned, having “aaa” is not exactly ideal.
I see no reason to make the slugs longer and less readable, to conform to an old and conservative method that is irrelevant and outdated to most people. I tried to find what The Language Council of Norway (Språkrådet) has to say about it, but could not find anything.
If anything gets changed in WP regarding this, we would need a filter on the transliteration table, so people can choose what they like.
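To illustrate the idea of a filter on the transliteration table, here is a minimal sketch in plain PHP. It is not WordPress code: `transliterate_scandinavian()` and its `$filter` callback are hypothetical names, and the three-entry map is only a stand-in for the much larger table in `remove_accents()`. It also shows the "aaa" concern raised earlier in the thread.

```php
<?php
/**
 * Hypothetical helper, not part of WordPress: transliterates æ, ø and å
 * with a small map that callers may adjust through an optional callback.
 */
function transliterate_scandinavian( string $text, ?callable $filter = null ): string {
    // Default map, matching the behaviour preferred in this thread (å -> a).
    $map = array(
        "\u{00E6}" => 'ae', // æ
        "\u{00F8}" => 'o',  // ø
        "\u{00E5}" => 'a',  // å
    );

    // Let the caller change the table, in the spirit of a filter.
    if ( null !== $filter ) {
        $map = $filter( $map );
    }

    return strtr( $text, $map );
}

// Default table: "Tåa" becomes "Taa".
$default = transliterate_scandinavian( "T\u{00E5}a" );

// A site preferring "aa" overrides one entry: "Tåa" becomes "Taaa".
$with_aa = transliterate_scandinavian(
    "T\u{00E5}a",
    function ( array $map ): array {
        $map["\u{00E5}"] = 'aa';
        return $map;
    }
);

echo $default, "\n";
echo $with_aa, "\n";
```

Passing the table through a callback keeps the default short form while letting a site opt in to "aa" without touching the shared map.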
#13
follow-up:
↓ 14
@
3 years ago
Hei @bjornjohansen
I do think the most common approach when not able to use Norwegian letters is to use æ = ae, ø = o and å = aa. But I do feel your passion here. Having å become a or aa in a filename does not really matter to me. The important part is actually the process being done, and that the å becomes converted in a filename.
It sounds like you really really really want to instead see å converted to a...:)
That is fine by me..:)
#14
in reply to:
↑ 13
@
3 years ago
Replying to paaljoachim:
It sounds like you really really really want to instead see å converted to a...:)
Haha, yes. I’m a bit passionate about this. It’s personal :-)
BTW, it looks like filenames are keeping æ, ø, and å. So it’s just in the slugs where æ and ø are transliterated, while å isn’t.
#15
@
3 years ago
Hi @paaljoachim
In which WordPress version are you facing this issue? In the latest WordPress version it is working fine, but in version 4.9.8 I am facing this issue.
#16
@
3 years ago
Hi @smit08
I just retested.
I noticed that "å" remains "å" in WordPress 5.8.1. I tested by naming an image æøå, and only the å was not converted.
Å should become either aa or a instead.
It is fine by me if we change å to a or aa. It is nice to get a fix in place.
This ticket was mentioned in Slack in #core-media by antpb. View the logs.
3 years ago
This ticket was mentioned in Slack in #core-media by joedolson. View the logs.
3 years ago
This ticket was mentioned in Slack in #core by costdev. View the logs.
2 years ago
#21
@
2 years ago
This ticket was discussed in the bug scrub. @paaljoachim, do you think this ticket is likely to move towards resolution during the 6.0 cycle?
#22
@
2 years ago
Thank you for bringing this up @costdev
Let's go with Bjørn's @bjornjohansen passion for converting å to a...:)
We need someone/dev to create a patch converting å to a. It would be nice to get it into WP 6.0.
This ticket was mentioned in Slack in #core-media by antpb. View the logs.
2 years ago
#25
@
2 years ago
I just ran a quick test and this problem seems to affect more letters than only å. As seen in the screenshot above, it also affects ạ and ā. That said, I only tested å, æ, ạ and ā. There might be many more characters affected.
#27
in reply to:
↑ 11
@
2 years ago
Replying to knutsp:
Replying to paaljoachim:
If I write Norwegian with an English keyboard I would use the aa instead of å.
Good point, and this was mentioned back then. Also international standards on the field. However, writing is slightly different than slugs, as distinguishing between "a" and "å" might feel needed.
So, it was argued from a conservative point of view, don't change what works just fine. That a change was made just for Danish surprised me a bit, but that effectively silenced the discussion.
For reference, [26585] / #23907 appears to be the related change.
Replying to johnbillion:
There's already a test for this but only for the remove_accents() function, not that it actually applies those transformations to the name of an uploaded file. https://github.com/WordPress/wordpress-develop/blob/16b04903feec8216bdd2e6230f4ad511a9238db1/tests/phpunit/tests/formatting/removeAccents.php#L15
There is a test with some of the mentioned characters for sanitize_file_name() too, see [48603] / #22363. If that doesn't always work as expected, something else might be involved.
#28
@
2 years ago
- Milestone changed from 6.0 to 6.1
With 6.0 RC1 tomorrow, I'm moving this ticket to the 6.1 milestone.
#29
@
2 years ago
Hi @nielslange
I feel like this ticket got sidetracked with other characters. In Norway for the regular language we in general use æøå. Only the å needed to be converted to either a or aa. It seemed like it was about to be fixed for WP 6.0 with the decision to convert å to a.
Instead of adding in additional characters that are not used in the regular written language of Norway it would have been better to open a new trac ticket with the additional characters.
I am not sure what actually happened in this ticket....
This ticket was mentioned in Slack in #core by paaljoachim. View the logs.
2 years ago
This ticket was mentioned in Slack in #core by costdev. View the logs.
2 years ago
#32
@
2 years ago
- Keywords dev-feedback added
This ticket was discussed in the bug scrub. The characters mentioned in comment 25 should be discussed in their own ticket.
This ticket should continue the discussion about changing å to a. Please be mindful of backwards compatibility as the discussion continues.
I'll also add dev-feedback to help draw more attention to this ticket.
This ticket was mentioned in PR #2688 on WordPress/wordpress-develop by nielslange.
2 years ago
#33
- Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
Trac ticket: https://core.trac.wordpress.org/ticket/54088
#34
@
2 years ago
@paaljoachim and @costdev As asked by you above, I've only addressed the problem with the Norwegian letter å. I noticed that å can appear with two different Unicode character code points.
<?php
// int(97): the first code point is "a"; the ring is a separate combining character
var_dump( mb_ord( 'å' ) );
// int(229): the single precomposed character U+00E5
var_dump( mb_ord( 'å' ) );
I've added the character that hasn't been converted to remove_accents in /wp-includes/formatting.php and updated the unit test test_remove_accents_latin1_supplement in /tests/phpunit/tests/formatting/removeAccents.php.
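The difference between the two forms can be made visible without any extension by writing both with explicit escapes. The following is a minimal sketch, assuming PHP 7.0 or later for the `\u{...}` syntax. The one-entry `$map` is a hypothetical stand-in for a table that lists only the precomposed U+00E5; it is not the actual contents of `remove_accents()`.

```php
<?php
// Precomposed form: a single code point, U+00E5.
$precomposed = "\u{00E5}";
// Decomposed form: "a" (U+0061) followed by combining ring above (U+030A).
$decomposed = "a\u{030A}";

// Both render as å, but their UTF-8 bytes differ.
echo bin2hex( $precomposed ), "\n"; // c3a5
echo bin2hex( $decomposed ), "\n";  // 61cc8a

// Hypothetical one-entry table that only knows the precomposed form.
$map = array( "\u{00E5}" => 'a' );

$from_precomposed = strtr( "h{$precomposed}milton.jpg", $map );
$from_decomposed  = strtr( "h{$decomposed}milton.jpg", $map );

echo $from_precomposed, "\n"; // hamilton.jpg
// The decomposed name is left unchanged: it still contains "a" + U+030A.
echo bin2hex( $from_decomposed ), "\n";
```

A table keyed only on the precomposed character will therefore miss a decomposed file name. One possible approach, if the intl extension is available, is to normalize the name to the precomposed form with `Normalizer::normalize()` before applying the table.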
#36
follow-up:
↓ 40
@
2 years ago
Very happy this gets in. Also happy that "å" still will transliterate to "a" in slugs, at least when using Norwegian locale.
As noted in my comment 8, this character is not purely Norwegian, but part of the common Danish/Norwegian alphabet. It is very common in the locales da_DK, nb_NO and nn_NO.
I guess this transliteration of file names will happen independent of locale?
#37
@
2 years ago
I am redoing older tutorials and I am wondering if this tutorial is still valid for various languages?
(Norwegian letters are mostly taken care of, as we know.)
https://www.easywebdesigntutorials.com/cleaning-up-filenames-that-have-non-utf8-characters-in-them
This ticket was mentioned in Slack in #core-media by antpb. View the logs.
2 years ago
This ticket was mentioned in Slack in #core-media by joedolson. View the logs.
2 years ago
#41
@
2 years ago
- Owner changed from nielslange to audrasjb
- Status changed from assigned to accepted
I think we're good to go with PR #2688. Self-assigning for final testing and commit.
#43
@
2 years ago
@nielslange @paaljoachim what is the code for å?
We already have this in remove_accents():
* | U+00E5 | å | a | Latin small letter a with ring above |
Is it a different character? Which one?
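One way to answer which character a file name actually contains is to list its code points. This is a minimal sketch using only core PHP functions. `code_points()` is a hypothetical helper, not a WordPress function, and it assumes valid UTF-8 input.

```php
<?php
/**
 * Hypothetical helper: lists the Unicode code points of a UTF-8 string
 * in U+XXXX notation. Returns an empty array for invalid UTF-8.
 */
function code_points( string $text ): array {
    // The /u modifier makes PCRE split on whole characters, not bytes.
    $chars = preg_split( '//u', $text, -1, PREG_SPLIT_NO_EMPTY );
    if ( false === $chars ) {
        return array();
    }

    $points = array();
    foreach ( $chars as $char ) {
        $bytes = array_values( unpack( 'C*', $char ) );
        $lead  = $bytes[0];

        // Strip the length marker bits from the lead byte.
        if ( $lead < 0x80 ) {
            $code = $lead;
        } elseif ( $lead < 0xE0 ) {
            $code = $lead & 0x1F;
        } elseif ( $lead < 0xF0 ) {
            $code = $lead & 0x0F;
        } else {
            $code = $lead & 0x07;
        }

        // Each continuation byte contributes its low six bits.
        for ( $i = 1, $n = count( $bytes ); $i < $n; $i++ ) {
            $code = ( $code << 6 ) | ( $bytes[ $i ] & 0x3F );
        }

        $points[] = sprintf( 'U+%04X', $code );
    }

    return $points;
}

// A single precomposed å versus "a" plus a combining ring above.
$single   = code_points( "\u{00E5}" );
$combined = code_points( "a\u{030A}" );

echo implode( ' ', $single ), "\n";
echo implode( ' ', $combined ), "\n";
```

Running this on the name of an uploaded file shows whether it contains U+00E5, which the table above covers, or the sequence U+0061 U+030A.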
#45
@
2 years ago
Heya @audrasjb
I just made a search for the unicode for å and found the above.
https://unicode-table.com/en/search/?q=%C3%A5
#46
@
2 years ago
Thanks @paaljoachim, but in that case, it looks like the character is already covered by remove_accents().
See the following links:
#47
@
2 years ago
- Milestone changed from 6.1 to 6.2
With WP 6.1 RC 1 scheduled today (Oct 11, 2022), there is not much time left to address this ticket. Let's move it to the next milestone.
This ticket was mentioned in Slack in #core by costdev. View the logs.
20 months ago
This ticket was mentioned in Slack in #core-media by antpb. View the logs.
20 months ago
#50
@
20 months ago
@paaljoachim Can you re-test this and confirm whether this issue is still happening? I just tested against trunk and 6.1.1, and I can't reproduce the issue with å failing to change to a in uploaded media file names.
I'm wondering whether this issue was specific to your environment or was fixed inadvertently through some other commit since September 2021.
Regarding the choice of letter to change to: it seems to be somewhat contentious, so in the interest of not churning code unnecessarily, I feel that we should leave it as it is.
#51
@
20 months ago
I'm able to reproduce locally on trunk 6.2-beta3-55400-src.
Media handling config (summary of site health report):
- Active editor: WP_Image_Editor_GD
- GD version: 2.3.3
- GD supported file formats: GIF, JPEG, PNG, WebP, BMP, XPM
- Ghostscript, ImageMagick: not available
Media Page
- Visit Admin > Media
- Drag image named håmilton.jpg into window
- Image remains håmilton.jpg and the resized images also use the letter å.
Block Editor
- Create new post
- Drag image named håmilton.jpg into block editor
- New block created, saves image and the resized images with the letter å
Block editor two
- Create new post
- Add image block.
- Click media library
- Drag image into media library, select once uploaded
- New block complete, saves image and the resized images with the letter å
It seems the issue in the initial report is the inconsistency with how WordPress handles non-ASCII characters: æ and ø are converted to ASCII while å is not. I understand why this is not optimal.
What I am wondering is, is there any technical limitation introduced by retaining the letter å in the file name? Specifically, is it common that browsers or servers are unable to load the image as a result?
Please excuse any ignorance as I am a monolingual English speaker.
#52
@
20 months ago
Retesting using WordPress 6.2 beta 2.
Twenty Twenty Three.
Brave browser.
I dragged the following images into the Media library.
I uploaded: ståck-of-pebbles-gråphic.jpg it was converted into stack-of-pebbles-graphic.jpg
I uploaded: Sun-grådients-ååå.jpg it was converted into Sun-gradients-aaa.jpg
I uploaded: Moon-stars-åå.jpg it was converted into Moon-stars-aa.jpg
I uploaded: Leaves-white-å.jpg it was converted into Leaves-white-a.jpg.
I also tested dragging a couple of the images into the Block Editor and these were also converted. So as far as I noticed the conversion is working as it should.
This ticket was mentioned in Slack in #core by mukeshpanchal27. View the logs.
20 months ago
#54
@
20 months ago
I tested with WordPress 6.2 beta 3, without applying the patch, with a file named håmilton.png and it is converted to hamilton.png.
If I'm not misunderstanding, this is the expected result, isn't it?
#55
@
20 months ago
"If I'm not misunderstanding, this is the expected result, isn't it?"
Yes: å -> a.
As I understand it, that is the expected result.
Norwegian letter å does not convert to aa