#4739 closed defect (bug) (duplicate)
Some icelandic/Norwegian/Danish letters do not work in page slugs
Reported by: | einare | Owned by: | westi |
---|---|---|---|
Milestone: | | Priority: | high |
Severity: | major | Version: | 2.2.1 |
Component: | Permalinks | Keywords: | early 2nd-opinion dev-feedback has-patch |
Focuses: | | Cc: | |
Description
When the page slug is generated from the post title, three Icelandic letters are not converted correctly: Ð ð, Þ þ and Æ æ. They should be converted to D d, TH th and AE ae, but are not.
For instance, when I made a post with the title ‘Þátturinn’, the post slug became ‘þatturinn’. When I entered that address in my address bar, it changed to ‘%c3%beatturinn’ and I got a ‘page not found’ error from WordPress.
This can be fixed by adding the following six lines to formatting.php, in the function remove_accents, inside the if (seems_utf8($string)) { condition:
chr(195).chr(144) => 'D',
chr(195).chr(176) => 'd',
chr(195).chr(158) => 'TH',
chr(195).chr(190) => 'th',
chr(195).chr(134) => 'AE',
chr(195).chr(166) => 'ae',
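A minimal sketch of how such entries act, not the actual remove_accents code: the keys are UTF-8 byte sequences and the map is applied with strtr. The function name sketch_transliterate is hypothetical, and the extra entry for 'á' is an assumption added only so the example title converts fully.

```php
<?php
// Hypothetical helper, not WordPress's remove_accents(): it applies only
// the six proposed entries plus one accent entry needed for the example.
function sketch_transliterate(string $s): string {
    $map = array(
        chr(195).chr(144) => 'D',  // Ð
        chr(195).chr(176) => 'd',  // ð
        chr(195).chr(158) => 'TH', // Þ
        chr(195).chr(190) => 'th', // þ
        chr(195).chr(134) => 'AE', // Æ
        chr(195).chr(166) => 'ae', // æ
        chr(195).chr(161) => 'a',  // á (assumed entry, for illustration)
    );
    // strtr() with an array replaces each key once, longest match first.
    return strtr($s, $map);
}

// "\xC3\x9E\xC3\xA1tturinn" is 'Þátturinn' written as UTF-8 bytes.
echo sketch_transliterate("\xC3\x9E\xC3\xA1tturinn"), "\n"; // prints "THatturinn"
```

Lowercasing and dash handling would happen in a later step and are left out here.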
Also (from #5952)
When the post slug is generated from the post title, the letter 'Å' 'å' converts to 'a'. It should convert to 'aa', which is the general practice in countries using this character (confer Wikipedia).
Furthermore, the Norwegian/Danish characters 'Æ' 'æ' and 'Ø' 'ø' should be converted to 'ae' and 'oe', respectively. As of now, these convert to '%c3%a6' and '%c3%b8'.
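The '%c3%a6' and '%c3%b8' forms are the percent-encoded UTF-8 bytes of 'æ' and 'ø', which is also where the chr(195).chr(...) keys in the proposed entries come from. A short check, as a sketch (the variable names are just for illustration):

```php
<?php
// 'æ' is U+00E6, which UTF-8 encodes as the two bytes 0xC3 0xA6 (195, 166).
$ae = chr(195).chr(166);
// 'ø' is U+00F8, encoded as 0xC3 0xB8 (195, 184).
$oe = chr(195).chr(184);

// rawurlencode() percent-encodes each byte and emits uppercase hex digits,
// so the result is lowercased to compare with the form quoted in the ticket.
echo strtolower(rawurlencode($ae)), "\n"; // %c3%a6
echo strtolower(rawurlencode($oe)), "\n"; // %c3%b8
```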
Attachments (2)
Change History (29)
#2
17 years ago
- Keywords dev-reviewed added
- Owner changed from anonymous to westi
- Status changed from new to assigned
+1
#4
17 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
This commit breaks the permalinks of posts that contain these characters and were posted using the old version of this function.
We should either revert it or pass all permalinks that weren't manually edited through the new sanitize_title. In order to achieve this, we have to compare the output of the old and the new remove_accents functions.
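The comparison suggested above could look roughly like this. Both slug functions are hypothetical stand-ins that differ only in how they treat Å/å; they are not the real old and new remove_accents. The sketch assumes that a stored slug equal to the old output was auto-generated rather than manually edited.

```php
<?php
// Hypothetical stand-ins for the old and new behaviour; only Å/å differ here.
function sketch_old_slug(string $title): string {
    $map = array(chr(195).chr(133) => 'A', chr(195).chr(165) => 'a');
    return strtolower(strtr($title, $map));
}
function sketch_new_slug(string $title): string {
    $map = array(chr(195).chr(133) => 'Aa', chr(195).chr(165) => 'aa');
    return strtolower(strtr($title, $map));
}

// A stored slug that equals the old output for the title was presumably
// auto-generated; if the new output differs, the post would be affected.
function sketch_affected_posts(array $posts): array {
    $affected = array();
    foreach ($posts as $post) {
        $old = sketch_old_slug($post['title']);
        $new = sketch_new_slug($post['title']);
        if ($post['slug'] === $old && $old !== $new) {
            $affected[] = array('old' => $old, 'new' => $new);
        }
    }
    return $affected;
}

// "\xC3\x85rhus" is 'Århus' written as UTF-8 bytes.
$posts = array(
    array('title' => "\xC3\x85rhus", 'slug' => 'arhus'),   // auto-generated
    array('title' => "\xC3\x85rhus", 'slug' => 'my-town'), // manually edited
    array('title' => 'Hello',        'slug' => 'hello'),   // unchanged by the new rules
);
print_r(sketch_affected_posts($posts));
```

Only the first post is reported, since its stored slug matches the old output and the new output differs.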
#5
17 years ago
Or maybe we should change the query post name matching, so that it uses the raw post name from the URL, not the decoded one. If we don't do this, we should be very careful in modifying sanitize_title's behaviour.
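To illustrate the difference between the two matching strategies, here is a sketch with made-up data. It assumes slugs are stored in the percent-encoded form the old code produced; the post ID and the function sketch_new_sanitize are hypothetical.

```php
<?php
// Assumption for illustration: the slug is stored exactly as the old code
// generated it, here in percent-encoded form. 42 is a made-up post ID.
$stored = array('%c3%beatturinn' => 42);

// Hypothetical "new" sanitizer that decodes the name and maps þ to th.
function sketch_new_sanitize(string $name): string {
    $decoded = rawurldecode($name);
    return strtolower(strtr($decoded, array(chr(195).chr(190) => 'th')));
}

$requested = '%c3%beatturinn'; // raw post name taken from the URL

// Re-sanitizing the request with the new rules yields 'thatturinn',
// which no longer matches the stored slug.
$via_resanitize = $stored[sketch_new_sanitize($requested)] ?? null;

// Matching on the raw name from the URL still finds the post.
$via_raw = $stored[$requested] ?? null;

var_dump($via_resanitize); // NULL
var_dump($via_raw);        // int(42)
```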
#6
17 years ago
- Keywords developer-feedback added; has-patch dev-reviewed removed
- Priority changed from normal to high
- Severity changed from minor to major
#7
17 years ago
Affected posts can be fixed by resaving them. The old slug redirector will handle redirecting the old URL. But, that's not very friendly. For 2.3 we should probably revert the change.
#9
17 years ago
- Milestone changed from 2.3 to 2.4
Reverted for 2.3. We'll try to fix it properly for 2.4.
#10
17 years ago
- Keywords needs=patch early added; developer-feedback removed
I guess we need to make sure that any changes we make to the slug generation code don't affect old posts the way this one currently does.
We should always be checking against the string we use to generate the permalink, not a re-sanitized one.
#11
17 years ago
westi, we aren't always generating the permalink based on information we have in the database. Usually the title is used, but users are allowed to enter their own slugs and we don't keep the original slug -- only the sanitized one.
#13
17 years ago
- Description modified (diff)
- Milestone changed from 2.5 to 2.6
- Summary changed from Some icelandic letters do not work in page slugs to Some icelandic/Norwegian/Danish letters do not work in page slugs
Closed #5952 as a dupe of this and updated bug with more characters to fix.
Moving to 2.6 as this needs fixing early and lots of testing so we can be sure we don't break things.
#17
16 years ago
This problem seems to be quite easy to fix. If I understand correctly, you only have to add a few lines to formatting.php.
Why, then, has this not already been fixed?
#18
16 years ago
I had similar problems: a Page titled "Bøger" (books) screwed up the permalink (slug?). After reading this, I created (wow) a few more lines fixing it for Å/å and Ø/ø.
With the fix at the top and these lines, my problems with æ/ø/å are solved.
chr(195).chr(133) => 'Aa',
chr(195).chr(165) => 'aa',
chr(195).chr(152) => 'Oe',
chr(195).chr(184) => 'oe',
Please incorporate it in the official version :)
/svendk
#19
16 years ago
- Cc janbrasna added
- Component changed from i18n to Permalinks
- Keywords 2nd-opinion dev-feedback added
- Milestone changed from 2.9 to 2.8
It used to work in non–UTF processing at some point in the past (see http://trac.wordpress.org/browser/trunk/wp-includes/formatting.php?rev=10150#L401 for the old Latin 1 transliteration code) but was apparently omitted when the UTF transliteration segment was written.
Anyway, the main problem is the sanitize_title line at
http://trac.wordpress.org/browser/trunk/wp-includes/query.php?rev=10150#L1671
that makes it effectively impossible to change the transliteration array at any point. I can't seem to find the point in comment:11, because the post_name field in the DB is always matched. Regarding comment:5, I think the sanitize_title at that point is maybe overboard? Wouldn't just escaping it for SQL be enough? Otherwise, comment:7 sounds fine for that matter.
#20
16 years ago
- Resolution set to duplicate
- Status changed from reopened to closed
Merging into #9591.
#23
12 years ago
- Cc wordpress-thomas@… added
- Resolution fixed deleted
- Status changed from closed to reopened
We are still transcribing the following letters incorrectly: ÆØÅøå. They are used in Scandinavian languages and should be transcribed as 'Ae', 'Oe', 'Aa', 'oe' and 'aa', respectively.
#25
12 years ago
- Resolution set to duplicate
- Status changed from reopened to closed
Duplicate of #9591.
Oops, wrong ticket.
Fix for the ticket