Opened 14 years ago

Last modified 8 years ago

#4739 closed defect (bug)

Some Icelandic/Norwegian/Danish letters do not work in page slugs (at Version 13)

Reported by: einare
Owned by: westi
Milestone:
Priority: high
Severity: major
Version: 2.2.1
Component: Permalinks
Keywords: early 2nd-opinion dev-feedback has-patch
Focuses:
Cc:

Description (last modified by westi)

When the page slug is generated from the post title, three Icelandic letters are not converted correctly: Ð ð, Þ þ and Æ æ. They should be converted to D d, TH th and AE ae, but are not.

For instance, when I made a post with the title ‘Þátturinn’, the post slug became ‘þatturinn’. When I tried to enter that address in my address bar, it changed to ‘%c3%beatturinn’ and I got a ‘page not found’ error from WordPress.
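To show where ‘%c3%be’ comes from, here is a minimal sketch of percent-encoding the UTF-8 bytes of the slug. It uses plain PHP functions only and is not the code path WordPress itself takes; the variable names are made up for the example.

```php
<?php
// "þatturinn" written as explicit UTF-8 bytes: þ (U+00FE) is 0xC3 0xBE.
$slug = "\xC3\xBEatturinn";

// rawurlencode() percent-encodes every non-ASCII byte using uppercase hex;
// strtolower() then lowercases the hex digits as seen in the address bar.
$encoded = strtolower(rawurlencode($slug));

echo $encoded, "\n"; // %c3%beatturinn
```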

This can be fixed by adding the following six lines to formatting.php, in the function remove_accents, inside the if (seems_utf8($string)) { condition:

chr(195).chr(144) => 'D',  // Ð
chr(195).chr(176) => 'd',  // ð
chr(195).chr(158) => 'TH', // Þ
chr(195).chr(190) => 'th', // þ
chr(195).chr(134) => 'AE', // Æ
chr(195).chr(166) => 'ae', // æ
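As a rough illustration of how these six entries would take effect, the sketch below applies them with strtr(). The function name sketch_remove_accents and the reduced table are assumptions for the example, not the real remove_accents; one extra mapping for á is included only so the sample title converts fully.

```php
<?php
// Hypothetical, simplified stand-in for the UTF-8 branch of remove_accents().
function sketch_remove_accents(string $string): string
{
    $chars = array(
        // Assumed existing-style entry, included so the sample title converts fully.
        chr(195).chr(161) => 'a',  // á
        // The six mappings proposed in this ticket.
        chr(195).chr(144) => 'D',  // Ð
        chr(195).chr(176) => 'd',  // ð
        chr(195).chr(158) => 'TH', // Þ
        chr(195).chr(190) => 'th', // þ
        chr(195).chr(134) => 'AE', // Æ
        chr(195).chr(166) => 'ae', // æ
    );

    // strtr() with an array replaces every key found in the string by its value.
    return strtr($string, $chars);
}

// "Þátturinn" as explicit UTF-8 bytes, so the file encoding does not matter.
$title = "\xC3\x9E\xC3\xA1tturinn";

echo strtolower(sketch_remove_accents($title)), "\n"; // thatturinn
```

With the mappings in place, the slug contains only ASCII characters, so nothing is left to be percent-encoded.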

Also (from #5952)
When the post slug is generated from the post title, the letter 'Å' 'å' converts to 'a'. It should convert to 'aa', which is the general practice in countries using this character (see Wikipedia).

Furthermore, the Norwegian/Danish characters 'Æ' 'æ' and 'Ø' 'ø' should be converted to 'ae' and 'oe' respectively. As of now, they convert to '%c3%a6' and '%c3%b8'.
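The Norwegian/Danish requests could be written in the same chr() style as the Icelandic entries. This is only a sketch: the variable name $nordic_chars is made up, and the uppercase replacements ('AA', 'AE', 'OE') are an assumption, since the request above only gives the lowercase forms.

```php
<?php
// Hypothetical extra entries; uppercase targets are an assumption.
$nordic_chars = array(
    chr(195).chr(133) => 'AA', // Å
    chr(195).chr(165) => 'aa', // å
    chr(195).chr(134) => 'AE', // Æ
    chr(195).chr(166) => 'ae', // æ
    chr(195).chr(152) => 'OE', // Ø
    chr(195).chr(184) => 'oe', // ø
);

// "Blåbærsyltetøy" as explicit UTF-8 bytes.
$title = "Bl\xC3\xA5b\xC3\xA6rsyltet\xC3\xB8y";

// Replace the mapped characters, then lowercase the now ASCII-only result.
$slug = strtolower(strtr($title, $nordic_chars));

echo $slug, "\n"; // blaabaersyltetoey
```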

Change History (14)

@einare
14 years ago

Fix for the ticket

#1 @Nazgul
14 years ago

  • Keywords has-patch added
  • Milestone changed from 2.2.3 to 2.3 (trunk)

#2 @westi
14 years ago

  • Keywords dev-reviewed added
  • Owner changed from anonymous to westi
  • Status changed from new to assigned

+1

#3 @westi
14 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [5969]) Add utf8->ascii mappings for icelandic letters. Fixes #4739 props einare

#4 @nbachiyski
14 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This commit breaks the permalinks of posts that contain these characters and were posted using the old version of this function.

We should either revert it or pass all permalinks that weren't manually edited through the new sanitize_title. In order to achieve this, we have to compare the output of the old and the new remove_accents functions.
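A minimal sketch of that comparison, under stated assumptions: sketch_slug and needs_migration are hypothetical helpers, the two tables are tiny stand-ins for the old and new remove_accents mappings, and titles are assumed to be lowercase already. None of this is WordPress code.

```php
<?php
// Hypothetical slug generator: map characters, percent-encode, lowercase.
function sketch_slug(string $title, array $map): string
{
    return strtolower(rawurlencode(strtr($title, $map)));
}

// Returns true when the stored slug still equals the old automatic output
// and the new mapping would produce something different.
function needs_migration(string $title, string $stored, array $old_map, array $new_map): bool
{
    $old = sketch_slug($title, $old_map);
    $new = sketch_slug($title, $new_map);

    return $stored === $old && $old !== $new;
}

// Stand-in tables: the "new" one adds the thorn mappings from this ticket.
$old_map = array("\xC3\xA1" => 'a');                                   // á
$new_map = $old_map + array("\xC3\x9E" => 'TH', "\xC3\xBE" => 'th');   // Þ, þ

// "þáttur" with an auto-generated slug: affected by the change.
var_dump(needs_migration("\xC3\xBE\xC3\xA1ttur", '%c3%beattur', $old_map, $new_map)); // bool(true)

// Same title, but the slug was edited by hand: left alone.
var_dump(needs_migration("\xC3\xBE\xC3\xA1ttur", 'my-show', $old_map, $new_map));     // bool(false)
```

One limitation of this approach: a hand-entered slug that happens to equal the old automatic output cannot be told apart from a generated one.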

#5 @nbachiyski
14 years ago

Or maybe we should change the query post name matching so that it uses the raw post name from the URL, not the decoded one. If we don't do this, we should be very careful in modifying sanitize_title's behaviour.

#6 @Nazgul
14 years ago

  • Keywords developer-feedback added; has-patch dev-reviewed removed
  • Priority changed from normal to high
  • Severity changed from minor to major

#7 @ryan
14 years ago

Affected posts can be fixed by resaving them. The old slug redirector will handle redirecting the old URL. But, that's not very friendly. For 2.3 we should probably revert the change.

#8 @ryan
14 years ago

(In [6150]) Revert [5969]. It can break permalinks. see #4739

#9 @ryan
14 years ago

  • Milestone changed from 2.3 to 2.4

Reverted for 2.3. We'll try to fix it properly for 2.4.

#10 @westi
14 years ago

  • Keywords needs=patch early added; developer-feedback removed

I guess we need to make sure that any changes we make to the slug generation code don't affect old posts the way this one currently does.

We should always be checking against the string we used to generate the permalink, not a re-sanitized one.

#11 @nbachiyski
14 years ago

westi, we aren't always generating the permalink based on information we have in the database. Usually the title is used, but users are allowed to enter their own slugs and we don't keep the original slug -- only the sanitized one.

#12 @mdawaffe
14 years ago

  • Keywords needs-patch added; needs=patch removed

#13 @westi
14 years ago

  • Description modified (diff)
  • Milestone changed from 2.5 to 2.6
  • Summary changed from Some icelandic letters do not work in page slugs to Some icelandic/Norwegian/Danish letters do not work in page slugs

Closed #5952 as a dupe of this and updated bug with more characters to fix.

Moving to 2.6 as this needs fixing early and lots of testing so we can be sure we don't break things.
