Opened 17 years ago

Closed 11 years ago

Last modified 11 years ago

#4739 closed defect (bug) (duplicate)

Some icelandic/Norwegian/Danish letters do not work in page slugs

Reported by: einare
Owned by: westi
Milestone:
Priority: high
Severity: major
Version: 2.2.1
Component: Permalinks
Keywords: early 2nd-opinion dev-feedback has-patch
Focuses:
Cc:

Description (last modified by westi)

When the page slug is generated from the post title, three Icelandic letters are not converted correctly. These three letters are Ð ð, Þ þ and Æ æ. They should be converted to D d, TH th and AE ae but are not.

For instance, when I made a post with the title ‘Þátturinn’, the post slug became ‘þatturinn’. When I tried to enter that address in my address bar, it changed to ‘%c3%beatturinn’ and I got a ‘page not found’ error from WordPress.

This can be fixed by adding the following six lines to formatting.php, in the function remove_accents, inside the if (seems_utf8($string)) { condition.

chr(195).chr(144) => 'D', 
chr(195).chr(176) => 'd',
chr(195).chr(158) => 'TH',
chr(195).chr(190) => 'th',
chr(195).chr(134) => 'AE',
chr(195).chr(166) => 'ae',
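
As a rough illustration of how these entries behave, the sketch below applies the same byte-sequence map with strtr(). The function name transliterate_icelandic is hypothetical and only stands in for the relevant part of remove_accents; it is not WordPress's actual code.

```php
<?php
// Hypothetical stand-in for the UTF-8 branch of remove_accents().
// Each key is the two-byte UTF-8 encoding of one letter.
function transliterate_icelandic( $string ) {
    $chars = array(
        chr(195).chr(144) => 'D',  // Ð
        chr(195).chr(176) => 'd',  // ð
        chr(195).chr(158) => 'TH', // Þ
        chr(195).chr(190) => 'th', // þ
        chr(195).chr(134) => 'AE', // Æ
        chr(195).chr(166) => 'ae', // æ
    );
    // strtr() with an array replaces every matching key in a single pass.
    return strtr( $string, $chars );
}

// "\xC3\x9E" is Þ and "\xC3\xA1" is á, written as bytes so the file encoding does not matter.
// á is not in this small map, so it passes through unchanged here.
echo transliterate_icelandic( "\xC3\x9E\xC3\xA1tturinn" ), "\n";
```

Only the six letters in the map are touched; any other accented letter would have to be covered by the rest of the real function's map.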

Also (from #5952)
When the post slug is generated from the post title, the letter 'Å' 'å' is converted to 'a'. It should be converted to 'aa', which is the general practice in countries that use this character (cf. Wikipedia).

Furthermore, the Norwegian/Danish characters 'Æ' 'æ' and 'Ø' 'ø' should be converted to 'ae' and 'oe' respectively. As of now, they are converted to '%c3%a6' and '%c3%b8'.

Attachments (2)

4739.patch (2.8 KB) - added by einare 17 years ago.
Fix for the ticket
4739.ligatures.patch (2.0 KB) - added by dnusim 11 years ago.
Correcting transcription of Scandinavian ligatures.

Change History (29)

@einare
17 years ago

Fix for the ticket

#1 @Nazgul
17 years ago

  • Keywords has-patch added
  • Milestone changed from 2.2.3 to 2.3 (trunk)

#2 @westi
17 years ago

  • Keywords dev-reviewed added
  • Owner changed from anonymous to westi
  • Status changed from new to assigned

+1

#3 @westi
17 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [5969]) Add utf8->ascii mappings for icelandic letters. Fixes #4739 props einare

#4 @nbachiyski
17 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This commit breaks the permalinks of posts that contain these characters and were published using the old version of this function.

We should either revert it or pass all permalinks that weren't manually edited through the new sanitize_title. In order to achieve this, we have to compare the output of the old and the new remove_accents functions.
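
One way to read that comparison, as a sketch only: a slug is treated as auto-generated if it equals what the old sanitizer produces from the title, and it is affected if the new sanitizer would produce something different. The function slug_needs_migration and the two toy sanitizers are hypothetical; no such helper exists in WordPress.

```php
<?php
// Hypothetical helper: decide whether a stored slug should be regenerated.
// $old_sanitize and $new_sanitize are callables standing in for the two
// versions of the title sanitizer.
function slug_needs_migration( $title, $stored_slug, $old_sanitize, $new_sanitize ) {
    $old_slug = call_user_func( $old_sanitize, $title );
    if ( $stored_slug !== $old_slug ) {
        // The slug does not match the old output, so assume it was edited by hand.
        return false;
    }
    // Auto-generated slug: it is affected only if the new output differs.
    return call_user_func( $new_sanitize, $title ) !== $old_slug;
}

// Toy sanitizers for the demo only; the real ones do much more.
$old_sanitize = function ( $title ) {
    return strtolower( rawurlencode( $title ) );
};
$new_sanitize = function ( $title ) {
    // "\xC3\xBE" is the UTF-8 encoding of þ.
    return strtolower( rawurlencode( strtr( $title, array( "\xC3\xBE" => 'th' ) ) ) );
};

var_dump( slug_needs_migration( "\xC3\xBEing", '%c3%being', $old_sanitize, $new_sanitize ) ); // bool(true)
var_dump( slug_needs_migration( "\xC3\xBEing", 'my-slug', $old_sanitize, $new_sanitize ) );   // bool(false)
```

A slug that was typed by hand but happens to equal the old output cannot be told apart from an auto-generated one with this check.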

#5 @nbachiyski
17 years ago

Or maybe we should change the query post name matching, so that it uses the raw post name from the url, not the decoded one. If we don't do this we should be very careful in modifying sanitize_title's behaviour.
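
A small sketch of why matching on the raw name would help, under simplifying assumptions: new_sanitize_sketch is a hypothetical, heavily reduced stand-in for the changed sanitizer, and the stored slug is the percent-encoded form reported in the ticket description.

```php
<?php
// Hypothetical, reduced version of the changed sanitizer: transliterate Þ/þ,
// then percent-encode whatever non-ASCII bytes remain and lowercase the result.
function new_sanitize_sketch( $name ) {
    $name = strtr( $name, array( "\xC3\x9E" => 'TH', "\xC3\xBE" => 'th' ) );
    return strtolower( rawurlencode( $name ) );
}

$stored_slug = '%c3%beatturinn';          // saved under the old behaviour
$raw_request = '%C3%BEatturinn';          // path segment as it arrives in the URL
$decoded     = rawurldecode( $raw_request ); // the bytes of "þatturinn"

// Re-sanitizing the decoded name with the new map no longer matches the stored slug.
$resanitized = new_sanitize_sketch( $decoded );
var_dump( $resanitized === $stored_slug ); // bool(false)

// Comparing the raw (still encoded) name, lowercased, does match.
$raw_match = strtolower( $raw_request );
var_dump( $raw_match === $stored_slug );   // bool(true)
```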

#6 @Nazgul
17 years ago

  • Keywords developer-feedback added; has-patch dev-reviewed removed
  • Priority changed from normal to high
  • Severity changed from minor to major

#7 @ryan
17 years ago

Affected posts can be fixed by resaving them. The old slug redirector will handle redirecting the old URL. But, that's not very friendly. For 2.3 we should probably revert the change.

#8 @ryan
17 years ago

(In [6150]) Revert [5969]. It can break permalinks. see #4739

#9 @ryan
17 years ago

  • Milestone changed from 2.3 to 2.4

Reverted for 2.3. We'll try to fix it properly for 2.4.

#10 @westi
16 years ago

  • Keywords needs=patch early added; developer-feedback removed

I guess we need to make sure that any changes we make to the slug generation code don't affect old posts the way this one currently does.

We should always be checking against the string we use to generate the permalink, not a re-sanitized one.

#11 @nbachiyski
16 years ago

westi, we aren't always generating the permalink based on information we have in the database. Usually the title is used, but users are allowed to enter their own slugs and we don't keep the original slug -- only the sanitized one.

#12 @mdawaffe
16 years ago

  • Keywords needs-patch added; needs=patch removed

#13 @westi
16 years ago

  • Description modified (diff)
  • Milestone changed from 2.5 to 2.6
  • Summary changed from Some icelandic letters do not work in page slugs to Some icelandic/Norwegian/Danish letters do not work in page slugs

Closed #5952 as a dupe of this and updated bug with more characters to fix.

Moving to 2.6 as this needs fixing early and lots of testing so we can be sure we don't break things.

#14 @dnusim
16 years ago

Is there any way I can help without actually coding?

#15 follow-up: @snakefoot
15 years ago

Duplicate #4273 ?

#16 in reply to: ↑ 15 @westi
15 years ago

Replying to snakefoot:

Duplicate #4273 ?

I think that is a similar issue, but I'm not sure whether it's a dupe.

#17 @shogunn
15 years ago

This problem seems to be something quite easy to fix. If I understand correctly, you only have to add a few lines to formatting.php.

Why, then, has this not been fixed already?

#18 @svendk
15 years ago

I had similar problems: a Page titled "Bøger" (books) screwed up the permalink (slug?). After reading this, I created (wow) a few more lines that fix it for Å/å and Ø/ø.
With the fix at the top and these lines, my problems with æ/ø/å are solved.

chr(195).chr(133) => 'Aa',
chr(195).chr(165) => 'aa',
chr(195).chr(152) => 'Oe',
chr(195).chr(184) => 'oe',

Please incorporate it in the official version :)

/svendk

#19 @janbrasna
15 years ago

  • Cc janbrasna added
  • Component changed from i18n to Permalinks
  • Keywords 2nd-opinion dev-feedback added
  • Milestone changed from 2.9 to 2.8

It used to work in non–UTF processing at some point in the past (see http://trac.wordpress.org/browser/trunk/wp-includes/formatting.php?rev=10150#L401 for the old Latin 1 transliteration code) but was apparently omitted when the UTF transliteration segment was written.

Anyway the main problem is the sanitize_title line
http://trac.wordpress.org/browser/trunk/wp-includes/query.php?rev=10150#L1671
that makes it effectively impossible to change the transliteration array at any point. I can't see the point of comment:11, because the post_name field in the DB is always what gets matched. Regarding comment:5, I think the sanitize_title call at that point may be overkill. Wouldn't just escaping it for SQL be enough? Otherwise, comment:7 sounds fine.

#20 @Denis-de-Bernardy
15 years ago

  • Resolution set to duplicate
  • Status changed from reopened to closed

Merging into #9591.

#21 @Denis-de-Bernardy
15 years ago

  • Milestone 2.8 deleted

#22 @scribu
13 years ago

  • Resolution changed from duplicate to fixed

(In [15930]) remove_accents(): Nordic characters fixes. Props einare. Fixes #4739. See #9591

@dnusim
11 years ago

Correcting transcription of Scandinavian ligatures.

#23 @dnusim
11 years ago

  • Cc wordpress-thomas@… added
  • Resolution fixed deleted
  • Status changed from closed to reopened

We are still transcribing the following ligatures incorrectly: ÆØÅøå. They are used in Scandinavian languages and should be transcribed as 'Ae', 'Oe', 'Aa', 'oe' and 'aa' respectively.

#24 @dnusim
11 years ago

  • Keywords has-patch added; needs-patch removed

#25 follow-up: @dnusim
11 years ago

  • Resolution set to duplicate
  • Status changed from reopened to closed

Duplicate of #9591.

Oops, wrong ticket.

#26 in reply to: ↑ 25 @SergeyBiryukov
11 years ago

Replying to dnusim:

Duplicate of #9591.

That ticket was closed on a completed milestone; please open a new one if there's still a problem.