Make WordPress Core

Opened 7 years ago

Closed 13 months ago

Last modified 10 months ago

#4739 closed defect (bug) (duplicate)

Some icelandic/Norwegian/Danish letters do not work in page slugs

Reported by: einare
Owned by: westi
Milestone:
Priority: high
Severity: major
Version: 2.2.1
Component: Permalinks
Keywords: early 2nd-opinion dev-feedback has-patch
Focuses:
Cc:

Description (last modified by westi)

When the page slug is generated from the post title, three Icelandic letters are not converted correctly. These three letters are Ð ð, Þ þ and Æ æ. They should be converted to D d, TH th and AE ae, but are not.

For instance, when I made a post with the title ‘Þátturinn’, the post slug became ‘þatturinn’. When I tried to enter that address in my address bar, it changed to ‘%c3%beatturinn’ and I got a ‘page not found’ error from WordPress.
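
The '%c3%be' in that address is the percent-encoded UTF-8 form of 'þ' (bytes 0xC3 0xBE). A minimal PHP sketch, not taken from WordPress itself, that reproduces the encoded form using only the built-in rawurlencode() and strtolower():

```php
<?php
// 'þ' (U+00FE) is the two bytes 0xC3 0xBE in UTF-8.
$slug = "\xC3\xBEatturinn";

// rawurlencode() emits upper-case hex digits; lower-case them to match
// the form quoted in the report.
$encoded = strtolower(rawurlencode($slug));

echo $encoded, "\n";  // prints %c3%beatturinn
```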

This can be fixed by adding the following six lines to formatting.php, in the function remove_accents, inside the if (seems_utf8($string)) { condition:

chr(195).chr(144) => 'D', 
chr(195).chr(176) => 'd',
chr(195).chr(158) => 'TH',
chr(195).chr(190) => 'th',
chr(195).chr(134) => 'AE',
chr(195).chr(166) => 'ae',
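
To show what these six pairs do in isolation, here is a small sketch. The function name sketch_remove_icelandic is hypothetical, and the extra 'á' => 'a' pair is an assumption standing in for the accent mappings the real remove_accents already has; the real table is far larger.

```php
<?php
// Hypothetical helper for illustration only; not the real remove_accents().
function sketch_remove_icelandic($string) {
    $chars = array(
        chr(195).chr(144) => 'D',   // Ð
        chr(195).chr(176) => 'd',   // ð
        chr(195).chr(158) => 'TH',  // Þ
        chr(195).chr(190) => 'th',  // þ
        chr(195).chr(134) => 'AE',  // Æ
        chr(195).chr(166) => 'ae',  // æ
        chr(195).chr(161) => 'a',   // á (assumed to be handled already)
    );
    // strtr() with an array replaces every key found in the string.
    return strtr($string, $chars);
}

$title = "\xC3\x9E\xC3\xA1tturinn";         // 'Þátturinn' as UTF-8 bytes
$ascii = sketch_remove_icelandic($title);   // 'THatturinn'
echo strtolower($ascii), "\n";              // prints thatturinn
```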

Also (from #5952):
When the post slug is generated from the post title, the letter 'Å' 'å' converts to 'a'. It should convert to 'aa', which is the general practice in countries using this character (cf. Wikipedia).

Furthermore, the Norwegian/Danish characters 'Æ' 'æ' and 'Ø' 'ø' should be converted to 'ae' and 'oe' respectively. As of now, these convert to '%c3%a6' and '%c3%b8'.

Attachments (2)

4739.patch (2.8 KB) - added by einare 7 years ago.
Fix for the ticket
4739.ligatures.patch (2.0 KB) - added by dnusim 13 months ago.
Correcting transcription of Scandinavian ligatures.

Change History (29)

einare 7 years ago

Fix for the ticket

comment:1 Nazgul 7 years ago

  • Keywords has-patch added
  • Milestone changed from 2.2.3 to 2.3 (trunk)

comment:2 westi 7 years ago

  • Keywords dev-reviewed added
  • Owner changed from anonymous to westi
  • Status changed from new to assigned

+1

comment:3 westi 7 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [5969]) Add utf8->ascii mappings for icelandic letters. Fixes #4739 props einare

comment:4 nbachiyski 7 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This commit breaks the permalinks of posts that contain these characters and were published under the old version of this function.

We should either revert it or pass all permalinks that weren't manually edited through the new sanitize_title. In order to achieve this, we have to compare the output of the old and the new remove_accents functions.
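
A rough sketch of that comparison, under loud assumptions: sketch_old_slug, sketch_new_slug and sketch_needs_regeneration are hypothetical names, and the slug builders only transliterate, percent-encode and lower-case, which is far less than the real sanitising code does. The example title is already lower-case so no Unicode case folding is needed.

```php
<?php
// Simplified, hypothetical slug builders; not the WordPress functions.
function sketch_old_slug($title) {
    $old = array("\xC3\xA1" => 'a');                      // á only
    return strtolower(rawurlencode(strtr($title, $old)));
}

function sketch_new_slug($title) {
    $new = array("\xC3\xA1" => 'a', "\xC3\xBE" => 'th');  // á and þ
    return strtolower(rawurlencode(strtr($title, $new)));
}

// True when the stored slug was auto-generated under the old rules
// and the new rules would produce something different.
function sketch_needs_regeneration($title, $stored_slug) {
    return $stored_slug === sketch_old_slug($title)
        && $stored_slug !== sketch_new_slug($title);
}

$title = "\xC3\xBE\xC3\xA1tturinn";  // 'þátturinn'
var_dump(sketch_needs_regeneration($title, '%c3%beatturinn'));  // bool(true)
var_dump(sketch_needs_regeneration($title, 'my-own-slug'));     // bool(false)
```

A slug that does not equal the old output for its title is treated as hand-edited and left alone.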

comment:5 nbachiyski 7 years ago

Or maybe we should change the query post name matching, so that it uses the raw post name from the url, not the decoded one. If we don't do this we should be very careful in modifying sanitize_title's behaviour.

comment:6 Nazgul 7 years ago

  • Keywords developer-feedback added; has-patch dev-reviewed removed
  • Priority changed from normal to high
  • Severity changed from minor to major

comment:7 ryan 7 years ago

Affected posts can be fixed by resaving them. The old slug redirector will handle redirecting the old URL. But, that's not very friendly. For 2.3 we should probably revert the change.

comment:8 ryan 7 years ago

(In [6150]) Revert [5969]. It can break permalinks. see #4739

comment:9 ryan 7 years ago

  • Milestone changed from 2.3 to 2.4

Reverted for 2.3. We'll try to fix it properly for 2.4.

comment:10 westi 7 years ago

  • Keywords needs=patch early added; developer-feedback removed

I guess we need to make sure that any changes we make to the slug generation code don't affect old posts the way this one currently does.

We should always be checking against the string we use to generate the permalink, not a re-sanitized one.

comment:11 nbachiyski 7 years ago

westi, we aren't always generating the permalink based on information we have in the database. Usually the title is used, but users are allowed to enter their own slugs and we don't keep the original slug -- only the sanitized one.

comment:12 mdawaffe 7 years ago

  • Keywords needs-patch added; needs=patch removed

comment:13 westi 6 years ago

  • Description modified (diff)
  • Milestone changed from 2.5 to 2.6
  • Summary changed from Some icelandic letters do not work in page slugs to Some icelandic/Norwegian/Danish letters do not work in page slugs

Closed #5952 as a dupe of this and updated bug with more characters to fix.

Moving to 2.6 as this needs fixing early and lots of testing so we can be sure we don't break things.

comment:14 dnusim 6 years ago

Is there any way I can help without actually coding?

comment:15 follow-up: snakefoot 5 years ago

Duplicate #4273 ?

comment:16 in reply to: ↑ 15 westi 5 years ago

Replying to snakefoot:

Duplicate #4273 ?

I think that is a similar issue; not sure if it's a dupe, though.

comment:17 shogunn 5 years ago

This problem seems to be something quite easy to fix. If I understand correctly, you only have to add a few lines to formatting.php.

Why, then, has this not been fixed already?

comment:18 svendk 5 years ago

I had similar problems: a Page titled "Bøger" (books) screwed up the permalink (slug?). After reading this, I created (wow) a few more lines fixing it for Å/å and Ø/ø.
With the fix at the top plus these lines, my problems with æ/ø/å are solved.

chr(195).chr(133) => 'Aa',
chr(195).chr(165) => 'aa',
chr(195).chr(152) => 'Oe',
chr(195).chr(184) => 'oe',
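
Applied on their own, these four pairs turn the "Bøger" title into an ASCII-only string. The helper name sketch_remove_danish below is hypothetical and the sketch holds only the pairs from this comment:

```php
<?php
// Hypothetical helper holding only the four pairs listed above.
function sketch_remove_danish($string) {
    $chars = array(
        chr(195).chr(133) => 'Aa',  // Å
        chr(195).chr(165) => 'aa',  // å
        chr(195).chr(152) => 'Oe',  // Ø
        chr(195).chr(184) => 'oe',  // ø
    );
    return strtr($string, $chars);
}

echo sketch_remove_danish("B\xC3\xB8ger"), "\n";  // prints Boeger
```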

Please incorporate it in the official version :)

/svendk

comment:19 janbrasna 5 years ago

  • Cc janbrasna added
  • Component changed from i18n to Permalinks
  • Keywords 2nd-opinion dev-feedback added
  • Milestone changed from 2.9 to 2.8

It used to work in non-UTF processing at some point in the past (see http://trac.wordpress.org/browser/trunk/wp-includes/formatting.php?rev=10150#L401 for the old Latin 1 transliteration code), but was apparently omitted when the UTF transliteration segment was written.

Anyway the main problem is the sanitize_title line
http://trac.wordpress.org/browser/trunk/wp-includes/query.php?rev=10150#L1671
that makes it effectively impossible to change the transliteration array at any point. I can't seem to see the point of comment:11, because the post_name field in the DB is always what gets matched. Regarding comment:5, I think the sanitize_title at that point is maybe overkill? Wouldn't just escaping it for SQL be enough? Otherwise, comment:7 sounds fine for that matter.
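
The lookup problem can be sketched as follows, as I read the thread. The two sketch_sanitize_* functions are hypothetical, heavily simplified stand-ins for sanitize_title before and after a mapping change; this is not the query code itself.

```php
<?php
// Hypothetical stand-ins; both only percent-encode and lower-case,
// and the "after" version also maps 'þ' to 'th'.
function sketch_sanitize_before($name) {
    return strtolower(rawurlencode($name));
}
function sketch_sanitize_after($name) {
    return strtolower(rawurlencode(strtr($name, array("\xC3\xBE" => 'th'))));
}

// Slug saved while the old rules were in force.
$stored = sketch_sanitize_before("\xC3\xBEatturinn");  // '%c3%beatturinn'

// The same URL requested later, after the mapping has changed.
$raw     = '%c3%beatturinn';
$decoded = rawurldecode($raw);

$match_resanitized = ($stored === sketch_sanitize_after($decoded));
$match_raw         = ($stored === $raw);

var_dump($match_resanitized);  // bool(false): re-sanitising yields 'thatturinn'
var_dump($match_raw);          // bool(true): the raw name still matches
```

Matching on the raw name keeps old posts reachable regardless of later mapping changes, which is the idea floated in comment:5.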

comment:20 Denis-de-Bernardy 5 years ago

  • Resolution set to duplicate
  • Status changed from reopened to closed

Merging into #9591.

comment:21 Denis-de-Bernardy 5 years ago

  • Milestone 2.8 deleted

comment:22 scribu 3 years ago

  • Resolution changed from duplicate to fixed

(In [15930]) remove_accents(): Nordic characters fixes. Props einare. Fixes #4739. See #9591

dnusim 13 months ago

Correcting transcription of Scandinavian ligatures.

comment:23 dnusim 13 months ago

  • Cc wordpress-thomas@… added
  • Resolution fixed deleted
  • Status changed from closed to reopened

We are still transcribing the following ligatures incorrectly: ÆØÅøå. They are used in Scandinavian languages, and should be transcribed as 'Ae', 'Oe', 'Aa', 'oe' and 'aa' respectively.

comment:24 dnusim 13 months ago

  • Keywords has-patch added; needs-patch removed

comment:25 follow-up: dnusim 13 months ago

  • Resolution set to duplicate
  • Status changed from reopened to closed

Duplicate of #9591.

Oops, wrong ticket.

comment:26 in reply to: ↑ 25 SergeyBiryukov 13 months ago

Replying to dnusim:

Duplicate of #9591.

That ticket was closed on a completed milestone; please open a new one if there's still a problem.