Make WordPress Core

Opened 17 years ago

Closed 11 years ago

Last modified 15 months ago

#3782 closed enhancement (fixed)

Use Digraphs for Umlauts in German Permalinks

Reported by: transalpin's profile transalpin Owned by:
Milestone: 3.6 Priority: normal
Severity: normal Version: 2.1
Component: I18N Keywords: has-patch commit has-unit-tests
Focuses: Cc:

Description

There’s this pretty useful plugin:
http://wiki.wordpress.org/GermanPermalinks

To quote from the author’s site:

In German, the convention is to transliterate umlauts into two letters (ä=ae, ö=oe, ü=ue). The CVS code has some two-character conversions, so I added the umlauts to that list in my plugin, and the result was schroeder-sucks, which is just what I wanted.

I wouldn’t want Schröder to suck, but I think that it shouldn’t take a plugin to fix this bug. There are more characters that are wrongly sanitized (e.g., å = aa, æ = ae, …)

(Note that in Spanish and French, the ü is not transliterated in this matter. Not sure about Turkish. Maybe WordPress should check the locale before creating the post slug)

Attachments (6)

3782.patch (672 bytes) - added by SergeyBiryukov 11 years ago.
3782.2.patch (906 bytes) - added by ocean90 11 years ago.
3782.2-unit-test.patch (649 bytes) - added by ocean90 11 years ago.
3782.3.patch (733 bytes) - added by SergeyBiryukov 11 years ago.
3782.4.patch (719 bytes) - added by SergeyBiryukov 11 years ago.
3782.5.patch (742 bytes) - added by SergeyBiryukov 11 years ago.

Download all attachments as: .zip

Change History (29)

#1 @link92
17 years ago

Another option would be to use IRIs, and then normalise them to URIs for compatibility with UAs that don't support IRIs.

#2 @Nazgul
17 years ago

  • Type changed from defect to enhancement

I think this should stay a plugin, because you stated yourself that there's differences between languages/regions on how to interpret/translate these characters.

We can't assume everybody wants to use the German convention.

#3 @rob1n
17 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

Yeah, this is definitely plugin material. Adding locale-specific checks (and maintaining them !!) would be waaay too much to put into the core.

#4 @foolswisdom
17 years ago

  • Milestone 2.2 deleted

#5 @Bernhard Reiter
11 years ago

  • Cc Bernhard Reiter added
  • Resolution wontfix deleted
  • Status changed from closed to reopened

Now that Wordpress seems to become more and more i18n/l10n aware, isn't it time to reconsider this? Having an umlaut normalise to only one letter seriously pisses of most German wordpress users I know of (including myself) that care about their permalinks, which is maybe something that latin-1-charset-language developers don't realise.

#6 @nacin
11 years ago

  • Milestone set to 3.6

I agree, this is doable by changing the remove_accents() mapping based on the locale.

We need a full list of all affected characters, and what their sanitized characters should be.

#7 @ocean90
11 years ago

  • Cc ocean90 added

#8 @toscho
11 years ago

  • Cc info@… added

#9 @SergeyBiryukov
11 years ago

  • Keywords has-patch added

#10 @SergeyBiryukov
11 years ago

3782.patch handles the characters from WP Permalauts plugin.

#11 @SergeyBiryukov
11 years ago

In 1178/tests:

remove_accents() test for German umlauts. see #3782.

#12 @SergeyBiryukov
11 years ago

In 1179/tests:

Add capital characters to remove_accents() test for German umlauts. see #3782.

#13 @nacin
11 years ago

Let's assign get_locale() to a variable so if/when we add more locale-specific rules, we don't call it additional times, the diff is a bit cleaner, and so the commit here looks like it is written to be extended.

@ocean90
11 years ago

#14 @ocean90
11 years ago

Ä in Äpfel should be Ae. Of course, when all characters are uppercased AE seems fine. But that's usually not the case.

#15 @SergeyBiryukov
11 years ago

In 1180/tests:

Correct the expected results in remove_accents() test for German umlauts. props ocean90. see #3782.

#16 @SergeyBiryukov
11 years ago

  • Keywords commit added

#17 @nacin
11 years ago

Looks quite solid. Three things:

Probably no need to ever call get_locale() if ! seems_utf8().

I wonder if array_merge() is particularly slow or inefficient here, as this isn't a small array. The union operator (+) has a bit less overhead, as might just adding on the individual key/value pairs.

I would tend to agree that A should become Ae, not AE. remove_accents() does follow the same uppercase pattern elsewhere in the file. Since this is a language-specific rule, I am not too worried about the inconsistency. Note the only other usage of remove_accents() beyond permalinks, is sanitize_user(), which does not enforce strtolower (it used to in multisite).

#18 @SergeyBiryukov
11 years ago

Performed a few tests with 20,000 iterations of remove_accents().

Without the patch10.290s
array_merge() (3782.3.patch)14.624s
Union operator (+) (3782.4.patch)11.342s
Direct assignment (3782.5.patch)10.619s

So, 3782.5.patch shows the best performance.

#19 @nacin
11 years ago

Good to go.

#20 @SergeyBiryukov
11 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 23361:

Use digraphs for German umlauts in remove_accents(). props SergeyBiryukov, ocean90. fixes #3782.

This ticket was mentioned in PR #3783 on WordPress/wordpress-develop by @costdev.


15 months ago
#21

  • Keywords has-unit-tests added

Now that the minimum PHP version has been raised, this test and its data provider can be removed.

The datasets from the data provider have been relocated to other data providers in the test class.

N.B. This PR has expected test failures in PHP 5.6 tests until #3782 is merged.

This ticket was mentioned in PR #3786 on WordPress/wordpress-develop by @costdev.


15 months ago
#22

The IMAGETYPE_WEBP constant is defined in PHP 7.1 and later.

Now that the minimum PHP version has been raised, the IMAGETYPE_WEBP constant definition can be removed.

Ref:
PHP manual - Migrating from PHP 7.0.x to PHP 7.1.x
https://www.php.net/manual/en/migration71.constants.php

N.B. This PR has expected test failures in PHP 5.6 and 7.0 tests until #3782 is merged.

This ticket was mentioned in PR #3787 on WordPress/wordpress-develop by @costdev.


15 months ago
#23

The random_compat library polyfills functions from the Random PHP extension in PHP 7.x to PHP 5.x. Random is a core PHP extension, and will be included in all PHP binaries.

Now that the minimum PHP version has been raised, the random_compat library can be removed.

This change also removes the inclusion of random_int() in wp-includes/compat.php.

Ref:

  • PHP manual - Membership - Core Extensions

https://www.php.net/manual/en/extensions.membership.php

  • Paragonie/random_compat GitHub repository

https://github.com/paragonie/random_compat

N.B. This PR has expected test failures for PHP 5.6 tests until #3782 is merged.

Note: See TracTickets for help on using tickets.