WordPress.org

Make WordPress Core

Opened 7 weeks ago

Last modified 7 weeks ago

#49129 new enhancement

Incorrect German Umlaut substitutions

Reported by: bmuessig
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: minor
Version: trunk
Component: Formatting
Keywords: 2nd-opinion
Focuses:
Cc:

Description

Hello,

As a native speaker, I find the German umlaut substitutions quite strange.
ü is correctly turned into ue, but Ü is turned into Ue.
Since the added e is part of the transliterated umlaut, it should take the same capitalization as the original character.

This is especially strange in uppercase text:
FRÖHLICH -> FROeHLICH
KÖNNEN -> KOeNNEN

If the substitution were all uppercase, this would work much better:
FRÖHLICH -> FROEHLICH
KÖNNEN -> KOENNEN
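
The difference between the two behaviors above can be reproduced with a plain substitution table. This is only a minimal sketch: `transliterate_umlauts()` is a hypothetical helper, not a WordPress function, and both tables are written out here from the examples in this ticket rather than copied from core.

```php
<?php
// Hypothetical helper (not part of WordPress): applies a character-to-string
// substitution table. strtr() works on bytes, which is safe for UTF-8 input
// as long as this file itself is saved as UTF-8.
function transliterate_umlauts( string $text, array $chars ): string {
	return strtr( $text, $chars );
}

// Mixed-case table matching the behavior described in this ticket (Ü -> Ue).
$current = array(
	'Ä' => 'Ae', 'ä' => 'ae',
	'Ö' => 'Oe', 'ö' => 'oe',
	'Ü' => 'Ue', 'ü' => 'ue',
	'ß' => 'ss',
);

// All-uppercase variant for the capital umlauts (Ü -> UE).
$proposed = array(
	'Ä' => 'AE', 'ä' => 'ae',
	'Ö' => 'OE', 'ö' => 'oe',
	'Ü' => 'UE', 'ü' => 'ue',
	'ß' => 'ss',
);

echo transliterate_umlauts( 'FRÖHLICH', $current ), "\n";  // FROeHLICH
echo transliterate_umlauts( 'FRÖHLICH', $proposed ), "\n"; // FROEHLICH
echo transliterate_umlauts( 'KÖNNEN', $current ), "\n";    // KOeNNEN
echo transliterate_umlauts( 'KÖNNEN', $proposed ), "\n";   // KOENNEN
```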

When used at the start of a word, it would also work fine, if capitalized:
ÖFFENTLICH -> OEffentlich
ÜBERGANG -> UEbergang

Therefore, I would propose changing the table located in wp-includes/formatting.php:1941 (https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php#L1941) to the following:

		if ( 'de_DE' == $locale || 'de_DE_formal' == $locale || 'de_CH' == $locale || 'de_CH_informal' == $locale ) {
			$chars['Ä'] = 'AE';
			$chars['ä'] = 'ae';
			$chars['Ö'] = 'OE';
			$chars['ö'] = 'oe';
			$chars['Ü'] = 'UE';
			$chars['ü'] = 'ue';
			$chars['ß'] = 'ss';
		}

To be entirely correct, though, the surrounding characters would have to be checked, which would be difficult given the current architecture.
There is even a capital ß (ẞ) now, which would be substituted with SS.
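
One way the check on surrounding characters might look is sketched below. It is not how WordPress currently works, and `transliterate_german_with_context()` is a hypothetical name. The rule it uses is an assumption: a capital umlaut becomes AE/OE/UE when the next letter is uppercase, or when it ends a word and the previous letter is uppercase; otherwise it becomes Ae/Oe/Ue. Note that this yields Oeffentlich rather than OEffentlich for a capitalized word.

```php
<?php
// Hypothetical helper (not part of WordPress): picks the case of the added
// "e" from the neighbouring letters. Requires PCRE with Unicode support.
function transliterate_german_with_context( string $text ): string {
	$base = array( 'Ä' => 'A', 'Ö' => 'O', 'Ü' => 'U' );

	// Capital umlaut followed by an uppercase letter, or preceded by an
	// uppercase letter and not followed by any letter: use an uppercase "E".
	$result = preg_replace_callback(
		'/[ÄÖÜ](?=\p{Lu})|(?<=\p{Lu})[ÄÖÜ](?!\p{L})/u',
		static function ( array $match ) use ( $base ): string {
			return $base[ $match[0] ] . 'E';
		},
		$text
	);

	// preg_replace_callback() returns null on error (e.g. invalid UTF-8);
	// fall back to the untouched input in that case.
	if ( null === $result ) {
		$result = $text;
	}

	// Everything still left gets the mixed-case or lowercase substitution.
	return strtr(
		$result,
		array(
			'Ä' => 'Ae', 'ä' => 'ae',
			'Ö' => 'Oe', 'ö' => 'oe',
			'Ü' => 'Ue', 'ü' => 'ue',
			'ß' => 'ss', 'ẞ' => 'SS',
		)
	);
}

echo transliterate_german_with_context( 'FRÖHLICH' ), "\n"; // FROEHLICH
echo transliterate_german_with_context( 'Übergang' ), "\n"; // Uebergang
echo transliterate_german_with_context( 'STRAẞE' ), "\n";   // STRASSE
```

The sketch keeps the table lookup as the final step, so only the capital umlauts need the extra pass. A lowercase ß inside an otherwise uppercase word would still become ss.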

I am happy to hear any second opinions on this.

Best regards,
Benedikt

Change History (3)

#1 @SergeyBiryukov
7 weeks ago

Hi there, welcome to WordPress Trac! Thanks for the ticket.

Just noting this was originally introduced in [23361] / #3782, and extended to other locales in [33027] and [37698].

> When used at the start of a word, it would also work fine, if capitalized:
> ÖFFENTLICH -> OEffentlich
> ÜBERGANG -> UEbergang

There was an argument against that in comment:14:ticket:3782, hence the current list.

I guess we'll have to find a way to check the surrounding characters to match the case correctly.

#2 follow-up: ↓ 3 @tobifjellner
7 weeks ago

Perhaps de_AT should also be included? @pputzer ?

#3 in reply to: ↑ 2 @pputzer
7 weeks ago

Replying to tobifjellner:

> Perhaps de_AT should also be included? @pputzer ?

Yes. de_AT needs to be included as well.
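
A locale check that includes de_AT might be easier to maintain as a list lookup than as a chain of comparisons. The following is only a sketch: `uses_german_umlaut_rules()` is a hypothetical helper, and the locale list is assembled from the snippet in the description plus de_AT, not taken from core.

```php
<?php
// Hypothetical helper (not part of WordPress): returns true for locales that
// should receive the German umlaut substitutions discussed in this ticket.
function uses_german_umlaut_rules( string $locale ): bool {
	// Assumed list: the four locales from the ticket's snippet, plus de_AT.
	$german_locales = array( 'de_DE', 'de_DE_formal', 'de_CH', 'de_CH_informal', 'de_AT' );

	// Strict comparison avoids loose type juggling on the locale string.
	return in_array( $locale, $german_locales, true );
}

var_dump( uses_german_umlaut_rules( 'de_AT' ) ); // bool(true)
var_dump( uses_german_umlaut_rules( 'en_US' ) ); // bool(false)
```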

Regarding the topic of the ticket, is this a problem in practice? Where are those transformation rules used in a context that does not also convert the result to lower case, as happens for slugs?
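
The point about lowercasing can be illustrated with a small sketch. It assumes a slug-like pipeline that lowercases the text after substitution; `slug_like()` is a hypothetical stand-in and not the actual WordPress slug code. Under that assumption, the mixed-case and all-uppercase tables give the same result.

```php
<?php
// Hypothetical stand-in for a slug pipeline (not the WordPress implementation):
// substitute first, then lowercase. strtolower() is sufficient here because
// the substituted sample strings are plain ASCII.
function slug_like( string $text, array $chars ): string {
	return strtolower( strtr( $text, $chars ) );
}

// Mixed-case substitutions for the capital umlauts.
$mixed = array( 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss' );
// All-uppercase substitutions for the capital umlauts.
$upper = array( 'Ä' => 'AE', 'Ö' => 'OE', 'Ü' => 'UE', 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss' );

echo slug_like( 'FRÖHLICH', $mixed ), "\n"; // froehlich
echo slug_like( 'FRÖHLICH', $upper ), "\n"; // froehlich
```

The two tables only differ in output where the substituted text keeps its original case.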
