Make WordPress Core

Opened 3 years ago

Closed 3 years ago

#37086 closed defect (bug) (fixed)

Remove Middle Dot (U+00B7) from URL (for Catalan only?)

Reported by: xavivars Owned by: ocean90
Milestone: 4.6 Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-patch has-unit-tests commit
Focuses: Cc:
PR Number:

Description (last modified by ocean90)

Currently, remove_accents() converts accented characters to their ASCII equivalents so that they look "nice" in URLs without needing to be escaped (and thus without showing % in the links).

However, the middle dot (U+00B7) is not removed. The middle dot is used in Catalan between two Ls (like this: l·l).
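To illustrate the escaping problem described above, here is a minimal Python sketch (not WordPress code) of what happens when a slug that still contains the middle dot is percent-encoded for a URL. The helper name `percent_encode_slug` is hypothetical; only the standard library's `urllib.parse.quote` is used.

```python
from urllib.parse import quote


def percent_encode_slug(slug: str) -> str:
    """Percent-encode a slug the way a URL path segment would be encoded.

    Hypothetical helper for illustration only. ASCII letters, digits and
    hyphens pass through; everything else is encoded as UTF-8 bytes.
    """
    return quote(slug, safe="-")


# U+00B7 is two bytes in UTF-8 (0xC2 0xB7), so it shows up as two escapes.
print(percent_encode_slug("cel·la"))  # cel%C2%B7la
print(percent_encode_slug("cella"))   # cella
```

The second call shows the kind of slug the ticket is asking for: plain ASCII with nothing to escape.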

Quoting from Wikipedia:

The flown dot (Catalan: punt volat) is used in Catalan between two Ls in cases where each belongs to a separate syllable, for example cel·la, "cell". This distinguishes such "geminate Ls" (ela geminada), which are pronounced [ɫː], from "double L" (doble ela), which are written without the flown dot and are pronounced [ʎ].

Besides being inconsistent (all other Catalan diacritics are removed), keeping this character has side effects, because some URL libraries don't handle it (like the one Twitter uses: see https://twitter.com/VilaWeb/status/738348674137399296).

My proposal is to remove that character when it appears between two Ls.
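The proposal can be sketched as follows. This is an illustrative Python version under my own assumptions, not the attached patch: the function name `strip_flown_dot` is hypothetical, and matching upper-case Ls as well is a choice made here rather than something the ticket specifies.

```python
import re

MIDDLE_DOT = "\u00b7"  # U+00B7, the Catalan flown dot (punt volat)

# Match the dot only when it sits directly between two Ls.
# Lookbehind/lookahead keep the Ls themselves out of the match.
FLOWN_DOT_BETWEEN_LS = re.compile(
    "(?<=[lL])" + re.escape(MIDDLE_DOT) + "(?=[lL])"
)


def strip_flown_dot(text: str) -> str:
    """Remove U+00B7 when it appears between two Ls; leave it elsewhere."""
    return FLOWN_DOT_BETWEEN_LS.sub("", text)


print(strip_flown_dot("cel·la"))    # cella
print(strip_flown_dot("COL·LEGI"))  # COLLEGI
print(strip_flown_dot("a·b"))       # a·b (unchanged: not between two Ls)
```

Restricting the match to the l·l context leaves other uses of the middle dot untouched.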

Attachments (5)

formatting.php.ca-only.patch (475 bytes) - added by xavivars 3 years ago.
Removes middle dot when Catalan is set as a language
formatting.php.all-lang.patch (440 bytes) - added by xavivars 3 years ago.
Removes middle dot for all languages
formatting.php.patch (487 bytes) - added by xavivars 3 years ago.
Formatting.php patch
RemoveAccents.php.patch (815 bytes) - added by xavivars 3 years ago.
RemoveAccents.php test patch
37086.patch (2.0 KB) - added by SergeyBiryukov 3 years ago.


Change History (15)

3 years ago

Removes middle dot when Catalan is set as a language

3 years ago

Removes middle dot for all languages

#1 @swissspidy
3 years ago

  • Keywords has-patch added

#2 @ocean90
3 years ago

  • Description modified (diff)
  • Keywords needs-refresh needs-unit-tests added
  • Milestone changed from Awaiting Review to Future Release

@xavivars Thanks for your patches. The replacement should only be done for Catalan. Removing the dots can maybe be handled by sanitize_title_with_dashes().

Can you make sure that the patches are relative to the root directory? And there should be a unit test for this change in /tests/phpunit/tests/formatting/RemoveAccents.php.
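The two requests above (Catalan-only replacement, plus a unit test) could look roughly like this. It is a Python stand-in for illustration; the real change would be made in PHP. The function name `remove_catalan_flown_dot`, the explicit `locale` parameter, and the use of "ca" as the Catalan locale code are assumptions of this sketch.

```python
import unittest


def remove_catalan_flown_dot(text: str, locale: str) -> str:
    """Replace "l·l" with "ll", but only when the locale is Catalan.

    Hypothetical helper: the locale is passed in explicitly here, and "ca"
    is assumed to be the Catalan locale code.
    """
    if locale != "ca":
        return text
    return text.replace("l\u00b7l", "ll")


class RemoveCatalanFlownDotTest(unittest.TestCase):
    def test_removed_for_catalan(self):
        self.assertEqual(remove_catalan_flown_dot("cel·la", "ca"), "cella")

    def test_kept_for_other_locales(self):
        self.assertEqual(remove_catalan_flown_dot("cel·la", "en_US"), "cel·la")
```

The tests can be run with `python -m unittest` from the directory containing the file.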

#3 @xavivars
3 years ago

@ocean90: which repo's root directory should the patches be relative to? I've found contradictory information (sometimes pointing to develop.svn.wordpress.org and other times to core.svn).

I'll also add unit tests for that.

#4 @swissspidy
3 years ago

develop.svn.wordpress.org (or develop.git.wordpress.org) would be the correct repository for patches.

3 years ago

Formatting.php patch

3 years ago

RemoveAccents.php test patch

#5 @xavivars
3 years ago

@ocean90: I don't think I agree that the removal of those dots should be done in sanitize_title_with_dashes(). The middle dot affects how the Ls are pronounced, and in fact the first case was already covered in the same remove_accents() function (I've removed it from the new formatting.php.patch). However, if you think those changes belong in sanitize_title_with_dashes(), I'm open to discussing that.

#6 @xavivars
3 years ago

  • Keywords has-unit-tests dev-feedback added; needs-refresh needs-unit-tests removed

#7 @SergeyBiryukov
3 years ago

  • Milestone changed from Future Release to 4.6

#8 @SergeyBiryukov
3 years ago

  • Keywords commit added; dev-feedback removed

37086.patch combines the patch and the test and also updates the docs.

I think it's fine to handle this in remove_accents().

#9 @xavivars
3 years ago

Is there anything pending for this ticket to be closed that I can help with?

#10 @ocean90
3 years ago

  • Owner set to ocean90
  • Resolution set to fixed
  • Status changed from new to closed

In 37853:

I18N: Add support for the Catalan flown dot in remove_accents().

Props xavivars, SergeyBiryukov.
Fixes #37086.
