Opened 4 months ago

Closed 2 weeks ago

#49791 closed defect (bug) (fixed)

sanitize title should filter out the bullet • punctuation mark

Reported by: veromary
Owned by: SergeyBiryukov
Milestone: 5.5
Priority: normal
Severity: normal
Version: 5.4
Component: Formatting
Keywords: has-patch has-unit-tests
Focuses:
Cc:

Description

Some people use bullet points in their titles as decorative punctuation.

"Fancy Title • Amazing"

This bullet point is passed into the page slug, which breaks some applications (such as Facebook links):

fancy-title-•-amazing

which would be better rendered as

fancy-title-amazing

In formatting.php this bullet character should be added to the list of special_chars to filter out.
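
To illustrate the requested behavior, here is a minimal Python sketch of a slug builder with a strip list. It is not the code in formatting.php: the names `STRIP_CHARS` and `slugify` are hypothetical, and the real sanitize_title_with_dashes() handles many more cases than this sketch does.

```python
import re

# Hypothetical strip list for illustration only. The real list lives in
# sanitize_title_with_dashes() in formatting.php and is much longer.
STRIP_CHARS = {"\u2022"}  # U+2022 BULLET


def slugify(title: str) -> str:
    """Simplified slug builder: drop listed characters, then dash-join words."""
    # Remove every character that appears in the strip list.
    cleaned = "".join(ch for ch in title if ch not in STRIP_CHARS)
    cleaned = cleaned.lower().strip()
    # Each run of whitespace becomes a single dash.
    cleaned = re.sub(r"\s+", "-", cleaned)
    # Collapse any repeated dashes and trim dashes at the ends.
    cleaned = re.sub(r"-+", "-", cleaned)
    return cleaned.strip("-")


print(slugify("Fancy Title \u2022 Amazing"))  # fancy-title-amazing
```

Because the bullet is removed before whitespace is collapsed, the two spaces left around it turn into a single dash rather than a doubled one.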

Attachments (3)

49791.diff (433 bytes) - added by roytanck 4 months ago.
Adds the bullet, white bullet and inverse bullet characters to the filter list in sanitize_title_with_dashes().
49791-2.diff (489 bytes) - added by roytanck 4 months ago.
Removes all bullet characters that are categorized as punctuation or geometric shapes.
49791.2.diff (1.9 KB) - added by deepaklalwani 5 weeks ago.
Adds unit test cases


Change History (10)

@roytanck
4 months ago

Adds the bullet, white bullet and inverse bullet characters to the filter list in sanitize_title_with_dashes().

#1 @roytanck
4 months ago

  • Keywords has-patch added

Hello @veromary. Thank you for submitting this report. I've created a patch that adds the bullet characters to the filter list. I agree that these should not be present in permalinks.

#2 @veromary
4 months ago

Thanks, @roytanck!
Hope that makes it into the next release.

#3 @SergeyBiryukov
4 months ago

  • Keywords needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.5
  • Owner set to SergeyBiryukov
  • Status changed from new to reviewing

@roytanck
4 months ago

Removes all bullet characters that are categorized as punctuation or geometric shapes.

#4 follow-up: @roytanck
4 months ago

After looking into Unicode punctuation a bit more I added a second patch that removes four more characters. All seven are bullets categorized as punctuation or geometric shapes.

There are also a number of bullet characters that are mathematical operators/symbols, and some in other categories. I did not include these, but can add them if we decide so.
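
The category distinction can be checked with Python's standard `unicodedata` module. The sketch below is only an illustration: the code points are assumptions chosen to match the character names mentioned in this ticket (bullet, white bullet, inverse bullet), plus the bullet operator as one example of a mathematical symbol, and `describe` is a hypothetical helper. It does not reproduce the exact set of characters in either patch.

```python
import unicodedata

# Assumed code points for the characters named in the ticket, plus one
# mathematical operator for contrast. Not the actual patch contents.
CANDIDATES = ["\u2022", "\u25e6", "\u25d8", "\u2219"]


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(describe(ch))
```

Here `Po` is "Punctuation, other", `So` is "Symbol, other" (the general category of characters in the Geometric Shapes block), and `Sm` is "Symbol, math".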

In general, there are probably thousands of characters in the Unicode standard that are "not ideal" to include in slugs/URLs. Is there a standard approach to dealing with this in core?

Last edited 4 months ago by roytanck

@deepaklalwani
5 weeks ago

Adds unit test cases

#5 @deepaklalwani
5 weeks ago

  • Keywords has-unit-tests added; needs-unit-tests removed

#6 in reply to: ↑ 4 @SergeyBiryukov
2 weeks ago

Replying to roytanck:

In general, there are probably thousands of characters in the Unicode standard that are "not ideal" to include in slugs/URLs. Is there a standard approach to dealing with this in core?

The list in sanitize_title_with_dashes() includes some commonly used characters and is expanded on a case-by-case basis by specific requests. Since the ticket was initially about just one bullet character, let's go with that for now and expand further if there are any more requests.

#7 @SergeyBiryukov
2 weeks ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 48593:

Formatting: Filter out the bullet character in sanitize_title_with_dashes().

Props roytanck, deepaklalwani, veromary.
Fixes #49791.
