Ticket #10797 (closed defect (bug): fixed)

Opened 3 years ago

Last modified 7 months ago

curly quotes not stripped out slugs

Reported by: alxndr Owned by: ryan
Priority: normal Milestone: 3.3
Component: Permalinks Version: 2.9
Severity: normal Keywords: has-patch
Cc: info@…, jeremy@…

Description

Curly single- and double-quotes are percent-encoded instead of stripped out when creating slugs for post titles, categories, or tags.

To recreate, add a new post with the title: Procol Harum’s “A Whiter Shade of Pale” expected slug: procol-harums-a-whiter-shade-of-pale actual slug: procol-harum%e2%80%99s-%e2%80%9ca-whiter-shade-of-pale%e2%80%9d

Attachments

slug-curlyquotes-dashes.patch Download (600 bytes) - added by alxndr 3 years ago.
patch to sanitize_title_with_dashes() in formatting.php, replaces curly single- and double- quotes and en- and em-dashes with empty string
10797.patch Download (1.0 KB) - added by SergeyBiryukov 10 months ago.
10797.2.patch Download (1019 bytes) - added by SergeyBiryukov 9 months ago.
10797.3.patch Download (2.4 KB) - added by SergeyBiryukov 9 months ago.
10797.4.patch Download (2.5 KB) - added by SergeyBiryukov 9 months ago.
10797.5.patch Download (2.8 KB) - added by SergeyBiryukov 9 months ago.
10797.6.patch Download (2.8 KB) - added by SergeyBiryukov 9 months ago.
10797.tests.patch Download (2.2 KB) - added by ampt 8 months ago.
unit tests

Change History

alxndr3 years ago

patch to sanitize_title_with_dashes() in formatting.php, replaces curly single- and double- quotes and en- and em-dashes with empty string

  • Keywords has-patch added
  • Milestone changed from Unassigned to 2.9

I'm all for the fix, but it should be str_replace instead of preg_replace, no?

  • Keywords developer-feedback, submitter-feedback added; has-patch removed
  • Owner set to ryan
  • Component changed from General to Permalinks

No version or svn revision info included, please always set and include in description as well for bugs.
 http://codex.wordpress.org/Reporting_Bugs#Reporting_a_Bug

This is not a newly introduced issue (regression), updating milestone to 3.0 as too late for existing, lower impact issues.

Please see #9591, which I believe this strongly relates to -- better to work on a more cohesive solution, then one offs.

  • Version set to 2.9
  • Milestone changed from 2.9 to 3.0
  • Keywords developer-feedback reporter-feedback added; developer-feedback, submitter-feedback removed
  • Status changed from new to closed
  • Resolution set to duplicate
  • Milestone 3.0 deleted
  • Keywords has-patch added; developer-feedback reporter-feedback removed
  • Status changed from closed to reopened
  • Resolution duplicate deleted
  • Milestone set to Awaiting Review

Closed #16036 as duplicate.

I guess sanitize_title_with_dashes() is a more appropriate function for such replacements than remove_accents(), which handles i18n-related improvements.

So I suggest to reopen this ticket.

Refreshed the patch for 3.3.

Last edited 10 months ago by SergeyBiryukov (previous) (diff)
  • Milestone changed from Awaiting Review to 3.3

This falls under "Finally fix the issues relating to special characters in permalinks using an upgrade routine" from 3.3 scope.

Not sure if an upgrade routine is necessary here. Old permalinks (with curly quotes) still work after the patch.

Moved sanitization to sanitize_title(), as per  today's dev chat.

Actually, the best place would be remove_accents().

I guess remove_accents() should only handle international characters.

comment:12 follow-up: ↓ 14   scribu9 months ago

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

Oh yeah. Because it wouldn't have the 'save' context check.

comment:14 in reply to: ↑ 12 ; follow-up: ↓ 15   nacin9 months ago

Replying to scribu:

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

I was referring to something else.

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

comment:15 in reply to: ↑ 14   SergeyBiryukov9 months ago

Replying to nacin:

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

Done in 10797.3.patch Download.

  • Keywords needs-unit-tests added

Looks good. Let's get some unit tests.

Based on the tickets linked from #9591, 10797.5.patch Download fixes:

  • #3206: ¡ and ¿ should get stripped
  • #8765: Strip ° characters from permalink
  • ticket:9591:30: Guillemets
  • #12956: © and ™ not stripped from sanitize_title
  • #15768: Unencoded non-breaking spaces in titles don't get sanitized properly

ampt8 months ago

unit tests

Add unit tests for the latest patch

In [18705]:

Strip a number of special characters in sanitize_title_with_dashes on save. Includes quotes (curly, angle), dashes, marks, etc. props SergeyBiryukov. props ampt for the unit tests in  [UT438]. see #10797.

:-) Thanks ampt! Things look great.

Before patch: Tests: 14, Assertions: 15, Failures: 6.

After patch: OK (14 tests, 22 assertions)

 http://unit-tests.trac.wordpress.org/changeset/438

  • Cc info@… added
  • Cc jeremy@… added
  • Keywords needs-unit-tests removed
  • Status changed from reopened to closed
  • Resolution set to fixed

Think this one is done.

Note: See TracTickets for help on using tickets.