WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 4 years ago

#10797 closed defect (bug) (fixed)

curly quotes not stripped out slugs

Reported by: alxndr Owned by: ryan
Milestone: 3.3 Priority: normal
Severity: normal Version: 2.9
Component: Permalinks Keywords: has-patch
Focuses: Cc:

Description

Curly single- and double-quotes are percent-encoded instead of stripped out when creating slugs for post titles, categories, or tags.

To recreate, add a new post with the title: Procol Harum’s “A Whiter Shade of Pale”
expected slug: procol-harums-a-whiter-shade-of-pale
actual slug: procol-harum%e2%80%99s-%e2%80%9ca-whiter-shade-of-pale%e2%80%9d

Attachments (8)

slug-curlyquotes-dashes.patch (600 bytes) - added by alxndr 6 years ago.
patch to sanitize_title_with_dashes() in formatting.php, replaces curly single- and double- quotes and en- and em-dashes with empty string
10797.patch (1.0 KB) - added by SergeyBiryukov 4 years ago.
10797.2.patch (1019 bytes) - added by SergeyBiryukov 4 years ago.
10797.3.patch (2.4 KB) - added by SergeyBiryukov 4 years ago.
10797.4.patch (2.5 KB) - added by SergeyBiryukov 4 years ago.
10797.5.patch (2.8 KB) - added by SergeyBiryukov 4 years ago.
10797.6.patch (2.8 KB) - added by SergeyBiryukov 4 years ago.
10797.tests.patch (2.2 KB) - added by ampt 4 years ago.
unit tests

Download all attachments as: .zip

Change History (32)

@alxndr6 years ago

patch to sanitize_title_with_dashes() in formatting.php, replaces curly single- and double- quotes and en- and em-dashes with empty string

comment:1 @scribu6 years ago

  • Keywords has-patch added
  • Milestone changed from Unassigned to 2.9

comment:2 @nacin6 years ago

I'm all for the fix, but it should be str_replace instead of preg_replace, no?

comment:3 @lloydbudd6 years ago

  • Component changed from General to Permalinks
  • Keywords developer-feedback submitter-feedback added; has-patch removed
  • Owner set to ryan

No version or svn revision info included, please always set and include in description as well for bugs.
http://codex.wordpress.org/Reporting_Bugs#Reporting_a_Bug

This is not a newly introduced issue (regression), updating milestone to 3.0 as too late for existing, lower impact issues.

Please see #9591, which I believe this strongly relates to -- better to work on a more cohesive solution, then one offs.

comment:4 @lloydbudd6 years ago

  • Milestone changed from 2.9 to 3.0
  • Version set to 2.9

comment:5 @lloydbudd6 years ago

  • Keywords reporter-feedback added; submitter-feedback removed

comment:6 @Denis-de-Bernardy6 years ago

  • Milestone 3.0 deleted
  • Resolution set to duplicate
  • Status changed from new to closed

@SergeyBiryukov4 years ago

comment:7 @SergeyBiryukov4 years ago

  • Keywords has-patch added; developer-feedback reporter-feedback removed
  • Milestone set to Awaiting Review
  • Resolution duplicate deleted
  • Status changed from closed to reopened

Closed #16036 as duplicate.

I guess sanitize_title_with_dashes() is a more appropriate function for such replacements than remove_accents(), which handles i18n-related improvements.

So I suggest to reopen this ticket.

Refreshed the patch for 3.3.

Last edited 4 years ago by SergeyBiryukov (previous) (diff)

comment:8 @SergeyBiryukov4 years ago

  • Milestone changed from Awaiting Review to 3.3

This falls under "Finally fix the issues relating to special characters in permalinks using an upgrade routine" from 3.3 scope.

Not sure if an upgrade routine is necessary here. Old permalinks (with curly quotes) still work after the patch.

@SergeyBiryukov4 years ago

comment:9 @SergeyBiryukov4 years ago

Moved sanitization to sanitize_title(), as per today's dev chat.

comment:10 @scribu4 years ago

Actually, the best place would be remove_accents().

comment:11 @SergeyBiryukov4 years ago

I guess remove_accents() should only handle international characters.

comment:12 follow-up: @scribu4 years ago

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

comment:13 @scribu4 years ago

Oh yeah. Because it wouldn't have the 'save' context check.

comment:14 in reply to: ↑ 12 ; follow-up: @nacin4 years ago

Replying to scribu:

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

I was referring to something else.

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

@SergeyBiryukov4 years ago

comment:15 in reply to: ↑ 14 @SergeyBiryukov4 years ago

Replying to nacin:

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

Done in 10797.3.patch.

comment:16 @nacin4 years ago

  • Keywords needs-unit-tests added

Looks good. Let's get some unit tests.

@SergeyBiryukov4 years ago

@SergeyBiryukov4 years ago

comment:17 @SergeyBiryukov4 years ago

Based on the tickets linked from #9591, 10797.5.patch fixes:

  • #3206: ¡ and ¿ should get stripped
  • #8765: Strip ° characters from permalink
  • ticket:9591:30: Guillemets
  • #12956: © and ™ not stripped from sanitize_title

@SergeyBiryukov4 years ago

comment:18 @SergeyBiryukov4 years ago

  • #15768: Unencoded non-breaking spaces in titles don't get sanitized properly

@ampt4 years ago

unit tests

comment:19 @ampt4 years ago

Add unit tests for the latest patch

comment:20 @nacin4 years ago

In [18705]:

Strip a number of special characters in sanitize_title_with_dashes on save. Includes quotes (curly, angle), dashes, marks, etc. props SergeyBiryukov. props ampt for the unit tests in [UT438]. see #10797.

comment:21 @nacin4 years ago

:-) Thanks ampt! Things look great.

Before patch: Tests: 14, Assertions: 15, Failures: 6.

After patch: OK (14 tests, 22 assertions)

http://unit-tests.trac.wordpress.org/changeset/438

comment:22 @toscho4 years ago

  • Cc info@… added

comment:23 @hd-J4 years ago

  • Cc jeremy@… added

comment:24 @nacin4 years ago

  • Keywords needs-unit-tests removed
  • Resolution set to fixed
  • Status changed from reopened to closed

Think this one is done.

Note: See TracTickets for help on using tickets.