Opened 4 years ago

Closed 19 months ago

#10797 closed defect (bug) (fixed)

curly quotes not stripped out slugs

Reported by: alxndr Owned by: ryan
Priority: normal Milestone: 3.3
Component: Permalinks Version: 2.9
Severity: normal Keywords: has-patch
Cc: info@…, jeremy@…

Description

Curly single- and double-quotes are percent-encoded instead of stripped out when creating slugs for post titles, categories, or tags.

To recreate, add a new post with the title: Procol Harum’s “A Whiter Shade of Pale”
expected slug: procol-harums-a-whiter-shade-of-pale
actual slug: procol-harum%e2%80%99s-%e2%80%9ca-whiter-shade-of-pale%e2%80%9d

Attachments (8)

slug-curlyquotes-dashes.patch (600 bytes) - added by alxndr 4 years ago.
patch to sanitize_title_with_dashes() in formatting.php, replaces curly single- and double- quotes and en- and em-dashes with empty string
10797.patch (1.0 KB) - added by SergeyBiryukov 22 months ago.
10797.2.patch (1019 bytes) - added by SergeyBiryukov 21 months ago.
10797.3.patch (2.4 KB) - added by SergeyBiryukov 21 months ago.
10797.4.patch (2.5 KB) - added by SergeyBiryukov 21 months ago.
10797.5.patch (2.8 KB) - added by SergeyBiryukov 21 months ago.
10797.6.patch (2.8 KB) - added by SergeyBiryukov 21 months ago.
10797.tests.patch (2.2 KB) - added by ampt 20 months ago.
unit tests

Download all attachments as: .zip

Change History (32)

alxndr4 years ago

patch to sanitize_title_with_dashes() in formatting.php, replaces curly single- and double- quotes and en- and em-dashes with empty string

  • Keywords has-patch added
  • Milestone changed from Unassigned to 2.9

I'm all for the fix, but it should be str_replace instead of preg_replace, no?

  • Component changed from General to Permalinks
  • Keywords developer-feedback, submitter-feedback added; has-patch removed
  • Owner set to ryan

No version or svn revision info included, please always set and include in description as well for bugs.
http://codex.wordpress.org/Reporting_Bugs#Reporting_a_Bug

This is not a newly introduced issue (regression), updating milestone to 3.0 as too late for existing, lower impact issues.

Please see #9591, which I believe this strongly relates to -- better to work on a more cohesive solution, then one offs.

  • Milestone changed from 2.9 to 3.0
  • Version set to 2.9
  • Keywords developer-feedback reporter-feedback added; developer-feedback, submitter-feedback removed
  • Milestone 3.0 deleted
  • Resolution set to duplicate
  • Status changed from new to closed
  • Keywords has-patch added; developer-feedback reporter-feedback removed
  • Milestone set to Awaiting Review
  • Resolution duplicate deleted
  • Status changed from closed to reopened

Closed #16036 as duplicate.

I guess sanitize_title_with_dashes() is a more appropriate function for such replacements than remove_accents(), which handles i18n-related improvements.

So I suggest to reopen this ticket.

Refreshed the patch for 3.3.

Last edited 22 months ago by SergeyBiryukov (previous) (diff)
  • Milestone changed from Awaiting Review to 3.3

This falls under "Finally fix the issues relating to special characters in permalinks using an upgrade routine" from 3.3 scope.

Not sure if an upgrade routine is necessary here. Old permalinks (with curly quotes) still work after the patch.

Moved sanitization to sanitize_title(), as per today's dev chat.

Actually, the best place would be remove_accents().

I guess remove_accents() should only handle international characters.

comment:12 follow-up: ↓ 14   scribu21 months ago

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

Oh yeah. Because it wouldn't have the 'save' context check.

comment:14 in reply to: ↑ 12 ; follow-up: ↓ 15   nacin21 months ago

Replying to scribu:

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

I was referring to something else.

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

comment:15 in reply to: ↑ 14   SergeyBiryukov21 months ago

Replying to nacin:

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

Done in 10797.3.patch.

  • Keywords needs-unit-tests added

Looks good. Let's get some unit tests.

Based on the tickets linked from #9591, 10797.5.patch fixes:

  • #3206: ¡ and ¿ should get stripped
  • #8765: Strip ° characters from permalink
  • ticket:9591:30: Guillemets
  • #12956: © and ™ not stripped from sanitize_title
  • #15768: Unencoded non-breaking spaces in titles don't get sanitized properly

ampt20 months ago

unit tests

Add unit tests for the latest patch

In [18705]:

Strip a number of special characters in sanitize_title_with_dashes on save. Includes quotes (curly, angle), dashes, marks, etc. props SergeyBiryukov. props ampt for the unit tests in [UT438]. see #10797.

:-) Thanks ampt! Things look great.

Before patch: Tests: 14, Assertions: 15, Failures: 6.

After patch: OK (14 tests, 22 assertions)

http://unit-tests.trac.wordpress.org/changeset/438

  • Cc info@… added
  • Cc jeremy@… added
  • Keywords needs-unit-tests removed
  • Resolution set to fixed
  • Status changed from reopened to closed

Think this one is done.

Note: See TracTickets for help on using tickets.