Opened 15 years ago

Closed 13 years ago

#10797 closed defect (bug) (fixed)

Curly quotes not stripped out of slugs

Reported by: alxndr Owned by: ryan
Milestone: 3.3 Priority: normal
Severity: normal Version: 2.9
Component: Permalinks Keywords: has-patch
Focuses: Cc:

Description

Curly single- and double-quotes are percent-encoded instead of stripped out when creating slugs for post titles, categories, or tags.

To recreate, add a new post with the title: Procol Harum’s “A Whiter Shade of Pale”
expected slug: procol-harums-a-whiter-shade-of-pale
actual slug: procol-harum%e2%80%99s-%e2%80%9ca-whiter-shade-of-pale%e2%80%9d
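The difference between the two slugs can be reproduced with a small sketch. This is not WordPress code: `naive_slug` and `fixed_slug` are hypothetical Python stand-ins that only mimic the reported behavior (percent-encoding of non-ASCII characters) and the expected behavior (dropping curly quotes before encoding).

```python
import re
from urllib.parse import quote

# Curly single and double quotes (U+2018, U+2019, U+201C, U+201D).
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"


def naive_slug(title: str) -> str:
    """Hypothetical stand-in for the reported behavior."""
    slug = title.lower()
    # Non-ASCII characters become lowercase UTF-8 percent-escapes.
    slug = quote(slug, safe=" ").lower()
    # Keep only percent signs, letters, digits, spaces, underscores, hyphens.
    slug = re.sub(r"[^%a-z0-9 _-]", "", slug)
    # Turn whitespace into hyphens and collapse repeated hyphens.
    slug = re.sub(r"\s+", "-", slug)
    slug = re.sub(r"-+", "-", slug)
    return slug.strip("-")


def fixed_slug(title: str) -> str:
    """Hypothetical stand-in for the expected behavior: drop curly quotes first."""
    for ch in CURLY_QUOTES:
        title = title.replace(ch, "")
    return naive_slug(title)


title = "Procol Harum\u2019s \u201cA Whiter Shade of Pale\u201d"
print(naive_slug(title))
print(fixed_slug(title))
```

With the title from the description, the first call yields the percent-encoded slug and the second yields the expected one.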

Attachments (8)

slug-curlyquotes-dashes.patch (600 bytes) - added by alxndr 15 years ago.
Patch to sanitize_title_with_dashes() in formatting.php; replaces curly single and double quotes and en- and em-dashes with an empty string.
10797.patch (1.0 KB) - added by SergeyBiryukov 13 years ago.
10797.2.patch (1019 bytes) - added by SergeyBiryukov 13 years ago.
10797.3.patch (2.4 KB) - added by SergeyBiryukov 13 years ago.
10797.4.patch (2.5 KB) - added by SergeyBiryukov 13 years ago.
10797.5.patch (2.8 KB) - added by SergeyBiryukov 13 years ago.
10797.6.patch (2.8 KB) - added by SergeyBiryukov 13 years ago.
10797.tests.patch (2.2 KB) - added by ampt 13 years ago.
unit tests

Change History (32)

@alxndr
15 years ago

Patch to sanitize_title_with_dashes() in formatting.php; replaces curly single and double quotes and en- and em-dashes with an empty string.

#1 @scribu
15 years ago

  • Keywords has-patch added
  • Milestone changed from Unassigned to 2.9

#2 @nacin
15 years ago

I'm all for the fix, but it should be str_replace instead of preg_replace, no?
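To illustrate the point about literal versus pattern-based replacement, here is a rough Python analogy (not the PHP from the patch): `str.replace` plays the role of str_replace and `re.sub` the role of preg_replace. The function names are made up for this sketch. For a fixed set of characters both give the same result, and the literal version needs no regular expression engine.

```python
import re

# Curly single and double quotes (U+2018, U+2019, U+201C, U+201D).
CURLY_QUOTES = ("\u2018", "\u2019", "\u201c", "\u201d")


def strip_literal(text: str) -> str:
    """Literal replacement, analogous to str_replace with an array of needles."""
    for ch in CURLY_QUOTES:
        text = text.replace(ch, "")
    return text


def strip_regex(text: str) -> str:
    """Pattern replacement, analogous to preg_replace with a character class."""
    return re.sub("[\u2018\u2019\u201c\u201d]", "", text)


sample = "Harum\u2019s \u201cPale\u201d"
print(strip_literal(sample) == strip_regex(sample))
```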

#3 @lloydbudd
15 years ago

  • Component changed from General to Permalinks
  • Keywords developer-feedback submitter-feedback added; has-patch removed
  • Owner set to ryan

No version or SVN revision info was included. For bugs, please always set the version and include it in the description as well.
http://codex.wordpress.org/Reporting_Bugs#Reporting_a_Bug

This is not a newly introduced issue (regression), so I'm updating the milestone to 3.0; it is too late for existing, lower-impact issues.

Please see #9591, which I believe this strongly relates to. It would be better to work on a more cohesive solution than on one-offs.

#4 @lloydbudd
15 years ago

  • Milestone changed from 2.9 to 3.0
  • Version set to 2.9

#5 @lloydbudd
15 years ago

  • Keywords reporter-feedback added; submitter-feedback removed

#6 @Denis-de-Bernardy
15 years ago

  • Milestone 3.0 deleted
  • Resolution set to duplicate
  • Status changed from new to closed

#7 @SergeyBiryukov
13 years ago

  • Keywords has-patch added; developer-feedback reporter-feedback removed
  • Milestone set to Awaiting Review
  • Resolution duplicate deleted
  • Status changed from closed to reopened

Closed #16036 as duplicate.

I guess sanitize_title_with_dashes() is a more appropriate function for such replacements than remove_accents(), which handles i18n-related improvements.

So I suggest reopening this ticket.

Refreshed the patch for 3.3.

#8 @SergeyBiryukov
13 years ago

  • Milestone changed from Awaiting Review to 3.3

This falls under "Finally fix the issues relating to special characters in permalinks using an upgrade routine" from 3.3 scope.

Not sure if an upgrade routine is necessary here. Old permalinks (with curly quotes) still work after the patch.

#9 @SergeyBiryukov
13 years ago

Moved sanitization to sanitize_title(), as per today's dev chat.

#10 @scribu
13 years ago

Actually, the best place would be remove_accents().

#11 @SergeyBiryukov
13 years ago

I guess remove_accents() should only handle international characters.

#12 follow-up: @scribu
13 years ago

Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.

I'm not sure why nacin said we shouldn't rely on the filter.

#13 @scribu
13 years ago

Oh yeah. Because it wouldn't have the 'save' context check.

#14 in reply to: ↑ 12 ; follow-up: @nacin
13 years ago

Replying to scribu:

> Yeah, you're right, but the rest of the special chars are handled elsewhere and hooked into 'sanitize_title'.
>
> I'm not sure why nacin said we shouldn't rely on the filter.

I was referring to something else.

This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.
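A rough sketch of the idea, in Python rather than PHP: the filter registration declares how many arguments the callback accepts, so the context can reach the callback and the stripping can be limited to the 'save' context. The tiny hook registry below is a simplified, hypothetical model of add_filter()/apply_filters(), and the body of `sanitize_title_with_dashes` is a placeholder, not the real function.

```python
from collections import defaultdict

# Hypothetical, simplified hook registry: tag -> list of (priority, callback, accepted_args).
_filters = defaultdict(list)


def add_filter(tag, callback, priority=10, accepted_args=1):
    _filters[tag].append((priority, callback, accepted_args))


def apply_filters(tag, value, *args):
    for _priority, callback, accepted_args in sorted(_filters[tag], key=lambda f: f[0]):
        # Pass the value plus only as many extra arguments as the callback accepts.
        value = callback(value, *args[: accepted_args - 1])
    return value


def sanitize_title_with_dashes(title, raw_title="", context="display"):
    """Placeholder body: strip curly quotes only when saving."""
    if context == "save":
        for ch in "\u2018\u2019\u201c\u201d":
            title = title.replace(ch, "")
    return title.lower().replace(" ", "-")


# Registering with accepted_args=3 lets raw_title and context be passed in.
add_filter("sanitize_title", sanitize_title_with_dashes, 10, 3)


def sanitize_title(title, context="save"):
    return apply_filters("sanitize_title", title, title, context)


print(sanitize_title("Harum\u2019s Song", "save"))
print(sanitize_title("Harum\u2019s Song", "query"))
```

In this model, a callback registered with the default of one accepted argument never sees the context, which is why the registration itself has to change.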

#15 in reply to: ↑ 14 @SergeyBiryukov
13 years ago

Replying to nacin:

> This logic probably makes the most sense in sanitize_title_with_dashes. We can alter the add_filter() call to let the context arg be passed in.

Done in 10797.3.patch.

#16 @nacin
13 years ago

  • Keywords needs-unit-tests added

Looks good. Let's get some unit tests.

#17 @SergeyBiryukov
13 years ago

Based on the tickets linked from #9591, 10797.5.patch fixes:

  • #3206: ¡ and ¿ should get stripped
  • #8765: Strip ° characters from permalink
  • #9591, comment 30: Guillemets
  • #12956: © and ™ not stripped from sanitize_title

#18 @SergeyBiryukov
13 years ago

  • #15768: Unencoded non-breaking spaces in titles don't get sanitized properly
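For illustration only, the characters named in the list above could be handled with a single translation table. This is a Python sketch, not the contents of 10797.5.patch: the helper name `clean_title` is made up, and mapping the non-breaking space to a hyphen (rather than removing it) is an assumption of this sketch.

```python
# Characters to remove: inverted marks, degree sign, guillemets, copyright, trademark.
STRIP_CHARS = "\u00a1\u00bf\u00b0\u00ab\u00bb\u00a9\u2122"
# Assumption: a non-breaking space should act like a regular word separator.
HYPHEN_CHARS = "\u00a0"

# Build the table once; None deletes a character, "-" replaces it.
_TABLE = str.maketrans(
    {**{ch: None for ch in STRIP_CHARS}, **{ch: "-" for ch in HYPHEN_CHARS}}
)


def clean_title(title: str) -> str:
    """Hypothetical helper: drop the listed symbols, turn nbsp into a hyphen."""
    return title.translate(_TABLE)


print(clean_title("\u00a1Hola\u00a0Mundo\u2122"))
```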

@ampt
13 years ago

unit tests

#19 @ampt
13 years ago

Add unit tests for the latest patch

#20 @nacin
13 years ago

In [18705]:

Strip a number of special characters in sanitize_title_with_dashes on save. Includes quotes (curly, angle), dashes, marks, etc. props SergeyBiryukov. props ampt for the unit tests in [UT438]. see #10797.

#21 @nacin
13 years ago

:-) Thanks ampt! Things look great.

Before patch: Tests: 14, Assertions: 15, Failures: 6.

After patch: OK (14 tests, 22 assertions)

http://unit-tests.trac.wordpress.org/changeset/438

#22 @toscho
13 years ago

  • Cc info@… added

#23 @hd-J
13 years ago

  • Cc jeremy@… added

#24 @nacin
13 years ago

  • Keywords needs-unit-tests removed
  • Resolution set to fixed
  • Status changed from reopened to closed

Think this one is done.
