Opened 8 years ago

Closed 5 years ago

#3206 closed defect (bug) (duplicate)

strip initial exclamation and question marks in permalinks

Reported by: pandem
Owned by:
Milestone:
Priority: normal
Severity: minor
Version: 2.0.4
Component: General
Keywords: formatting i18n
Focuses:
Cc:

Description

These two symbols ¡ and ¿ are very common in Spanish titles, but they are not stripped out for the post slug. A similar bug has been reported and patched for 2.0.5 (http://trac.wordpress.org/ticket/2735), so this one could probably be fixed for the same release.

Change History (14)

comment:1 pandem 8 years ago

  • Keywords formatting date i18n added
  • Version set to 2.0.4

comment:2 pandem 8 years ago

  • Keywords date removed

comment:3 Nazgul 7 years ago

  • Milestone set to 2.4 (future)

comment:4 foolswisdom 7 years ago

From 4636, %c2% is left in the permalink in place of ¿
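
For context on the stray %c2: in UTF-8 both ¡ and ¿ are two-byte sequences whose first byte is 0xC2, so a percent-encoded slug carries a %C2 prefix for each of them. The Python sketch below only demonstrates that encoding. It does not reproduce whatever WordPress code path left the fragment behind, and `percent_encode` is a hypothetical helper name.

```python
from urllib.parse import quote

def percent_encode(ch: str) -> str:
    """Percent-encode one character using its UTF-8 bytes."""
    return quote(ch, safe="")

for ch in ("\u00a1", "\u00bf"):  # ¡ and ¿
    # Show the raw UTF-8 bytes next to the percent-encoded form.
    print(ch, ch.encode("utf-8").hex(), percent_encode(ch))
```

If only the second byte of such a pair is removed, the leading %C2 is what remains.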

comment:5 pishmishy 6 years ago

  • Milestone 2.5 deleted
  • Resolution set to fixed
  • Status changed from new to closed

Going by foolswisdom's comment this can be closed.

comment:6 Nazgul 6 years ago

  • Milestone set to 2.5

comment:7 lloydbudd 6 years ago

  • Milestone changed from 2.5 to 2.7
  • Resolution fixed deleted
  • Status changed from closed to reopened

Based on reading #2735, I think the ticket suggests that these should be stripped altogether, and this issue isn't fixed.

comment:8 melado87 6 years ago

The problem is that the ¡ and ¿ characters should be stripped out, just like the ! and ? already are. These characters are used in Spanish to start a question or an exclamation.
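
The requested behavior can be sketched in a few lines of Python. This is an illustration only, not WordPress's `sanitize_title()`: `make_slug` and `STRIP_CHARS` are hypothetical names, and the lowercase-and-hyphenate step is an assumed stand-in for the rest of the slug logic.

```python
import re

# Hypothetical punctuation set: the ASCII marks the ticket says are already
# stripped, plus their inverted Spanish counterparts.
STRIP_CHARS = "!?\u00a1\u00bf"  # ! ? ¡ ¿

def make_slug(title: str) -> str:
    """Toy slug builder for illustration only."""
    # Remove the punctuation before any further processing.
    cleaned = title.translate({ord(ch): None for ch in STRIP_CHARS})
    # Lowercase, then collapse whitespace runs into single hyphens.
    return re.sub(r"\s+", "-", cleaned.strip().lower())

print(make_slug("\u00bfQu\u00e9 pasa?"))   # title: ¿Qué pasa?
print(make_slug("\u00a1Hola mundo!"))      # title: ¡Hola mundo!
```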

comment:9 pishmishy 6 years ago

Should we change the behavior to white list allowed characters rather than playing whack-a-mole every time someone points out a character that's 'undesirable' in URLs?

comment:10 ryan 6 years ago

Wouldn't we have to whitelist most of UTF-8? That's a big list. :-)

There's also the problem that changing what is stripped will break existing slugs made using the old code.

comment:11 pishmishy 6 years ago

I hoped I'd get that response ;-) I don't think the whitelist needs to be that large. To work, the whitelist would limit the slug to just the following characters (see RFC 3986, Appendix A):

ALPHA / DIGIT / "-" / "." / "_" / "~"

I'm not sure that'd be well received, but perhaps we could find a way to make it work.
I think we have three options:

  • Do nothing. This seems inconsistent, since we have already done something for other characters.
  • Whitelist the small set of characters that won't be percent encoded according to RFC 3986
  • Whack-a-mole filtering of characters as and when people ask for it.
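
The whitelist option could look roughly like the Python sketch below. It is not a proposed patch: `whitelist_slug` is a hypothetical name, and the NFKD accent-folding step is an added assumption (without it, letters such as é would simply be dropped, since ALPHA in RFC 3986 covers only ASCII letters).

```python
import re
import unicodedata

# Matches anything outside the RFC 3986 "unreserved" set:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
NOT_UNRESERVED = re.compile(r"[^A-Za-z0-9\-._~]")

def whitelist_slug(title: str) -> str:
    """Keep only unreserved characters; everything else is dropped."""
    # Assumption: decompose accented letters (é becomes e plus a combining
    # accent) so the ASCII base letter survives the filter.
    decomposed = unicodedata.normalize("NFKD", title)
    # Turn whitespace runs into hyphens before filtering.
    hyphenated = re.sub(r"\s+", "-", decomposed.strip())
    # Drop every character that is not in the whitelist.
    kept = NOT_UNRESERVED.sub("", hyphenated)
    # Collapse repeated hyphens and trim them from the ends.
    return re.sub(r"-{2,}", "-", kept).strip("-").lower()

print(whitelist_slug("\u00bfQu\u00e9 pasa?"))  # title: ¿Qué pasa?
```

Under this sketch, ¡ and ¿ disappear without being listed anywhere, but a title written entirely in a non-Latin script reduces to an empty string, which is one face of the concern raised in comment:10.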

comment:12 westi 6 years ago

  • Milestone changed from 2.7 to 2.9

Pushing to 2.9

Changing the list of allowable chars breaks old slugs.

There are a number of tickets for this issue for different characters.

We need someone to find all the relevant tickets and come up with a scheme for handling old slugs.
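
As a starting point for scoping the old-slug problem, one could list which titles would get a different slug under a new rule. The Python sketch below is hypothetical throughout: `old_rule` and `new_rule` are toy stand-ins rather than WordPress's actual sanitizers, and the ticket does not say how the resulting list should be handled.

```python
import re

def _hyphenate(text: str) -> str:
    # Shared toy step: lowercase and join words with hyphens.
    return re.sub(r"\s+", "-", text.strip().lower())

def old_rule(title: str) -> str:
    # Stand-in for the current behavior: strips only ! and ?
    return _hyphenate(re.sub(r"[!?]", "", title))

def new_rule(title: str) -> str:
    # Stand-in for the requested behavior: also strips ¡ and ¿
    return _hyphenate(re.sub(r"[!?\u00a1\u00bf]", "", title))

def slugs_that_would_change(titles, old, new):
    """Return (title, old_slug, new_slug) for every title whose slug differs."""
    changed = []
    for title in titles:
        old_slug, new_slug = old(title), new(title)
        if old_slug != new_slug:
            changed.append((title, old_slug, new_slug))
    return changed

for row in slugs_that_would_change(
    ["\u00bfQu\u00e9 pasa?", "Hello world!"], old_rule, new_rule
):
    print(row)
```

Only the titles reported by such a check would need special handling; everything else keeps its slug under either rule.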

comment:13 mrmist 5 years ago

Also related #5554 #4739 vaguely #6106 #1762 #4328 #6973

comment:14 Denis-de-Bernardy 5 years ago

  • Milestone 2.9 deleted
  • Resolution set to duplicate
  • Status changed from reopened to closed

Merging into #9591.
