Opened 18 years ago

Closed 16 years ago

#3206 closed defect (bug) (duplicate)

strip initial exclamation and question marks in permalinks

Reported by: pandem
Owned by:
Milestone:
Priority: normal
Severity: minor
Version: 2.0.4
Component: General
Keywords: formatting i18n
Focuses:
Cc:

Description

These two symbols ¡ and ¿ are very common in Spanish titles, but they are not stripped out for the post slug. A similar bug has been reported and patched for 2.0.5 (http://trac.wordpress.org/ticket/2735), so this one could probably be fixed for the same release.

Change History (14)

#1 @pandem
18 years ago

  • Keywords formatting date i18n added
  • Version set to 2.0.4

#2 @pandem
18 years ago

  • Keywords date removed

#3 @Nazgul
17 years ago

  • Milestone set to 2.4 (future)

#4 @foolswisdom
17 years ago

From 4636, %c2% is left in the permalink in place of ¿
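
For context, both ¡ and ¿ encode in UTF-8 as two bytes beginning with 0xC2, which may be where the leftover %c2 comes from. The following is a minimal Python sketch of that encoding only. It is not WordPress code, it does not reproduce the code path that yields the truncated "%c2%", and the helper name percent_encode_utf8 is hypothetical.

```python
from urllib.parse import quote


def percent_encode_utf8(text: str) -> str:
    """Percent-encode every UTF-8 byte of text (hypothetical helper)."""
    # safe="" means no reserved characters are left unescaped.
    return quote(text, safe="")


for mark in ("¡", "¿"):
    # Show the raw UTF-8 bytes next to the percent-encoded form.
    print(mark, mark.encode("utf-8").hex(), percent_encode_utf8(mark))
# ¡ c2a1 %C2%A1
# ¿ c2bf %C2%BF
```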

#5 @pishmishy
17 years ago

  • Milestone 2.5 deleted
  • Resolution set to fixed
  • Status changed from new to closed

Going by foolswisdom's comment this can be closed.

#6 @Nazgul
17 years ago

  • Milestone set to 2.5

#7 @lloydbudd
17 years ago

  • Milestone changed from 2.5 to 2.7
  • Resolution fixed deleted
  • Status changed from closed to reopened

Based on reading #2735, I think the ticket suggests that these should be stripped altogether, and this issue isn't fixed.

#8 @melado87
17 years ago

The problem is that the ¡ and ¿ characters should be stripped out, just like the ! and ? already are. These characters are used in Spanish to start a question or an exclamation.

#9 @pishmishy
17 years ago

Should we change the behavior to whitelist allowed characters rather than playing whack-a-mole every time someone points out a character that's 'undesirable' in URLs?

#10 @ryan
17 years ago

Wouldn't we have to whitelist most of UTF-8? That's a big list. :-)

There's also the problem that changing what is stripped will break existing slugs made using the old code.

#11 @pishmishy
17 years ago

I hoped I'd get that response ;-) I don't think the whitelist needs to be that large. To work, a whitelist would limit the slug to just the following characters (see RFC 3986, Appendix A):

ALPHA / DIGIT / "-" / "." / "_" / "~"

I'm not sure that'd be well received, but perhaps we could find a way to make it work.
I think we have three options.

  • Do nothing. This seems inconsistent, since we have already done something for other characters.
  • Whitelist the small set of characters that won't be percent-encoded according to RFC 3986.
  • Whack-a-mole filtering of characters as and when people ask for it.
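
The whitelist option could be sketched roughly as below. This is an illustrative Python sketch, not WordPress's sanitizer: the function name whitelist_slug is hypothetical, and the lowercasing, hyphenation of whitespace, and folding of accented letters to ASCII are added assumptions rather than anything proposed in this ticket. Only the allowed character set comes from RFC 3986.

```python
import re
import unicodedata

# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
_NOT_UNRESERVED = re.compile(r"[^A-Za-z0-9._~-]+")


def whitelist_slug(title: str) -> str:
    """Build a slug containing only RFC 3986 unreserved characters."""
    # Assumption: decompose accented letters ("é" -> "e" + combining accent)
    # and drop the combining marks, so Spanish titles keep their base letters.
    decomposed = unicodedata.normalize("NFKD", title)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: lowercase and turn runs of whitespace into single hyphens.
    hyphenated = re.sub(r"\s+", "-", folded.strip().lower())
    # The whitelist step: remove everything outside the unreserved set,
    # which covers ¡ and ¿ without naming them.
    cleaned = _NOT_UNRESERVED.sub("", hyphenated)
    # Collapse repeated hyphens and trim them from both ends.
    return re.sub(r"-{2,}", "-", cleaned).strip("-")


print(whitelist_slug("¿Qué pasa?"))     # que-pasa
print(whitelist_slug("¡Hola, mundo!"))  # hola-mundo
```

A filter like this would also change the slugs generated for titles that the old code handled differently, so it does not address the concern about breaking existing slugs.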

#12 @westi
16 years ago

  • Milestone changed from 2.7 to 2.9

Pushing to 2.9

Changing the list of allowable chars breaks old slugs.

There are a number of tickets for this issue for different characters.

We need someone to find all the relevant tickets and come up with a scheme for handling old slugs.

#13 @mrmist
16 years ago

Also related: #5554 #4739; vaguely: #6106 #1762 #4328 #6973

#14 @Denis-de-Bernardy
16 years ago

  • Milestone 2.9 deleted
  • Resolution set to duplicate
  • Status changed from reopened to closed

Merging into #9591.
