Opened 16 years ago
Closed 13 years ago
#3843 closed defect (bug) (duplicate)
Smart quote apostrophe ’ results in a permalink URL with %e2%80%99
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | minor | Version: | 2.2 |
Component: | Permalinks | Keywords: | has-patch slug permalink dev-feedback |
Focuses: | | Cc: | |
Description
Smart quote apostrophe ’ results in a permalink URL (slug) with %e2%80%99
ENV: WP trunk r4915
smart quote apostrophe ’
Mac shortcut: Shift-Option-]
ADDITIONAL DETAILS
My guess is that a solution should identify the allowed characters, translate a defined set of other characters to a hyphen (-), and strip everything else.
Attachments (2)
Change History (27)
#2
@
16 years ago
A solution would have to deal with this case specifically. Note that the URL, while ugly, is functional. Also note that in 2.1, people should be able to edit their post slug and have the old one redirect to the current one.
#3
@
16 years ago
Just a little clarification: the function used to create the slug is sanitize_title_with_dashes in wp-includes/formatting.php.
The sequence of events is currently:
1) Post title becomes the slug candidate
2) Accents are removed (replaced by un-accented letters)
3) Characters that still look like UTF-8 are encoded with utf8_uri_encode into octets (%e2, etc.); this is what causes the reported behavior
4) HTML entities and any character except letters, numbers, underscores, spaces, octets, and hyphens are removed (this is where other punctuation is removed)
5) Spaces are turned into hyphens, and whole thing is lower-cased
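To make the sequence above concrete, here is a rough Python sketch of the five steps. It is not the WordPress code: the function names (remove_accents, encode_non_ascii, strip_disallowed, to_slug) are made up for illustration, and accent removal is approximated with Unicode decomposition rather than WordPress's own lookup table.

```python
import re
import unicodedata
from urllib.parse import quote

# Step 4 keeps only percent-encoded octets, letters, digits, underscores,
# spaces and hyphens.
_ALLOWED = re.compile(r"%[0-9a-f]{2}|[A-Za-z0-9_ -]")


def remove_accents(text: str) -> str:
    """Step 2 (approximation): drop combining marks after decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def encode_non_ascii(text: str) -> str:
    """Step 3: percent-encode remaining non-ASCII characters as UTF-8 octets."""
    return "".join(ch if ord(ch) < 128 else quote(ch).lower() for ch in text)


def strip_disallowed(text: str) -> str:
    """Step 4: remove HTML entities, then everything not explicitly allowed."""
    text = re.sub(r"&[^;\s]+;", "", text)
    return "".join(_ALLOWED.findall(text))


def to_slug(title: str) -> str:
    """Steps 1 to 5: the title is the slug candidate and is narrowed down."""
    slug = remove_accents(title)
    slug = encode_non_ascii(slug)
    slug = strip_disallowed(slug)
    slug = re.sub(r"\s+", "-", slug.strip())  # Step 5: spaces to hyphens
    return slug.lower()


print(to_slug("Don't Panic"))       # straight apostrophe is stripped in step 4
print(to_slug("Don\u2019t Panic"))  # curly apostrophe survives as octets
```

The straight apostrophe is plain ASCII, so it is dropped with the other punctuation in step 4, while the curly one has already been turned into octets in step 3 and is therefore protected.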
So, to fix this, we would have to add a step 2.5:
2.5: Translate into hyphens, or remove (more consistent with what happens to other punctuation), a specific list of special (but common) punctuation characters.
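A minimal Python sketch of what such a step 2.5 might look like. The character list and the names SPECIAL_PUNCTUATION and handle_special_punctuation are assumptions for illustration only; the ticket does not settle which characters belong in the list, nor whether they should be removed or hyphenated, so the replacement is left as a parameter.

```python
# Hypothetical list: curly single and double quotes, en dash, em dash, ellipsis.
SPECIAL_PUNCTUATION = "\u2018\u2019\u201c\u201d\u2013\u2014\u2026"


def handle_special_punctuation(text: str, replacement: str = "") -> str:
    """Remove the listed characters, or swap each one for `replacement`."""
    table = {ord(ch): replacement for ch in SPECIAL_PUNCTUATION}
    return text.translate(table)


# Removal, consistent with how other punctuation is treated later on.
print(handle_special_punctuation("Don\u2019t Panic"))
# Translation into hyphens instead.
print(handle_special_punctuation("Rock\u2014Roll", "-"))
```

Running this before any percent-encoding would mean the listed characters never reach the stage that turns them into octets.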
Questions:
a) Is this worth doing, considering that the current behavior makes a usable slug, and that you can always edit your slug by hand if you want to?
b) If it is worth doing, what should the list of special punctuation characters be, and should they be removed or translated into hyphens?
#7
@
16 years ago
Well, we strip *regular* quotes out, but not fancy quotes. I think this is really not going to be fixed easily -- we can strip out UTF-8 quotes, but what about other encodings?
#9
@
15 years ago
- Resolution wontfix deleted
- Status changed from closed to reopened
Please can you leave a comment explaining why you've closed the ticket.
#11
@
15 years ago
- Priority changed from low to normal
This patch should fix the problem. We were treating all Unicode characters as equal, when we should have been defining the different categories and removing the Unicode characters that fall into categories such as punctuation.
This patch simply attaches onto the other sanitize_title functions and will probably need to be integrated more fully in the future. As for now, it works great for me on all the test cases I threw at it.
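The patch itself is not reproduced here, so the following is only a Python sketch of the category idea, using the standard unicodedata module. The function name strip_unicode_punctuation and the choice to filter only the "P" (punctuation) major category by default are assumptions; ASCII characters are left alone so that hyphens and underscores are still handled by the existing rules.

```python
import unicodedata


def strip_unicode_punctuation(text: str, major_categories=("P",)) -> str:
    """Drop non-ASCII characters whose Unicode major category is listed."""
    return "".join(
        ch
        for ch in text
        if ord(ch) < 128 or unicodedata.category(ch)[0] not in major_categories
    )


# Curly apostrophe (category Pf) is removed; letters are kept.
print(strip_unicode_punctuation("Don\u2019t Panic"))
# CJK letters (category Lo) are untouched; curly double quotes are removed.
print(strip_unicode_punctuation("\u65e5\u672c\u8a9e \u201ctest\u201d"))
```

Passing a wider tuple, for example ("P", "S"), would also drop symbols such as currency signs, which may or may not be wanted in a slug.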
In the future, when all browsers support full Unicode characters in the URL, shouldn't we stop converting them at all? ;)
#12
@
15 years ago
- Milestone changed from 2.7 to 2.6
- Owner changed from anonymous to ryan
- Status changed from reopened to new
#15
@
15 years ago
Any changes to the sanitizer will lead to 404s for slugs made with the old sanitizer.
#20
@
15 years ago
- Milestone changed from 2.7 to 2.9
As with #3206, I am pushing this out; we need a single solution to the whole mess ;-)
#22
@
14 years ago
- Milestone 2.9 deleted
- Resolution set to duplicate
- Status changed from new to closed
Merging into #9591.
This is proper behavior. The curly quote isn't plaintext -- it's a symbol and has to be translated. The same goes for other UTF-8 symbols such as Chinese characters (some other bug was about that) -- they are, and should be, turned into URL-safe percent-encoded octets.