#36384 closed defect (bug) (duplicate)
Percent sign breaks the slugs (sanitize_title > remove_accents bug)
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | 4.4.2 |
Component: | Formatting | Keywords: | |
Focuses: | administration | Cc: | |
Description
Hi everyone. I just noticed that the remove_accents() function (which is used in the sanitize_title() function) treats a percent sign followed by two hexadecimal digits as "looks like this is URL-encoded, let me decode it".
In the Turkish language (and apparently in Persian as well, according to Wikipedia) the percent sign precedes the number instead of following it. Combine this with remove_accents()'s bug, and this title:
My body is %50 muscle, %20 fat and %100 sexy
produces this slug:
my-body-is-P-muscle- -fat-and-0-sexy
Here's what's scary, though: the "-and-0-sexy" part contains a hidden control character (the byte that %10 decodes to), which breaks the post URL altogether.
(You can get the same results with any online URL decoder, by the way.)
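To make the decoding concrete, here is a minimal sketch in Python rather than PHP. It only shows what generic percent-decoding does to the example title; it is not WordPress's code, and it makes no claim about which WordPress function actually performs the decoding.

```python
from urllib.parse import unquote

# The example title from this ticket, with Turkish-style percentages.
title = "My body is %50 muscle, %20 fat and %100 sexy"

# Generic percent-decoding: each "%XX" (two hex digits) becomes one byte.
#   %50 -> "P", %20 -> " ", and %10 -> the control byte 0x10,
# which leaves the trailing "0" of "%100" behind as a literal digit.
decoded = unquote(title)

# repr() makes the otherwise invisible control character visible.
print(repr(decoded))
```

The invisible character is the one that ends up in front of the "0" in "-and-0-sexy".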
According to my searches, this issue came up once before on Trac (#32462), but it was thought to be an IIS-related problem and was never resolved. Now that we know it's remove_accents()'s fault, do you think we can fix it?
I'm no expert on PHP, but I believe the remove_accents() function could simply remove the percent characters (or replace them with __('percent'), though removing would make more sense) before dealing with all the other characters.
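A rough sketch of that suggestion, again in Python and not a patch for WordPress: slugify_without_percent is a hypothetical helper, and the ASCII-only slug rule is a simplification (it would drop Turkish letters, which real slug code has to handle).

```python
import re

def slugify_without_percent(title: str) -> str:
    """Hypothetical helper: drop percent signs first, then build the slug."""
    # Remove "%" up front so nothing later can mistake "%50" for an escape.
    cleaned = title.replace("%", "")
    # Simplified slug rule (an assumption): lowercase, then collapse every
    # run of characters other than a-z and 0-9 into a single dash.
    cleaned = re.sub(r"[^a-z0-9]+", "-", cleaned.lower())
    return cleaned.strip("-")

slug = slugify_without_percent("My body is %50 muscle, %20 fat and %100 sexy")
print(slug)  # my-body-is-50-muscle-20-fat-and-100-sexy
```

With the percent signs gone before any decoding step, the numbers survive intact and no control character can sneak into the slug.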
PS: Although I'm not sure it was the same problem, an issue with percent signs in slugs seems to have been fixed over 10 years ago (#569) and has since resurfaced. It was solved by removing the percent characters before dealing with all the other characters. (Old-timers know best.)
Cheers,
Barış Ünver
Hi @barisunver, welcome to Trac!
Thanks for the report, we're already tracking this issue in #3329 and #25021.