Opened 5 years ago
Last modified 8 weeks ago
#44958 assigned defect (bug)
  character in title generates a permalink (and slug) with space
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | 6.5 | Priority: | normal |
Severity: | major | Version: | 4.8 |
Component: | Permalinks | Keywords: | needs-testing has-patch has-testing-info has-screenshots |
Focuses: | Cc: |
Description
If a title has an   character copy pasted or if the permalink has a   copy pasted
It will generate a permalink and slug with an space breaking some elements on the site
I generated an empty wordpress install without any plugins to test it, and it has the same problem
Attachments (3)
Change History (29)
#1
follow-up:
↓ 4
@
5 years ago
- Keywords reporter-feedback added
Hi @ace2_heart thanks for the ticket and welcome to WordPress Trac,
I can't reproduce the issue on a fresh install (see screen capture above). I tested it in the permalink edition and in the title field and both are correctly escaping the non-breaking space. Can you provide us some further information to reproduce the bug please?
Cheers,
Jb
#2
@
5 years ago
Hi @ace2_heart, thanks for the ticket!
is supposed to be converted to a hyphen by sanitize_title_with_dashes().
Are you sure the issue still happens on a clean install with all plugins disabled and a default theme (Twenty Seventeen) activated?
#3
@
5 years ago
Hi, I created an empty install here, you can access it
- user: miguelesquirol
- password: miguelesquirol
https://bug-testing-miguelesquirol.c9users.io/wp-ad- min
These are the pages created
The parmalink are as following:
/e -securite/
/ -test/
#4
in reply to:
↑ 1
@
5 years ago
Replying to audrasjb:
I just saw the demo you posted, instead of using the text on the title, tried to copy/paste the actual space character.
e -securite
Hi @ace2_heart thanks for the ticket and welcome to WordPress Trac,
I can't reproduce the issue on a fresh install (see screen capture above). I tested it in the permalink edition and in the title field and both are correctly escaping the non-breaking space. Can you provide us some further information to reproduce the bug please?
Cheers,
Jb
#5
follow-up:
↓ 6
@
5 years ago
Ok thanks for your answer.
So the issue happens with control marks like %e2%80%
. Indeed, I'm able to reproduce this specific issue.
By the way, you are running WP 4.8.7 which is quite an old version. But the issue also happens in 4.9.8.
Thanks,
Jb
#6
in reply to:
↑ 5
@
5 years ago
So it's a really a bug? It's my first time submitting one.
Replying to audrasjb:
Ok thanks for your answer.
So the issue happens with control marks like
%e2%80%
. Indeed, I'm able to reproduce this specific issue.
By the way, you are running WP 4.8.7 which is quite an old version. But the issue also happens in 4.9.8.
Thanks,
Jb
#8
@
5 years ago
I'm hoping to submit a patch for this ticket in the next couple of days, I have seen it pop up in other places with other characters with a clear case for being sanitized out of URIs, such as the unicode line separator.
#11
@
5 months ago
- Keywords has-patch added
Here's a patch to get around any %e2%80
constructions like:
%e2%80 %e2%80test %e2%80% %e2%80%93 %e2%80%94 %e2%80%943 %e2%80%3 %e2%80%94slug my%e2%80%94slug my%e28094slug my%e2%80%A8slug ...
While I think the regex itself is quite robust, this needs testing to fit well into the sanitize_title_with_dashes
function, but I hope it helps on the way
#12
@
5 months ago
- Keywords has-testing-info added; reporter-feedback removed
- Milestone changed from Awaiting Review to 6.4
Thanks for the patch! Moving for 6.4 consideration.
This ticket was mentioned in Slack in #core-test by oglekler. View the logs.
3 months ago
#14
@
3 months ago
Reproduction Report
This report validates that the issue cannot be reproduced.
Environment
- OS: Windows 11 (22H2)
- Web Server: nginx/1.25.1
- PHP: 7.4.33
- WordPress: 6.4-alpha-56267-src
- Browser: Chrome Version 113.0.5672.126 (Official Build) (64-bit)
- Theme: Twenty Twenty-Three
Actual Results
- ❌ Was not able to reproduce the issue. Even after modifying the title with a space, the slug seems to be okay.
Supplemental Artifacts
#15
@
3 months ago
Test Report
This report validates that the indicated issue can not be reproduced
Environment
- OS: Windows 11 Pro (22H2)
- Web Server: nginx/1.25.1
- PHP: 7.4.33
- WordPress: 6.4-alpha-56267-src
- Browser: Chrome Version 116.0.5845.98 (Official Build) (64-bit)
- Theme: Twenty Twenty-Three
Actual Results
- ❌ Was not able to reproduce the issue. Tried to edit the permalinks as guided, found nothing.
Supplemental Artifacts
#16
follow-up:
↓ 17
@
3 months ago
it can be reproduced, but not with a space alone, see comment No 5
#17
in reply to:
↑ 16
@
3 months ago
I did check that comment but didn't quite understand how to test it out. Can you please clarify how you have tested it?
Replying to Presskopp:
it can be reproduced, but not with a space alone, see comment No 5
This ticket was mentioned in Slack in #core by zunaid321. View the logs.
3 months ago
#21
@
2 months ago
I checked all cases that are mentioned in the comment#11 and the patch is working as expected, but I cannot evaluate that these are all possible cases. This patch is preventing invisible symbols to be shown in the URL, and the most accurate way to test it would have been to copy-paste text from somewhere with this invisible symbols inside while having the whole list of these symbols.
This ticket was mentioned in Slack in #core by nicolefurlan. View the logs.
8 weeks ago
#24
@
8 weeks ago
the regex is replacing %e280
and then even more sequences with an optional percent sign but none of those are valid URL-encodings. do we have any examples of any system producing those outputs? At a minimum that should be interpreted/rendered as �80
but it would be valid to simply ignore decoding / leave it as %e280
.
given that %e2
by itself is invalid UTF-8/invalid URL encoding, it's a bit easier to discuss here, but I'd be worried about consuming too much input because it looks like it shouldn't be there and then further corrupting data that's otherwise fine. in other words, if we don't have any examples of systems producing output like %e280
then it might be a good call for us to limit unintentional breakage by only replacing %e2%80
and successive sequences that all carry the percent-sign.
#25
@
8 weeks ago
- Milestone changed from 6.4 to 6.5
This ticket was discussed in today's bug scrub.
We're going to bump this to 6.5.
Per #comment:21, it would be great if we could also find more test cases so that we can be more confident in this fix.
Props to @hellofromTonya :)
#26
@
8 weeks ago
After some additional testing I was unable to reproduce through the block editor, but when attempting to paste some of these characters into the title field, another bug appeared, which I'll file in the Gutenberg repository (the contents of the title duplicate).
With the Classic Editor plugin I was able to reproduce, and I assume that API calls and manual calls from within PHP would have similarly exhibited the broken behavior.
In deeper review of this patch I do think it's worth taking out the "also %e280 to be on the safe side," though on a second thought it's not unrealistic to keep it in there, but for the different reason that "%e280" only appears when something else is already broken. This is more about the question of whether we want to make a bad situation potentially worse or leave it only as bad as it is; I generally find that leaving broken things the way they are leads to less confusion and corruption.
---
If we want to explore the broader issue for 6.5 I will volunteer my time to look into it, as the function is already fairly ad-hoc and this particular ticket seems part of the broader problem that we're hitting specific cases instead of taking a systematic or formal approach in handling the slug creation.
I've started exploring a PCRE-based approach in https://github.com/WordPress/wordpress-develop/pull/5466 that relies on Unicode character classes. If that approach works, it could simplify the function while boosting its reliability and maintainability. There are thankfully a fair number of unit tests and there's a fair bit of work to do there, but it seems tractable that we could collect a lot of currently-scattered logic and cover many bases in one go.
Works for me