Make WordPress Core

Opened 5 years ago

Last modified 8 weeks ago

#44958 assigned defect (bug)

&nbsp character in title generates a permalink (and slug) with space

Reported by: ace2_heart's profile ace2_heart Owned by:
Milestone: 6.5 Priority: normal
Severity: major Version: 4.8
Component: Permalinks Keywords: needs-testing has-patch has-testing-info has-screenshots
Focuses: Cc:

Description

If a title has an &nbsp character copy pasted or if the permalink has a &nbsp copy pasted
It will generate a permalink and slug with an space breaking some elements on the site

I generated an empty wordpress install without any plugins to test it, and it has the same problem

Attachments (3)

5aeaf99417f517d3b1c9bb120a3713a8.gif (144.2 KB) - added by audrasjb 5 years ago.
Works for me
44958.diff (825 bytes) - added by Presskopp 5 months ago.
oopsi.gif (1.6 MB) - added by Presskopp 3 months ago.

Download all attachments as: .zip

Change History (29)

#1 follow-up: @audrasjb
5 years ago

  • Keywords reporter-feedback added

Hi @ace2_heart thanks for the ticket and welcome to WordPress Trac,

I can't reproduce the issue on a fresh install (see screen capture above). I tested it in the permalink edition and in the title field and both are correctly escaping the non-breaking space. Can you provide us some further information to reproduce the bug please?

Cheers,

Jb

#2 @SergeyBiryukov
5 years ago

Hi @ace2_heart, thanks for the ticket!

  is supposed to be converted to a hyphen by sanitize_title_with_dashes().

Are you sure the issue still happens on a clean install with all plugins disabled and a default theme (Twenty Seventeen) activated?

Last edited 5 years ago by SergeyBiryukov (previous) (diff)

#3 @ace2_heart
5 years ago

Hi, I created an empty install here, you can access it

  • user: miguelesquirol
  • password: miguelesquirol

https://bug-testing-miguelesquirol.c9users.io/wp-ad- min

These are the pages created

The parmalink are as following:

/e -securite/
/ -test/
Last edited 5 years ago by ace2_heart (previous) (diff)

#4 in reply to: ↑ 1 @ace2_heart
5 years ago

Replying to audrasjb:

I just saw the demo you posted, instead of using the   text on the title, tried to copy/paste the actual space character.

e -securite

Hi @ace2_heart thanks for the ticket and welcome to WordPress Trac,

I can't reproduce the issue on a fresh install (see screen capture above). I tested it in the permalink edition and in the title field and both are correctly escaping the non-breaking space. Can you provide us some further information to reproduce the bug please?

Cheers,

Jb

#5 follow-up: @audrasjb
5 years ago

Ok thanks for your answer.

So the issue happens with control marks like %e2%80%. Indeed, I'm able to reproduce this specific issue.

By the way, you are running WP 4.8.7 which is quite an old version. But the issue also happens in 4.9.8.

Thanks,

Jb

#6 in reply to: ↑ 5 @ace2_heart
5 years ago

So it's a really a bug? It's my first time submitting one.

Replying to audrasjb:

Ok thanks for your answer.

So the issue happens with control marks like %e2%80%. Indeed, I'm able to reproduce this specific issue.

By the way, you are running WP 4.8.7 which is quite an old version. But the issue also happens in 4.9.8.

Thanks,

Jb

Last edited 5 years ago by ace2_heart (previous) (diff)

#7 @audrasjb
5 years ago

I'd say it's more like an edge case, but yep, thank you for your ticket :)

#8 @tsewlliw
5 years ago

I'm hoping to submit a patch for this ticket in the next couple of days, I have seen it pop up in other places with other characters with a clear case for being sanitized out of URIs, such as the unicode line separator.

#11 @Presskopp
5 months ago

  • Keywords has-patch added

Here's a patch to get around any %e2%80 constructions like:

%e2%80
%e2%80test
%e2%80%
%e2%80%93
%e2%80%94
%e2%80%943
%e2%80%3
%e2%80%94slug
my%e2%80%94slug
my%e28094slug
my%e2%80%A8slug
...

While I think the regex itself is quite robust, this needs testing to fit well into the sanitize_title_with_dashes function, but I hope it helps on the way

@Presskopp
5 months ago

#12 @audrasjb
5 months ago

  • Keywords has-testing-info added; reporter-feedback removed
  • Milestone changed from Awaiting Review to 6.4

Thanks for the patch! Moving for 6.4 consideration.

This ticket was mentioned in Slack in #core-test by oglekler. View the logs.


3 months ago

#14 @zunaid321
3 months ago

Reproduction Report

This report validates that the issue cannot be reproduced.

Environment

  • OS: Windows 11 (22H2)
  • Web Server: nginx/1.25.1
  • PHP: 7.4.33
  • WordPress: 6.4-alpha-56267-src
  • Browser: Chrome Version 113.0.5672.126 (Official Build) (64-bit)
  • Theme: Twenty Twenty-Three

Actual Results

  • ❌ Was not able to reproduce the issue. Even after modifying the title with a space, the slug seems to be okay.

Supplemental Artifacts

Image: https://prnt.sc/VZnSqmEl8IHA

#15 @iammehedi1
3 months ago

Test Report

This report validates that the indicated issue can not be reproduced

Environment

  • OS: Windows 11 Pro (22H2)
  • Web Server: nginx/1.25.1
  • PHP: 7.4.33
  • WordPress: 6.4-alpha-56267-src
  • Browser: Chrome Version 116.0.5845.98 (Official Build) (64-bit)
  • Theme: Twenty Twenty-Three

Actual Results

  • ❌ Was not able to reproduce the issue. Tried to edit the permalinks as guided, found nothing.

Supplemental Artifacts

https://i.ibb.co/cCCCk6d/image.png

#16 follow-up: @Presskopp
3 months ago

it can be reproduced, but not with a space alone, see comment No 5

Last edited 3 months ago by Presskopp (previous) (diff)

#17 in reply to: ↑ 16 @zunaid321
3 months ago

I did check that comment but didn't quite understand how to test it out. Can you please clarify how you have tested it?

Replying to Presskopp:

it can be reproduced, but not with a space alone, see comment No 5

@Presskopp
3 months ago

#18 @Presskopp
3 months ago

  • Keywords has-screenshots added

This ticket was mentioned in Slack in #core by zunaid321. View the logs.


3 months ago

#20 @huzaifaalmesbah
3 months ago

Test Report

I Test 44958.diff

Environment

OS: macOS m1
WordPress 6.4-alpha-56267-src
PHP 7.4.33
nginx/1.25.2
MySQL 5.7.43
Browser: Chrome 116.0.5845.140
Theme: Twenty Twenty v2.3
Active Plugins: No plugins activated.

Before Applying Patch

https://i.ibb.co/98stRpF/Screenshot-2023-09-16-at-12-26-45-AM.png

After Applying Patch

https://i.ibb.co/Ks2Ckxr/Screenshot-2023-09-16-at-12-27-11-AM.png

#21 @oglekler
2 months ago

I checked all cases that are mentioned in the comment#11 and the patch is working as expected, but I cannot evaluate that these are all possible cases. This patch is preventing invisible symbols to be shown in the URL, and the most accurate way to test it would have been to copy-paste text from somewhere with this invisible symbols inside while having the whole list of these symbols.

This ticket was mentioned in Slack in #core by nicolefurlan. View the logs.


8 weeks ago

#24 @dmsnell
8 weeks ago

the regex is replacing %e280 and then even more sequences with an optional percent sign but none of those are valid URL-encodings. do we have any examples of any system producing those outputs? At a minimum that should be interpreted/rendered as �80 but it would be valid to simply ignore decoding / leave it as %e280.

given that %e2 by itself is invalid UTF-8/invalid URL encoding, it's a bit easier to discuss here, but I'd be worried about consuming too much input because it looks like it shouldn't be there and then further corrupting data that's otherwise fine. in other words, if we don't have any examples of systems producing output like %e280 then it might be a good call for us to limit unintentional breakage by only replacing %e2%80 and successive sequences that all carry the percent-sign.

#25 @nicolefurlan
8 weeks ago

  • Milestone changed from 6.4 to 6.5

This ticket was discussed in today's bug scrub.

We're going to bump this to 6.5.

Per #comment:21, it would be great if we could also find more test cases so that we can be more confident in this fix.

Props to @hellofromTonya :)

#26 @dmsnell
8 weeks ago

After some additional testing I was unable to reproduce through the block editor, but when attempting to paste some of these characters into the title field, another bug appeared, which I'll file in the Gutenberg repository (the contents of the title duplicate).

With the Classic Editor plugin I was able to reproduce, and I assume that API calls and manual calls from within PHP would have similarly exhibited the broken behavior.

In deeper review of this patch I do think it's worth taking out the "also %e280 to be on the safe side," though on a second thought it's not unrealistic to keep it in there, but for the different reason that "%e280" only appears when something else is already broken. This is more about the question of whether we want to make a bad situation potentially worse or leave it only as bad as it is; I generally find that leaving broken things the way they are leads to less confusion and corruption.

---

If we want to explore the broader issue for 6.5 I will volunteer my time to look into it, as the function is already fairly ad-hoc and this particular ticket seems part of the broader problem that we're hitting specific cases instead of taking a systematic or formal approach in handling the slug creation.

I've started exploring a PCRE-based approach in https://github.com/WordPress/wordpress-develop/pull/5466 that relies on Unicode character classes. If that approach works, it could simplify the function while boosting its reliability and maintainability. There are thankfully a fair number of unit tests and there's a fair bit of work to do there, but it seems tractable that we could collect a lot of currently-scattered logic and cover many bases in one go.

Note: See TracTickets for help on using tickets.