Opened 5 weeks ago
Last modified 5 weeks ago
#63140 new defect (bug)
Unicode Chars (Icons) in the URL are possible, but break WordPress
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | minor | Version: | 6.7.2 |
Component: | Permalinks | Keywords: | |
Focuses: | | Cc: | |
Description
My client used Unicode characters (icons) in the URL. WordPress doesn't seem to filter them, so they were saved.
Immediately afterwards, the page didn't work anymore. Even when set back to draft, the page rendered as a white page instead of the page content.
I removed the icons, but the page was still broken.
I needed to move the page content into a "new" page and save it to get it working again. I added icons to the URL and the same issue occurred again.
Why are Unicode icons not filtered from the URL? Can you please apply a filtering mechanism that allows only valid characters in the URL? Icons are not supposed to be in the URL, I think.
Change History (3)
#2 in reply to: ↑ description, 5 weeks ago
Replying to Stefan M.:
In WordPress, Unicode characters (including icons and emojis) are not automatically filtered from URLs (post slugs) because:
- WordPress Allows Unicode in URLs for Internationalization
WordPress supports multilingual slugs to accommodate non-English languages (e.g., Japanese, Arabic, Cyrillic).
Unicode is essential for SEO and accessibility in non-Latin character-based languages.
- No Built-in Restriction on Special Unicode Characters
While WordPress sanitizes slugs using sanitize_title(), it does not explicitly remove all Unicode symbols, only certain special characters.
Symbols such as emojis can pass through if they don’t match WordPress’s default filtering rules.
- Some Unicode Characters Can Break URLs
Certain Unicode characters (like icons or control characters) may cause issues with browsers, servers, or plugins.
If a theme or plugin doesn’t properly handle encoded URLs, it could result in broken pages or white screens (as you experienced).
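To make "encoded URLs" concrete, here is a minimal sketch in plain PHP (no WordPress functions involved) of what an emoji slug looks like once it is percent-encoded. The sample slug is only an illustration; how a given theme, plugin, or server treats such a URL is not something this snippet can show.

```php
<?php
// Illustration only: plain PHP, no WordPress functions involved.
$slug = '🦙✨';

// Each emoji is several UTF-8 bytes; every byte becomes one %XX escape.
$encoded = rawurlencode( $slug );
echo $encoded, "\n"; // %F0%9F%A6%99%E2%9C%A8

// Decoding restores the original bytes, so code that compares the
// encoded and decoded forms inconsistently may fail to match the slug.
var_dump( rawurldecode( $encoded ) === $slug ); // bool(true)
```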
Solution: Apply a Custom Filter
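You can restrict unwanted Unicode characters in slugs by adding this custom function to your theme's functions.php:
function filter_unicode_from_slug( $slug ) {
    // Remove everything except Unicode letters, numbers, whitespace, dashes, and underscores.
    $filtered = preg_replace( '/[^\p{L}\p{N}\s_-]+/u', '', $slug );
    // preg_replace() returns null on error (e.g. invalid UTF-8), so fall back to the input.
    // Do not call sanitize_title() here: this callback runs inside that filter and would recurse.
    return null === $filtered ? $slug : $filtered;
}
// Priority 9 so this runs before the default sanitize_title_with_dashes() callback (priority 10), which percent-encodes non-ASCII characters.
add_filter( 'sanitize_title', 'filter_unicode_from_slug', 9, 1 );
This ensures that only letters, numbers, dashes, and underscores remain in URLs (whitespace is kept so that WordPress can still turn it into dashes afterwards). Note that \p{L} also matches non-Latin letters, so multilingual slugs keep working. You can adjust the regex pattern to allow or disallow specific characters as needed.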
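To see which characters a pattern of this kind keeps and which it drops, here is a minimal standalone sketch in plain PHP. The helper name strip_symbols_from_slug and the sample slugs are hypothetical and not part of WordPress; the pattern is one possible choice, assuming you want to keep letters from any script while removing emojis and other symbols.

```php
<?php
// Hypothetical helper for illustration; not part of WordPress.
function strip_symbols_from_slug( string $slug ): string {
    // Keep Unicode letters, numbers, whitespace, dashes, and underscores.
    $clean = preg_replace( '/[^\p{L}\p{N}\s_-]+/u', '', $slug );
    // preg_replace() returns null on failure (e.g. invalid UTF-8); fall back to the input.
    return $clean ?? $slug;
}

echo strip_symbols_from_slug( 'summer-sale-🦙✨' ), "\n"; // summer-sale-
echo strip_symbols_from_slug( 'привет-мир' ), "\n";      // привет-мир
```

The emojis are removed while the Cyrillic slug is left untouched. The trailing dash in the first result is left for later cleanup, which this sketch does not perform.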
#3, 5 weeks ago
@stefan-m,
Thanks for reporting this issue. I was able to reproduce the problem.
Using emojis in URLs is not considered good practice. According to Google's recommendations (https://developers.google.com/search/docs/crawling-indexing/url-structure#:~:text=https%3A//example.com/%F0%9F%A6%99%E2%9C%A8), emojis are non-ASCII characters, and non-ASCII characters generally aren't recommended in URLs.
However, many sites still have emojis in their URLs despite these recommendations.
I'll wait for the component maintainers to decide whether this behavior is intentional or should be fixed.
PS: I created the ticket, which is attributed to @stefan-m, but I'm logged in as @stefan-m-1 and have a completely different profile image. So this seems to be a weird security bug?!
I'm posting under the face of a guy I don't know, even though I logged in 5 minutes ago with my own login.