Make WordPress Core

Opened 5 weeks ago

Last modified 5 weeks ago

#63140 new defect (bug)

Unicode Chars (Icons) in the URL are possible, but break WordPress

Reported by: Stefan M.
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: minor
Version: 6.7.2
Component: Permalinks
Keywords:
Focuses:
Cc:

Description

My client used Unicode characters (icons) in the URL. WordPress doesn't seem to filter them.

So they were saved. Immediately afterwards, the page didn't work anymore. Even after switching it back to draft, the page delivered a white page instead of the page content.
I removed the icons, but the page was still broken.

I needed to move the page content into a new page and save it to get it working again. I added icons to the URL and the same issue occurred again.

Why are Unicode icons not filtered from the URL? Can you please apply a filtering mechanism that allows only valid characters in the URL? Icons are not supposed to be in the URL, I think.

Change History (3)

#1 @Stefan M.
5 weeks ago

PS: I created this ticket, and it shows as reported by @stefan-m, but I'm logged in as @stefan-m-1 and have a completely different profile image. So this seems to be a weird security bug?!

So I am posting under the face of a guy I don't know, even though I logged in 5 minutes ago with my own login.


#2 in reply to: ↑ description @tusharaddweb
5 weeks ago

Replying to Stefan M.:

In WordPress, Unicode characters (including icons and emojis) are not automatically filtered from URLs (post slugs) because:

  1. WordPress Allows Unicode in URLs for Internationalization

WordPress supports multilingual slugs to accommodate non-English languages (e.g., Japanese, Arabic, Cyrillic).
Unicode is essential for SEO and accessibility in non-Latin character-based languages.

  2. No Built-in Restriction on Special Unicode Characters

While WordPress sanitizes URLs using sanitize_title(), it does not explicitly remove all Unicode symbols, only certain special characters.
Some symbols might pass through if they don’t match WordPress’s default filtering rules.

  3. Some Unicode Characters Can Break URLs

Certain Unicode characters (like icons or control characters) may cause issues with browsers, servers, or plugins.
If a theme or plugin doesn’t properly handle encoded URLs, it could result in broken pages or white screens (as you experienced).
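To make the encoding point concrete, here is a minimal standalone sketch in plain PHP (not WordPress code; the slug is a made-up example). It shows that each emoji turns into several percent-encoded bytes, and that a comparison against the original slug only succeeds after decoding, which is the kind of step a theme or plugin could miss:

```php
<?php
// Made-up slug containing two emojis (assumption, for illustration only).
$slug = 'my-page-🦙✨';

// rawurlencode() percent-encodes every UTF-8 byte of the non-ASCII characters.
$encoded = rawurlencode( $slug );
echo $encoded, PHP_EOL;

// Comparing the encoded request path with the original slug fails ...
var_dump( $encoded === $slug );

// ... unless the path is decoded first.
var_dump( rawurldecode( $encoded ) === $slug );
```

The two emojis alone expand to seven encoded bytes, so any code that compares, truncates, or stores the slug has to treat the encoded and decoded forms consistently.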

Solution: Apply a Custom Filter

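You can restrict unwanted Unicode characters in slugs by adding this custom function to your theme's functions.php:

function filter_unicode_from_slug( $slug ) {
    // Decode octets that were already percent-encoded (e.g. a stored slug), so emojis in them are caught too.
    $decoded = rawurldecode( $slug );
    // Remove every character that is not a letter, number, whitespace, dash, or underscore.
    $stripped = preg_replace( '/[^\p{L}\p{N}\s_-]+/u', '', $decoded );
    // preg_replace() returns null on invalid UTF-8; keep the original value in that case.
    // Do not call sanitize_title() here: it applies this same filter and would recurse endlessly.
    return null === $stripped ? $slug : $stripped;
}
// Priority 9 runs before core's sanitize_title_with_dashes() (priority 10), which percent-encodes non-ASCII characters.
add_filter( 'sanitize_title', 'filter_unicode_from_slug', 9, 1 );

This ensures that only letters, numbers, dashes, and underscores remain in slugs (whitespace is kept at this stage so that WordPress can still turn it into dashes). Because \p{L} and \p{N} match letters and numbers in any script, non-Latin slugs keep working while emojis and other symbols are dropped. You can adjust the regex pattern to allow or disallow specific characters as needed.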
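To check what such a pattern does outside WordPress, here is a small standalone sketch. It assumes a negated character class built from \p{L} and \p{N} that also keeps whitespace (so that a later step can turn it into dashes); the function name strip_symbols_from_slug and the sample strings are made up for illustration and are not part of WordPress:

```php
<?php
// Hypothetical helper for illustration; not a WordPress function.
function strip_symbols_from_slug( string $slug ): string {
    // Keep letters (\p{L}), numbers (\p{N}), whitespace, dashes, and underscores; drop everything else.
    $result = preg_replace( '/[^\p{L}\p{N}\s_-]+/u', '', $slug );
    // preg_replace() returns null on error (for example invalid UTF-8); keep the input in that case.
    return $result ?? $slug;
}

echo strip_symbols_from_slug( 'my-page-🦙✨' ), PHP_EOL;   // emojis removed
echo strip_symbols_from_slug( 'Hello World🦙' ), PHP_EOL;  // whitespace kept
echo strip_symbols_from_slug( 'привет-мир' ), PHP_EOL;     // Cyrillic letters kept
echo strip_symbols_from_slug( '日本語_2024' ), PHP_EOL;     // CJK letters and digits kept
```

Emojis fall into the Unicode symbol categories rather than the letter or number categories, so they are removed while non-Latin slugs pass through unchanged.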

#3 @im3dabasia1
5 weeks ago

@stefan-m,

Thanks for reporting this issue. I was able to reproduce the problem.

Using emojis in URLs is not considered good practice. According to Google's recommendations (https://developers.google.com/search/docs/crawling-indexing/url-structure#:~:text=https%3A//example.com/%F0%9F%A6%99%E2%9C%A8), emojis are non-ASCII characters, and non-ASCII characters generally aren't recommended in URLs.

However, many sites still have emojis in their URLs despite these recommendations.

I'll wait for the component maintainers to decide whether this behavior is intentional or should be fixed.
