Make WordPress Core

Opened 6 weeks ago

Last modified 6 weeks ago

#62948 new defect (bug)

Posts tagged with complex Emoji can't be found

Reported by: edent
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 6.7.2
Component: Posts, Post Types
Keywords:
Focuses:
Cc:

Description

If a post is tagged with 🏳️‍⚧️, its tag archive cannot be found at tag/%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%E2%9A%A7%EF%B8%8F
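
For reference, here is a minimal Python sketch of how that path can be derived. It assumes the tag is the five-code-point sequence U+1F3F3 U+FE0F U+200D U+26A7 U+FE0F and that the path is plain UTF-8 percent-encoding. `tag_path` is a hypothetical helper for illustration, not a WordPress function.

```python
from urllib.parse import quote

# Transgender flag written as explicit code points (assumed sequence):
# WAVING WHITE FLAG, VS-16, ZERO WIDTH JOINER, U+26A7, VS-16.
FLAG = "\U0001F3F3\uFE0F\u200D\u26A7\uFE0F"


def tag_path(tag: str) -> str:
    """Hypothetical helper: percent-encode a tag's UTF-8 bytes into a path."""
    return "tag/" + quote(tag, safe="")


print(tag_path(FLAG))
# tag/%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%E2%9A%A7%EF%B8%8F
```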

Some simpler emoji *do* work as tags. For example https://shkspr.mobi/blog/tag/😊/

Moderately complex flag emoji also work. For example https://shkspr.mobi/blog/tag/🏴‍☠️/ and https://shkspr.mobi/blog/tag/%f0%9f%8f%b3%ef%b8%8f%f0%9f%8c%88/
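
To see which code points the second URL's slug actually holds, it can be decoded. This is a small Python sketch that assumes ordinary UTF-8 percent-decoding; `describe_slug` is a hypothetical helper name.

```python
import unicodedata
from urllib.parse import unquote

# Slug copied from the second URL above.
RAINBOW_SLUG = "%f0%9f%8f%b3%ef%b8%8f%f0%9f%8c%88"


def describe_slug(slug: str) -> list[str]:
    """Hypothetical helper: list each decoded code point as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in unquote(slug)]


for entry in describe_slug(RAINBOW_SLUG):
    print(entry)
# U+1F3F3 WAVING WHITE FLAG
# U+FE0F VARIATION SELECTOR-16
# U+1F308 RAINBOW
```

Note that this decoded slug contains no ZERO WIDTH JOINER (U+200D).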

My backend database has an encoding of utf8mb4_unicode_520_ci (10.6.20-MariaDB).

Change History (4)

#1 @sainathpoojary
6 weeks ago

Hi @edent! Thanks for bringing up the issue. I tried to reproduce it with emoji tags but wasn't able to replicate the problem you're experiencing.

I tested the following:

  1. Created test posts with emoji tags including 🏳️‍⚧️, 😊, and 🏴‍☠️
  2. Verified the tags appear correctly on posts
  3. Accessed the tag archive URLs:

/tag/🏳️‍⚧️
/tag/😊
/tag/🏴‍☠️

All URLs worked and showed the tagged posts correctly. Could you share more information, such as:

  1. Are you using any plugins that might affect URL handling or taxonomies?
  2. Does this happen with a default theme like Twenty Twenty-Four?

Additional information would help narrow down what might be different in your setup causing this issue.

Video: https://rioudcpuyg.ufs.sh/f/PL8E4NiPUWyO7432pdv9AWOSNT8iuxYqzcvCVIEKbQmF3njP

Environment

  • WordPress: 6.8-alpha-59274-src
  • PHP: 8.2.27
  • Server: nginx/1.27.3
  • Database: mysqli (Server: 8.4.4 / Client: mysqlnd 8.2.27)
  • Browser: Chrome 132.0.0.0
  • OS: macOS
  • Theme: Twenty Twenty-Five 1.0
  • MU Plugins: None activated
  • Plugins: Test Reports 1.2.0

#2 @edent
6 weeks ago

I think I see the problem. My slug was originally stored as
%f0%9f%8f%b3%ef%b8%8f%e2%80%8d%e2%9a%a7%ef%b8%8f

If I delete the slug and recreate it, I get
%f0%9f%8f%b3%ef%b8%8f%e2%9a%a7%ef%b8%8f

The original is:

  • WAVING WHITE FLAG
  • VARIATION SELECTOR-16
  • ZERO WIDTH JOINER
  • MALE WITH STROKE AND MALE AND FEMALE SIGN
  • VARIATION SELECTOR-16

The newly recreated one (which works) *doesn't* have the Zero Width Joiner.
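
The difference between the two slugs can be checked mechanically. This is a rough Python sketch, assuming both slugs are UTF-8 percent-encoded exactly as quoted above; `code_point_names` is a hypothetical helper name.

```python
import unicodedata
from urllib.parse import unquote

# Both slugs are copied verbatim from this comment.
ORIGINAL = "%f0%9f%8f%b3%ef%b8%8f%e2%80%8d%e2%9a%a7%ef%b8%8f"
RECREATED = "%f0%9f%8f%b3%ef%b8%8f%e2%9a%a7%ef%b8%8f"


def code_point_names(slug: str) -> list[str]:
    """Hypothetical helper: decode a slug and name each code point."""
    return [unicodedata.name(ch) for ch in unquote(slug)]


original_names = code_point_names(ORIGINAL)
recreated_names = code_point_names(RECREATED)

# Names present in the original slug but absent from the recreated one.
missing = [name for name in original_names if name not in recreated_names]
print(missing)
# ['ZERO WIDTH JOINER']
```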

If you look at the raw SQL table on your test instance, what's the slug's percent-encoded representation?

Thanks

#3 @sainathpoojary
6 weeks ago

Hi @edent,

I checked the raw data in my test instance’s wp_terms table, and here’s what I found:

  • The stored slug for 🏳️‍⚧️ in my database is:
%f0%9f%8f%b3%ef%b8%8f%e2%9a%a7%ef%b8%8f
  • This matches the “working” version you mentioned, without the Zero Width Joiner (%e2%80%8d).

Thanks for bringing this up!

#4 @edent
6 weeks ago

That's why I think this is a bug.

Unicode defines the flag as

1F3F3 FE0F 200D 26A7 FE0F ; RGI_Emoji_ZWJ_Sequence ; transgender flag # E13.0 [1] (🏳️‍⚧️)

Source: https://unicode.org/Public/emoji/13.0/emoji-zwj-sequences.txt

You can see in the emoji test file that not having the ZWJ results in an "unqualified" representation of the emoji.
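
As a cross-check, the slug implied by the Unicode data line can be compared with the slug stored on the test instance. This is only a sketch: it assumes a slug is the lowercase UTF-8 percent-encoding of the tag, which matches the values quoted in this ticket but is not WordPress's actual sanitising code. `parse_sequence` and `slug_for` are hypothetical helpers.

```python
from urllib.parse import quote

# Data line quoted above from emoji-zwj-sequences.txt (trailing glyph omitted).
DATA_LINE = (
    "1F3F3 FE0F 200D 26A7 FE0F ; RGI_Emoji_ZWJ_Sequence ; "
    "transgender flag # E13.0 [1]"
)

# Slug reported in comment #3 for the test instance.
STORED_SLUG = "%f0%9f%8f%b3%ef%b8%8f%e2%9a%a7%ef%b8%8f"


def parse_sequence(line: str) -> str:
    """Hypothetical helper: turn the first ';'-separated field into a string."""
    hex_field = line.split(";", 1)[0]
    return "".join(chr(int(code, 16)) for code in hex_field.split())


def slug_for(text: str) -> str:
    """Hypothetical helper: lowercase UTF-8 percent-encoding (assumed form)."""
    return quote(text, safe="").lower()


expected_slug = slug_for(parse_sequence(DATA_LINE))
print(expected_slug)
# %f0%9f%8f%b3%ef%b8%8f%e2%80%8d%e2%9a%a7%ef%b8%8f
print(expected_slug == STORED_SLUG)
# False
```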
