WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 5 years ago

#37761 closed defect (bug) (duplicate)

wp_encode_emoji misses ๐Ÿ†‘ (U+1F191)

Reported by: thrica Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Emoji Keywords:
Focuses: Cc:

Description

I have code that enters tweets into both a Tweets table and a transient. I'd neglected to convert wp_options to utf8mb4 โ€“ย which hadn't been a problem until this character slipped through wp_encode_emoji, so everything's been reduced to fiery wreckage now. Actually it just failed silently, but I did max out my allowed calls to the Twitter API.

It succeeded in the utf8mb4 Tweets table though, which is how I know it didn't convert that character.

I see in the docs that wp_encode_emoji supports up to Unicode 7, but this character is part of the Unicode 6 spec. The hex bytes are F0 9F 86 91, so it's getting missed by the \xF0\x9F[\x85-\x88][\xA6-\xBF] regex line (wp-includes/formatting.php, line 5033) which is supposed to catch the Enclosed Characters block. I suppose this means there are a few others between 86 91 and 86 A6 it's missing too.

Change History (1)

#1 @pento
5 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from new to closed
  • Version 4.6 deleted

Thanks for the report, @thrica!

We're currently looking at how to tackle this in #35293.

Note: See TracTickets for help on using tickets.