Opened 8 years ago
Closed 8 years ago
#37761 closed defect (bug) (duplicate)
wp_encode_emoji misses 🆑 (U+1F191)
Reported by: | thrica | Owned by: | |
---|---|---|---|
Milestone: | | Priority: | normal |
Severity: | normal | Version: | |
Component: | Emoji | Keywords: | |
Focuses: | | Cc: | |
Description
I have code that enters tweets into both a Tweets table and a transient. I'd neglected to convert wp_options to utf8mb4 – which hadn't been a problem until this character slipped through wp_encode_emoji, so everything's been reduced to fiery wreckage now. Actually it just failed silently, but I did max out my allowed calls to the Twitter API.
It succeeded in the utf8mb4 Tweets table though, which is how I know it didn't convert that character.
I see in the docs that wp_encode_emoji supports up to Unicode 7, but this character is part of the Unicode 6 spec. Its UTF-8 bytes are F0 9F 86 91, so it's getting missed by the \xF0\x9F[\x85-\x88][\xA6-\xBF] regex line (wp-includes/formatting.php, line 5033), which is supposed to catch the Enclosed Alphanumeric Supplement block: the third byte, 86, is within [\x85-\x88], but the fourth byte, 91, falls below the A6 lower bound. I suppose this means there are a few others between 86 91 and 86 A6 it's missing too.
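For anyone who wants to check the byte ranges, here is a minimal sketch in Python rather than PHP. It assumes only the single regex line quoted above, not the full pattern list in wp_encode_emoji, and the helper names are made up for illustration.

```python
import re

# Only the one pattern line quoted in this report, applied to raw UTF-8 bytes.
# Assumption: this does not reproduce the rest of wp_encode_emoji.
QUOTED_LINE = re.compile(rb"\xF0\x9F[\x85-\x88][\xA6-\xBF]")


def utf8_hex(ch):
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def matched_by_quoted_line(ch):
    """True if the character's UTF-8 bytes match the quoted pattern line."""
    return QUOTED_LINE.fullmatch(ch.encode("utf-8")) is not None


cl = "\U0001F191"           # the character from this report
flag_letter = "\U0001F1E6"  # regional indicator A, for comparison

# Code points whose UTF-8 form is F0 9F 86 91 through F0 9F 86 A5,
# i.e. the gap below the A6 lower bound for third byte 86.
gap = [cp for cp in range(0x1F191, 0x1F1A6)
       if not matched_by_quoted_line(chr(cp))]

if __name__ == "__main__":
    print(utf8_hex(cl), matched_by_quoted_line(cl))
    print(utf8_hex(flag_letter), matched_by_quoted_line(flag_letter))
    print(f"U+{gap[0]:04X}..U+{gap[-1]:04X} unmatched ({len(gap)} code points)")
```

The sketch only tells you which code points in that byte range the quoted line skips; it says nothing about whether another line in the function catches them, or which of them are assigned characters.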
Thanks for the report, @thrica!
We're currently looking at how to tackle this in #35293.