Opened 3 months ago
Last modified 4 days ago
#64318 new defect (bug)
↗ should not be replaced by Twemoji
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 7.0 | Priority: | normal |
| Severity: | normal | Version: | 4.2 |
| Component: | Emoji | Keywords: | has-patch |
| Focuses: | | Cc: | |
Description
At present, if you type ↗, a Unicode glyph, in a WordPress post, it is converted to a Twemoji: ↗️. But those are two different things, very intentionally so, just like → is different from ➡️.
At present if you want to avoid having the ↗ glyph replaced with a twemoji, you either have to disable the set entirely, or add a CSS class, wp-exclude-emoji in order to safeguard the glyph.
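As a rough illustration of the CSS class workaround, the glyph can be wrapped in an element carrying wp-exclude-emoji. The helper name excludeFromEmoji is hypothetical and not part of WordPress; the sketch assumes the glyph needs no HTML escaping.

```typescript
// Hypothetical helper (not a WordPress API): wraps a glyph in a span that
// carries the wp-exclude-emoji class mentioned in the description.
// Assumption: the glyph contains no characters that need HTML escaping.
function excludeFromEmoji(glyph: string): string {
  return `<span class="wp-exclude-emoji">${glyph}</span>`;
}

// U+2197 NORTH EAST ARROW, written as an escape so no variation selector
// sneaks in through copy and paste.
const markup: string = `See the docs ${excludeFromEmoji("\u2197")}`;
```

This protects one glyph at a time, so it has to be applied by hand or by whatever generates the markup.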
From some conversations with others, this replacement of ↗ (and presumably also south east, south west, north west arrows) appears to be a recent change in WordPress, not present in 6.8 and happening sometime between September 3 and now.
Attachments (1)
Change History (20)
#3
@
3 months ago
- Keywords has-patch added
- Version set to trunk
This patch prevents mathematical arrow symbols such as ↦ ↤ ↥ ↧ from being incorrectly converted into Twemoji. These characters belong to the Unicode Arrows block (U+2190–U+21FF) and represent functional or logical notation, not emoji. Replacing them with Twemoji changes the meaning of the text, which is especially problematic in technical, mathematical, or academic content.
The patch introduces:
- PHP safeguard in wp_staticize_emoji() to explicitly skip math arrow characters.
- JavaScript regex fix in wp-emoji.js to exclude the Unicode range U+21A6–U+21AA, ensuring consistent behavior across both backend and frontend.
This restores the expected behavior seen in WordPress 6.8 and earlier, ensuring that math symbols remain unchanged and are not interpreted as emoji.
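For reference, a range exclusion of the kind described above could be sketched as follows. This is not the attached patch; the names RANGE_START, RANGE_END, EXCLUDED_RANGE and isInExcludedRange are made up for illustration, and only the bounds U+21A6 and U+21AA come from the comment.

```typescript
// Bounds taken from the comment above; every name here is hypothetical.
const RANGE_START = 0x21a6;
const RANGE_END = 0x21aa;

// Equivalent character class, in case a regular expression is preferred.
const EXCLUDED_RANGE = /[\u21A6-\u21AA]/;

// Returns true when the first code point of `ch` lies inside the range.
function isInExcludedRange(ch: string): boolean {
  const cp = ch.codePointAt(0);
  return cp !== undefined && cp >= RANGE_START && cp <= RANGE_END;
}
```

With these bounds, ↦ (U+21A6) is excluded while ↗ (U+2197) is not, so the range would need widening to cover the glyph this ticket is about.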
#4
@
3 months ago
- Keywords needs-patch added; has-patch removed
- Milestone changed from Awaiting Review to 6.9.1
@iflairwebtechnologies I tried applying your patch but it doesn't apply. The patch appears to target a WordPress codebase that doesn't exist. For example, Extended_Pictographic is not mentioned in core: https://github.com/search?q=repo%3AWordPress%2Fwordpress-develop%20Extended_Pictographic&type=code
#5
@
3 months ago
From some conversations with others, this replacement of ↗ (and presumably also south east, south west, north west arrows) appears to be a recent change in WordPress, not present in 6.8 and happening sometime between September 3 and now.
I believe this is simply because the version of Twemoji was updated in #64184. Because operating systems don't support all of the latest emoji, this is causing all emoji to be replaced with images by Twemoji. Specifically, in [61134] the test for Emoji compatibility changed from the Splatter emoji to the Hairy creature (Sasquatch!) emoji.
Ideally, Twemoji wouldn't update emoji which the OS/browser already support!
This is surely out of scope for this ticket, but since it is highly unlikely that newer emoji will be used frequently to begin with, an enhancement we could use to avoid needlessly running Twemoji would be to check if the page is using any of the emoji from specific versions, and then dynamically change which characters are used in the emoji test for OS/browser support. With the new template enhancement output buffer in 6.9, this would be doable for classic themes, and it should be doable without output buffering in block themes already.
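A minimal sketch of that idea, assuming emoji are grouped into tiers by the version that introduced them, each with its own probe character for the support test. EmojiTier and pickSupportProbe are hypothetical names, nothing here exists in core, and the sketch only handles single-code-point emoji.

```typescript
// Hypothetical shape: one tier per emoji version, ordered oldest to newest.
interface EmojiTier {
  label: string;                 // e.g. a version label
  probe: string;                 // character to use in the support test
  members: ReadonlySet<string>;  // single-code-point emoji in this tier
}

// Returns the probe of the newest tier that the content actually uses,
// or null when the content uses none of the listed emoji.
function pickSupportProbe(
  content: string,
  tiers: readonly EmojiTier[]
): string | null {
  // Array.from splits by code point, so astral characters stay whole.
  const used = new Set(Array.from(content));
  const newestUsed = tiers
    .slice()
    .reverse()
    .find((tier) => Array.from(tier.members).some((m) => used.has(m)));
  return newestUsed ? newestUsed.probe : null;
}
```

A null result would mean the page uses none of the newer emoji, so the caller could fall back to an older probe and avoid replacing emoji the OS already supports.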
#6
@
3 months ago
Ideally, Twemoji wouldn't update emoji which the OS/browser already support!
My recollection is that it was decided to replace all emoji to ensure the designs are consistent. For example, so systems using Noto Emoji don't end up with a mixture of colour and black and white designs.
Rather than make changes to the WordPress code base, I think it would be good to open an upstream issue in the Twemoji issue tracker. If it was a design decision to replace the non-emoji arrow glyphs then a change to the WordPress code base can be considered; if it was unintentional then it's best to contribute back to the upstream code base.
This ticket was mentioned in PR #10603 on WordPress/wordpress-develop by @Presskopp.
2 months ago
#7
- Keywords has-patch added; needs-patch removed
Added more arrow symbol code points to be excluded from emoji parsing.
Trac ticket: 64318
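The general shape of such an exclusion list might look like the sketch below. It is not the code from PR #10603; EXCLUDED_CODE_POINTS and shouldSkipEmojiReplacement are hypothetical names, and the four entries are just the diagonal arrows named in the ticket description.

```typescript
// Illustrative list only: the four diagonal arrows from the description.
const EXCLUDED_CODE_POINTS: ReadonlySet<number> = new Set([
  0x2196, // NORTH WEST ARROW
  0x2197, // NORTH EAST ARROW
  0x2198, // SOUTH EAST ARROW
  0x2199, // SOUTH WEST ARROW
]);

// Returns true when the first code point of `ch` is on the exclusion list,
// meaning the caller should leave the character as plain text.
function shouldSkipEmojiReplacement(ch: string): boolean {
  const cp = ch.codePointAt(0);
  return cp !== undefined && EXCLUDED_CODE_POINTS.has(cp);
}
```

A Set keeps the lookup cheap and makes it easy to add or remove individual code points, at the cost of maintaining the list by hand.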
@wildworks commented on PR #10603:
2 months ago
#8
The emoji.js file is parsed as ES5, but uses ES6 syntax such as const and new Set.
I'm not an expert on this, but is it possible to use ES6 syntax in core JS code now?
@westonruter commented on PR #10603:
2 months ago
#9
Yes, we're using this now in some files. For example in:
It was only made possible relatively recently since r56247 when jsvalidate was removed, since that tool didn't support ES6+ syntax.
@wildworks commented on PR #10603:
2 months ago
#10
#11
@
2 months ago
- Version changed from 6.9 to 4.2
I'd prefer not to include this in 6.9.1 as the issue was not introduced in the 6.9 release cycle.
As Twemoji is generally quicker at releasing new Unicode versions than operating systems, it becomes apparent each time WordPress upgrades, but it's unrelated to 6.9 nonetheless. It's been around since Twemoji was introduced in WP 4.2.
#12
@
4 weeks ago
- Milestone changed from 6.9.1 to 7.0
I agree with @peterwilsoncc that this is not appropriate for 6.9.1. As such, I am moving this to the 7.0 milestone.
@peterwilsoncc commented on PR #10603:
5 days ago
#13
@Presskopp As I mentioned on the ticket, I think it's best to open an upstream issue to the twemoji library as an initial step before modifying the WordPress code base. If replacing these characters is a design decision by the team we can make these changes.
@Presskopp commented on PR #10603:
5 days ago
#14
@peterwilsoncc I don’t think I’m in a position to make that design decision, nor do I have a clear idea what a concrete upstream issue should propose.
#15
in reply to: ↑ description; follow-up: ↓ 17
@
4 days ago
Replying to Joen:
At present if you want to avoid having the ↗ glyph replaced with a twemoji, you either have to disable the set entirely, or add a CSS class, wp-exclude-emoji in order to safeguard the glyph.
A third way to avoid having the glyph replaced with a twemoji image is to use the text presentation selector - the character U+FE0E VARIATION SELECTOR-15 (VS15).
So if you have the sequence U+2197 (NORTH EAST ARROW) followed by U+FE0E, Twemoji will (correctly) avoid replacing it with an image.
If you really want the text presentation, this is probably the best way to do it.
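As a small illustration, the sequence can be written with escapes so the selector is visible in source. The constant and function names are only for this sketch.

```typescript
// U+FE0E VARIATION SELECTOR-15 requests text presentation.
const TEXT_SELECTOR = "\uFE0E";
const NORTH_EAST_ARROW = "\u2197";

// Two code points: the arrow followed by the text presentation selector.
const textArrow: string = NORTH_EAST_ARROW + TEXT_SELECTOR;

// Returns true when the character at `index` is directly followed by VS15.
// Assumption: the character at `index` is a single UTF-16 code unit.
function hasTextSelector(s: string, index: number = 0): boolean {
  return s.charAt(index + 1) === TEXT_SELECTOR;
}
```

The selector is invisible when rendered, so a check like hasTextSelector can help confirm that pasted content really carries it.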
@siliconforks commented on PR #10603:
4 days ago
#16
@Presskopp As I mentioned on the ticket, I think it's best to open an upstream issue to the twemoji library as an initial step before modifying the WordPress code base. If replacing these characters is a design decision by the team we can make these changes.
I don't think this is really a bug in the Twemoji library, because I believe that the standard allows implementations to choose whether to use the text presentation or the emoji presentation here. See https://www.unicode.org/reports/tr51/#Presentation_Style
#17
in reply to: ↑ 15; follow-up: ↓ 18
@
4 days ago
Replying to siliconforks:
So if you have the sequence U+2197 (NORTH EAST ARROW) followed by U+FE0E, Twemoji will (correctly) avoid replacing it with an image.
If you really want the text presentation, this is probably the best way to do it.
The problem is existing content out there that doesn't have U+FE0E, right? Existing content expecting the text presentation unexpectedly gets the emoji presentation when upgrading Twemoji. It would be a good idea for newly-inserted characters to add this U+FE0E, but I presume only for characters inserted via some integrated symbol inserter (like TinyMCE had). If the character were inserted by the operating system's character picker, then it wouldn't include this U+FE0E, presumably.
So would a new filter be required for the_content to inject this U+FE0E after the set of characters that we determine? But then this wouldn't really be any different than the JS-based exclusion? Both are programmatic means to the same end.
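To make the question concrete, an injection step could be sketched like this. It is written in TypeScript purely for illustration (an actual the_content filter would be PHP), injectTextSelector is a hypothetical name, and the set of target characters is left to the caller because the thread has not settled on one.

```typescript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector

// Appends VS15 after every target character that is not already followed
// by a variation selector, leaving explicit author choices untouched.
function injectTextSelector(
  content: string,
  targets: ReadonlySet<string>
): string {
  const chars = Array.from(content); // split by code point
  let out = "";
  chars.forEach((ch, i) => {
    out += ch;
    const next = chars[i + 1];
    if (targets.has(ch) && next !== VS15 && next !== VS16) {
      out += VS15;
    }
  });
  return out;
}
```

Because existing selectors are respected, running the function twice gives the same result as running it once.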
#18
in reply to: ↑ 17
@
4 days ago
Replying to westonruter:
The problem is existing content out there that doesn't have U+FE0E, right? Existing content expecting the text presentation unexpectedly gets the emoji presentation when upgrading Twemoji.
Yes, but it could also be the other way around - the user might have originally wanted the graphical emoji presentation, and now it could get changed to the simple text presentation.
So would a new filter be required for the_content to inject this U+FE0E after the set of characters that we determine? But then this wouldn't really be any different than the JS-based exclusion? Both are programmatic means to the same end.
That has the same problem - there's no way to be certain what the user actually intended. What if the user actually wanted the graphical emoji presentation? Then injecting U+FE0E would be wrong.
The JavaScript solution in PR #10603 (to avoid using Twemoji for certain characters) might be reasonable as a "best guess" for what the user wanted. But ultimately I don't think this is something that can entirely be fixed in WordPress (or in Twemoji). Ideally the operating system's character picker (or wherever the user got the character from) should insert the correct Unicode sequence, either U+2197 U+FE0E (for the simple text presentation) or U+2197 U+FE0F (for the graphical emoji), depending on what the user actually chose.
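The three cases can be told apart with a check along these lines. This is a sketch with hypothetical names (Presentation, presentationAt), and it assumes the base character is a single UTF-16 code unit, which holds for U+2197.

```typescript
type Presentation = "text" | "emoji" | "unspecified";

// Reports which presentation the sequence starting at `index` requests.
// Assumption: the base character at `index` is one UTF-16 code unit.
function presentationAt(s: string, index: number): Presentation {
  const next = s.charAt(index + 1);
  if (next === "\uFE0E") return "text";  // VS15
  if (next === "\uFE0F") return "emoji"; // VS16
  return "unspecified";                  // bare character, intent unknown
}
```

Only the "unspecified" case calls for a guess, and that is the case the exclusion in PR #10603 would be deciding.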
@peterwilsoncc commented on PR #10603:
4 days ago
#19
@Presskopp The upstream proposal would be to ignore these characters in the Twemoji library.
@siliconforks That's why I think we need to find out if it's a design decision from the Twemoji team. If it is, then we can make the change here, but if they consider it a bug then it's better not to add additional code to WordPress.
Related: https://github.com/WordPress/wporg-main-2022/pull/39