WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#31328 closed defect (bug) (fixed)

Emoji in a slug fails to display

Reported by: kraftbj Owned by: pento
Milestone: 4.2 Priority: normal
Severity: normal Version: 4.2
Component: Formatting Keywords:
Focuses: administration Cc:

Description

When using emoji in a place that generates a slug (category name, post title, etc), if the emoji is saved to the database as HTML entities, it will fail to display in the UI.

Example: Save a draft/publish with post title is "WordPress {sparkling_heart emoji} Emoji".
URL/slug will be "wordpress-%f0%9f%92-emoji"
On post.php, the slug will not display: https://cloudup.com/cASF4rxUNhF

Second example: Save a new category with the name being an emoji.
Slug is saved to the db in the HTML entity format ("%f0%9f%92").
On edit-terms.php, no slug will be displayed.

Attachments (11)

31328.patch (1.2 KB) - added by SergeyBiryukov 5 years ago.
31328.2.patch (2.3 KB) - added by SergeyBiryukov 5 years ago.
31328.edit-slug.png (7.4 KB) - added by SergeyBiryukov 5 years ago.
31328.diff (2.1 KB) - added by pento 5 years ago.
31328.2.diff (6.4 KB) - added by pento 5 years ago.
31328.3.diff (7.8 KB) - added by pento 5 years ago.
31328.4.diff (8.7 KB) - added by boonebgorges 5 years ago.
31328.5.diff (11.9 KB) - added by pento 5 years ago.
31328.3.patch (4.3 KB) - added by iseulde 5 years ago.
31328.4.patch (4.3 KB) - added by iseulde 5 years ago.
31328.6.diff (7.8 KB) - added by boonebgorges 5 years ago.

Download all attachments as: .zip

Change History (37)

#1 @SergeyBiryukov
5 years ago

  • Component changed from Administration to Posts, Post Types
  • Focuses administration added
  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to 4.2

#2 @SergeyBiryukov
5 years ago

  • Component changed from Posts, Post Types to Formatting
  • Keywords has-patch needs-unit-tests added; needs-patch removed

31328.patch fixes utf8_uri_encode() to support 4-byte sequences.

This resolves the issue on display, but still breaks on editing, see 31328.edit-slug.png.

31328.2.patch (based on a Stack Overflow thread) also strips Emoji in sanitize_title_with_dashes(), similarly to [18705] and [20686].

#3 @kraftbj
5 years ago

.2 works for me in both areas.

Noting some oddity with different emoji (slug sometimes emoji, sometimes 197): https://cloudup.com/cT0d9frbDEa but unsure of pattern or directly under this ticket. Pinged @pento too.

This ticket was mentioned in Slack in #core by kraft. View the logs.


5 years ago

@pento
5 years ago

#5 @pento
5 years ago

31328.diff builds on top of 31328.patch, adding support for editing slugs when when Twemoji is being used.

Not sure what would be causing the 197 bug, though.

@pento
5 years ago

#6 @pento
5 years ago

31328.2.diff adds support for post Quick Edit, and support for everything on the Taxonomy screen.

This, however, is where things get interesting: this patch also includes a revert of [28733]. The problem stems from MySQL's collation behaviour - it treats all Unicode Supplementary Characters (which emoji fall under) as being equivalent. It's not until MySQL 5.6, wich the addition of the utf8mb4_unicode_520_ci collation that this changes. So, the behaviour introduced in [28733] means that if you already have multiple tags with emoji names, inserting a tag without a slug defined will cause the slug to be set to the first emoji tag selected, rather than the first emoji tag with the same emoji name.

Simply reverting [28733]'s behaviour obviously isn't the correct solution, as it'll re-break #17689. @wonderboymusic - could I get you to dredge up your memories of #17689, and give us some thought leadership on other options for solving it that could work?

This ticket was mentioned in Slack in #core by pento. View the logs.


5 years ago

@pento
5 years ago

#8 follow-up: @pento
5 years ago

31328.3.diff fixes this by adding $ as an allowed character in sanitize_title_with_dashes(). I also changed a couple of unit tests, because I don't know why the term insertion was expected to fail.

@boone, @SergeyBiryukov, @wonderboymusic: Could you please have a look at this patch and tell me what I'm missing?

@boonebgorges
5 years ago

#9 in reply to: ↑ 8 ; follow-up: @boonebgorges
5 years ago

Replying to pento:

31328.3.diff fixes this by adding $ as an allowed character in sanitize_title_with_dashes(). I also changed a couple of unit tests, because I don't know why the term insertion was expected to fail.

@boone, @SergeyBiryukov, @wonderboymusic: Could you please have a look at this patch and tell me what I'm missing?

Those tests fail by design. The idea is that you should not be able to create terms with the same 'name' in the same taxonomy. That's what #17689 is all about.

That being said, I do think it's correct to roll back [28733]. That fix was perhaps appropriate for the way that wp_insert_term() used to work, but since 4.1, I'm not sure it makes sense anymore. The bug in #17689 was that certain duplicate-name checks were failing because of the way that term names with weird characters sanitize into term slugs. But if all we're worried about is duplicate names, then we should not be checking slugs at all. See 31328.4.diff. This also lets us leave sanitize_title_with_dashes() alone - the addition of $ to the regex there seems arbitrary and is utterly terrifying.

See #31270 and #30379 for related discussion. My suggested change looks like it would fix both of those tickets, though we'd need tests to verify that.

This ticket was mentioned in Slack in #core-editor by pento. View the logs.


5 years ago

This ticket was mentioned in Slack in #core by pento. View the logs.


5 years ago

@pento
5 years ago

#12 @pento
5 years ago

  • Keywords commit added; needs-unit-tests removed
  • Owner set to pento
  • Status changed from new to assigned

I like what @boonebgorges did with 31328.4.diff. Let's go with that.

31328.5.diff adds some extra tests for the utf8_uri_encode() changes.

I don't want to jinx it, but I think this patch is ready to roll. I'll hold off committing it until #31242 is ready.

This ticket was mentioned in Slack in #core by pento. View the logs.


5 years ago

#14 @pento
5 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 31734:

Add emoji URL support, and Twemoji fallback for displaying slugs in wp-admin, when the browser doesn't natively support emoji.

Props pento, SergeyBiryukov and boonebgorges.

Fixes #31328

#15 @iseulde
5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Dependencies missing + JSHint errors.

@iseulde
5 years ago

@iseulde
5 years ago

#16 @SergeyBiryukov
5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 31741:

Fix JSHint errors in [31734] and add missing dependencies.

props iseulde.
fixes #31328.

#17 @Chouby
5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I'd like to reopen this ticket as it introduces a regression.

[31734] does not allow anymore to have the same name for two terms in the same taxonomy at the same hierarchy level. Prior to this, it used to be possible provided that the slug was different.

This regression will at least impact users of multilingual plugins such as WPML, Polylang, etc... as having the same tag name in multiple languages is not something uncommon.

#18 in reply to: ↑ 9 ; follow-up: @SergeyBiryukov
5 years ago

Replying to boonebgorges:

Those tests fail by design. The idea is that you should not be able to create terms with the same 'name' in the same taxonomy. That's what #17689 is all about.

Right, but only if no slug was provided. It should still be possible to create terms with the same name in the same taxonomy with different slugs, as noted by @Chouby. Looks like [31734] broke that.

#19 in reply to: ↑ 18 ; follow-up: @boonebgorges
5 years ago

Replying to SergeyBiryukov:

Replying to boonebgorges:

Those tests fail by design. The idea is that you should not be able to create terms with the same 'name' in the same taxonomy. That's what #17689 is all about.

Right, but only if no slug was provided. It should still be possible to create terms with the same name in the same taxonomy with different slugs, as noted by @Chouby. Looks like [31734] broke that.

Yes, and in fact I thought I'd left a comment on this ticket yesterday to this very effect, but it looks like I forgot to press submit :)

See 31328.6.diff. This change should continue to preserve the spirit of the fix in #17689. I'm not 100% sure that it also preserves the fixes necessary for this ticket - pento, can you chime in here - but the unit tests continue to pass. (You'll see that 6.diff includes new unit tests that replace some older, less specific ones.)

@boonebgorges
5 years ago

#20 @pento
5 years ago

31328.6.diff doesn't re-break this ticket, so no complaints from me.

For reference, the specific problem caused by #17689 was:

  • Start with no emoji terms
  • Create a term as a single emoji character, note that the slug correctly uses the character
  • Create a second term with a different emoji character, the slug incorrectly uses the character from the first term

This is reproducible for any two terms containing the same number of emoji characters (as opposed to glyphs).

This is because we were searching for duplicates by term name. All utf8mb4_* collations (prior to utf8mb4_unicode_520_ci) treat emoji as being equivalent characters, so would just match the first one found. Now that we're searching for duplicates by term term slug, this is no longer a problem. Term slugs store the URL-encoded version of the emoji character, which the utf8mb4_ collations correctly interpret as a string of ASCII text.

#21 in reply to: ↑ 19 @sciamannikoo
5 years ago

I see @boonebgorges proposed patch which seems to go to the right direction.

The problem of unique names, rather than slugs, will cause issues to a fairly good amount of users relying on plugins such as WPML, to handle translatable taxonomies.

Think for instance to the "Hotel" term in a theoretical "Accommodations" category.
The word "Hotel" is used in many languages.
It would make sense and would be acceptable to have a different slug for every language (e.g. hotel, hotel-fr, hotel-it, etc.), but would make very little sense, if people have to think a nice way to translate the same word, when displayed in front-end.

Without a fix like the one proposed by @boonebgorges many people already using duplicated term names, will be negatively affected.

#22 follow-up: @boonebgorges
5 years ago

@sciamannikoo - Thanks for the feedback. To be honest, I'm a bit surprised to hear that this technique is so widely used in multilanguage plugins (appending a suffix to slugs to make language-specific terms within a single taxonomy). In my totally naive opinion, it seems like it'd be much easier to use different taxonomies for different languages. But I defer to the expertise of those who build these plugins.

@pento - Thanks for these nuggets of wisdom. I'll go ahead with the fix.

#23 follow-up: @boonebgorges
5 years ago

In 31792:

In wp_insert_term(), allow a term with an existing name if a unique $slug has been provided.

wp_insert_term() protects against the creation of terms with duplicate names
at the same level of a taxonomy hierarchy. However, it's historically been
possible to override this protection by explicitly providing a value of $slug
that is unique at the hierarchy tier. This ability was broken in [31734], and
the current changeset restores the original behavior.

A number of unit tests are added and refactored in support of these changes.

See #17689 for discussion of a fix that was superceded by [31734]. This commit
retains the fix for the underlying bug described in that ticket.

See #31328.

#24 in reply to: ↑ 23 @sciamannikoo
5 years ago

@boonebgorges - This definitely solve the problem.

Thanks for addressing it so quickly!

#25 @pento
5 years ago

  • Keywords has-patch commit removed
  • Resolution set to fixed
  • Status changed from reopened to closed

What a ticket. What a ride. 5/5, A+++++, would emoji again.

If you come across this ticket during your testing adventures, please open a new ticket for any bugs you encounter.

#26 in reply to: ↑ 22 @Chouby
5 years ago

Replying to boonebgorges:

To be honest, I'm a bit surprised to hear that this technique is so widely used in multilanguage plugins (appending a suffix to slugs to make language-specific terms within a single taxonomy).

Isn't it what WordPress does itself to allow two terms with the same name (but in a different hierarchy)?

Nevertheless, problem solved for me too :)

Note: See TracTickets for help on using tickets.