Opened 5 years ago

Closed 5 years ago

#48044 closed defect (bug) (fixed)

Site Health use of "emoticons" in UTF8MB4 check is ambiguous

Reported by: johnjamesjacoby
Owned by: garrett-eclipse
Milestone: 5.3
Priority: normal
Severity: minor
Version:
Component: Site Health
Keywords: has-patch needs-copy-review
Focuses: ui-copy
Cc:


The UTF8MB4 test says:

UTF8MB4 is a database storage attribute that makes sure your site can store non-English text and other strings (for instance emoticons) without unexpected problems.

Emoticons, by definition, are (emphasis mine):

a pictorial representation of a facial expression using characters—usually punctuation marks, numbers, and letters—to express a person's feelings or mood

"Emoticons" might be referring to the more technical alternative meaning, the Unicode block of that name. If so, it seems oddly technical to include here, and it is still not completely accurate as written.
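
To illustrate the distinction at the byte level, here is a minimal Python sketch. It assumes, beyond what this ticket states, that the practical difference from MySQL's older three-byte `utf8` character set is support for characters that take four bytes in UTF-8. The helper name `needs_utf8mb4` is hypothetical and is not part of WordPress.

```python
# Hypothetical helper (not part of WordPress): reports whether a string
# contains any character that takes four bytes when encoded as UTF-8.
# Assumption: such characters are the ones a three-byte utf8 column
# cannot store and a utf8mb4 column can.
def needs_utf8mb4(text: str) -> bool:
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


samples = {
    "emoticon": ":-)",                    # plain ASCII punctuation
    "emoji": "\U0001F600",                # GRINNING FACE, outside the BMP
    "non-English": "\u65e5\u672c\u8a9e",  # Japanese text, inside the BMP
}

for label, sample in samples.items():
    byte_lengths = [len(ch.encode("utf-8")) for ch in sample]
    # ascii() keeps the output printable on terminals without UTF-8 support.
    print(label, ascii(sample), byte_lengths, needs_utf8mb4(sample))
```

Under these assumptions, a text emoticon such as `:-)` and much non-English text fit within three bytes per character, while Emoji are what actually require the four-byte range.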

Attachments (3)

48044.diff (780 bytes) - added by chetan200891 5 years ago.
Created initial patch.
48044.2.diff (806 bytes) - added by garrett-eclipse 5 years ago.
Refresh to improve the copy and account for provided feedback
48044.3.diff (803 bytes) - added by garrett-eclipse 5 years ago.
Refresh for better accuracy


Change History (20)

#1 follow-up: @johnjamesjacoby
5 years ago

One quick suggestion:

UTF8MB4 is the database format WordPress prefers because it safely supports the widest set of characters and letters, specifically for publishing in languages other than American English, including Emoji.

#2 in reply to: ↑ 1 @SergeyBiryukov
5 years ago

  • Keywords good-first-bug added
  • Milestone changed from Awaiting Review to 5.3

Replying to johnjamesjacoby:

UTF8MB4 is the database format WordPress prefers because it safely supports the widest set of characters and letters, specifically for publishing in languages other than American English, including Emoji.

I'd suggest "database encoding" instead of "database format", looks good to me otherwise.

#3 @ayeshrajans
5 years ago

I agree that "database encoding" is more accurate and less confusing.

Why do we use "languages" in there, though? Emojis, numbers, symbols, flags, etc. are not from a particular language.

#4 @SergeyBiryukov
5 years ago

I think "Emoji" refers to "the widest set of characters and letters" here, rather than "languages". Perhaps that could be made clearer, though? I would also drop "American", it seems unnecessarily specific :)

UTF8MB4 is the database encoding WordPress prefers because it safely supports the widest set of characters and letters, including Emoji, specifically for publishing in languages other than English.

This ticket was mentioned in Slack in #core-site-health by afragen. View the logs.

5 years ago

This ticket was mentioned in Slack in #core by desrosj. View the logs.

5 years ago

#7 @desrosj
5 years ago

  • Keywords needs-patch added

48044.diff added by chetan200891 5 years ago.

Created initial patch.

#8 @chetan200891
5 years ago

  • Keywords has-patch added; needs-patch removed

#9 @chetan200891
5 years ago

Created the patch while giving a demo of contributing to core at a local meetup.

This ticket was mentioned in Slack in #core-site-health by afragen. View the logs.

5 years ago

#11 @afragen
5 years ago

  • Keywords needs-copy-review added

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.

5 years ago

48044.2.diff added by garrett-eclipse 5 years ago.

Refresh to improve the copy and account for provided feedback

#13 @garrett-eclipse
5 years ago

  • Keywords 2nd-opinion good-first-bug removed
  • Owner set to garrett-eclipse
  • Status changed from new to accepted

Thanks for the patch, @chetan200891. I've taken into account the feedback from @johnjamesjacoby, @SergeyBiryukov, and @ayeshrajans in the refresh, 48044.2.diff.

I've left this in review to get thoughts on the current string below:
'UTF8MB4 is the character encoding WordPress prefers for database storage because it safely supports the widest set of characters and letters, including Emoji, enabling better support for non-English languages.'

Note: I did switch to use 'character encoding' over 'database encoding' as it is more accurate.

Thoughts? If we can get consensus, this may be able to make 5.3 beta3.

#14 @ayeshrajans
5 years ago

IMO, the text in #13 sounds perfect.

48044.3.diff added by garrett-eclipse 5 years ago.

Refresh for better accuracy

#15 @garrett-eclipse
5 years ago

Sorry, I noticed some minor inaccuracies as I read the definitions: UTF8MB4 is a character set used for database storage, and a character set is a set of characters and encodings.

To account for this, I refreshed the patch in 48044.3.diff.

New string for review:

'UTF8MB4 is the character set WordPress prefers for database storage because it safely supports the widest set of characters and encodings, including Emoji, enabling better support for non-English languages.'

#16 @garrett-eclipse
5 years ago

Thanks, @ayeshrajans. I tweaked it slightly in comment #15 after reading some more definitions online.

#17 @SergeyBiryukov
5 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 46402:

Site Health: Improve the wording for UTF8MB4 test description.

Props garrett-eclipse, chetan200891, johnjamesjacoby, ayeshrajans.
Fixes #48044.
