WordPress.org

Make WordPress Core

Opened 2 months ago

Closed 6 weeks ago

#48044 closed defect (bug) (fixed)

Site Health use of "emoticons" in UTF8MB4 check is ambiguous

Reported by: johnjamesjacoby Owned by: garrett-eclipse
Milestone: 5.3 Priority: normal
Severity: minor Version:
Component: Site Health Keywords: has-patch needs-copy-review
Focuses: ui-copy Cc:
PR Number:

Description

The UTF8MB4 test says:

UTF8MB4 is a database storage attribute that makes sure your site can store non-English text and other strings (for instance emoticons) without unexpected problems.

Emoticons, by definition, are (emphasis mine):

a pictorial representation of a facial expression using characters—usually punctuation marks, numbers, and letters—to express a person's feelings or mood

"Emoticons" might refer to the more technical alternative meaning, the Unicode block of that name. If so, it seems oddly technical to include here, and it is also not completely accurate as written.
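To make the ambiguity concrete, here is a small Python sketch. It is not part of WordPress, and the helper names utf8_max_bytes and needs_utf8mb4 are hypothetical. It relies on the fact that MySQL's older "utf8" (utf8mb3) character set stores at most 3 bytes per character, while utf8mb4 allows the full 4-byte range of UTF-8. A text emoticon such as ":-)" is plain ASCII and fits in any character set. A character from the Unicode Emoticons block (U+1F600 to U+1F64F), or an ideograph outside the Basic Multilingual Plane, needs 4 bytes.

```python
# Illustrative sketch only (hypothetical helpers, not WordPress or MySQL code).
# MySQL's utf8mb3 stores at most 3 bytes per character; utf8mb4 allows up to 4.

def utf8_max_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes any single character in `text` needs."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def needs_utf8mb4(text: str) -> bool:
    """True if some character in `text` needs 4 UTF-8 bytes (i.e. lies outside the BMP)."""
    return utf8_max_bytes(text) > 3


# A classic text emoticon is ASCII punctuation: 1 byte per character.
print(needs_utf8mb4(":-)"))          # False
# U+1F600 GRINNING FACE, from the Unicode "Emoticons" block.
print(needs_utf8mb4("\U0001F600"))   # True
# U+20000, from CJK Unified Ideographs Extension B (outside the BMP).
print(needs_utf8mb4("\U00020000"))   # True
```

The check only measures per-character byte length; it illustrates the encoding limit and is not how the Site Health test itself is implemented.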

Attachments (3)

48044.diff (780 bytes) - added by chetan200891 7 weeks ago.
Created initial patch.
48044.2.diff (806 bytes) - added by garrett-eclipse 6 weeks ago.
Refresh to improve the copy and account for provided feedback
48044.3.diff (803 bytes) - added by garrett-eclipse 6 weeks ago.
Refresh for better accuracy


Change History (20)

#1 follow-up: @johnjamesjacoby
2 months ago

One quick suggestion:

UTF8MB4 is the database format WordPress prefers because it safely supports the widest set of characters and letters, specifically for publishing in languages other than American English, including Emoji.

#2 in reply to: ↑ 1 @SergeyBiryukov
2 months ago

  • Keywords good-first-bug added
  • Milestone changed from Awaiting Review to 5.3

Replying to johnjamesjacoby:

UTF8MB4 is the database format WordPress prefers because it safely supports the widest set of characters and letters, specifically for publishing in languages other than American English, including Emoji.

I'd suggest "database encoding" instead of "database format", looks good to me otherwise.

#3 @ayeshrajans
2 months ago

I also agree that "database encoding" is more accurate and less confusing.

Why do we use "languages" in there, though? Emojis, numbers, symbols, flags, etc. are not from a particular language.

#4 @SergeyBiryukov
2 months ago

I think "Emoji" refers to "the widest set of characters and letters" here, rather than "languages". Perhaps that could be made clearer, though? I would also drop "American"; it seems unnecessarily specific :)

UTF8MB4 is the database encoding WordPress prefers because it safely supports the widest set of characters and letters, including Emoji, specifically for publishing in languages other than English.

This ticket was mentioned in Slack in #core-site-health by afragen.
8 weeks ago

This ticket was mentioned in Slack in #core by desrosj.
8 weeks ago

#7 @desrosj
8 weeks ago

  • Keywords needs-patch added

@chetan200891
7 weeks ago

Created initial patch.

#8 @chetan200891
7 weeks ago

  • Keywords has-patch added; needs-patch removed

#9 @chetan200891
7 weeks ago

Created the patch while demonstrating how to contribute to core at a local meetup.

This ticket was mentioned in Slack in #core-site-health by afragen.
7 weeks ago

#11 @afragen
7 weeks ago

  • Keywords needs-copy-review added

This ticket was mentioned in Slack in #core by david.baumwald.
7 weeks ago

@garrett-eclipse
6 weeks ago

Refresh to improve the copy and account for provided feedback

#13 @garrett-eclipse
6 weeks ago

  • Keywords 2nd-opinion good-first-bug removed
  • Owner set to garrett-eclipse
  • Status changed from new to accepted

Thanks for the patch, @chetan200891. In the refreshed 48044.2.diff, I've taken into account the feedback from @johnjamesjacoby, @SergeyBiryukov, and @ayeshrajans.

I've left it in review to get thoughts on the current string below:
'UTF8MB4 is the character encoding WordPress prefers for database storage because it safely supports the widest set of characters and letters, including Emoji, enabling better support for non-English languages.'

Note: I did switch to use 'character encoding' over 'database encoding' as it is more accurate.

Thoughts? If we can reach consensus, this may be able to make 5.3 Beta 3.

#14 @ayeshrajans
6 weeks ago

IMO, the text in #13 sounds perfect.

@garrett-eclipse
6 weeks ago

Refresh for better accuracy

#15 @garrett-eclipse
6 weeks ago

Sorry, I realized there were some minor inaccuracies as I read the definitions: UTF8MB4 is a character set used for database storage, and a character set is a set of characters and encodings.

To account for this I refreshed in 48044.3.diff

New string for review:

'UTF8MB4 is the character set WordPress prefers for database storage because it safely supports the widest set of characters and encodings, including Emoji, enabling better support for non-English languages.'

#16 @garrett-eclipse
6 weeks ago

Thanks, @ayeshrajans. I tweaked it slightly in comment #15 after reading some more definitions online.

#17 @SergeyBiryukov
6 weeks ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 46402:

Site Health: Improve the wording for UTF8MB4 test description.

Props garrett-eclipse, chetan200891, johnjamesjacoby, ayeshrajans.
Fixes #48044.
