WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 17 months ago

#30796 new enhancement

Entity Name vs. Entity Number

Reported by: johnjamesjacoby Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Text Changes Keywords: needs-patch dev-feedback
Focuses: Cc:

Description

In our strings, we currently use &rsquo; and &#8217; interchangeably. They result in the same right quote mark (’), so this isn't a bug, nor is it grammatically incorrect. It may, however, be a potential bottleneck for individuals translating our heavy slang and contraction use, which appears to be in the hundreds of strings.
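
To illustrate the equivalence described above, here is a minimal sketch using Python's standard `html` module (not WordPress code). The variable names are made up for the example, and the hexadecimal spelling is included only for comparison.

```python
import html

# Three spellings of the same character reference:
# named, decimal, and hexadecimal.
NAMED = "&rsquo;"
DECIMAL = "&#8217;"
HEXADECIMAL = "&#x2019;"

# Decode each reference to the character it stands for.
decoded = [html.unescape(s) for s in (NAMED, DECIMAL, HEXADECIMAL)]

# Every spelling decodes to U+2019 RIGHT SINGLE QUOTATION MARK.
print([f"U+{ord(c):04X}" for c in decoded])  # ['U+2019', 'U+2019', 'U+2019']
```

The rendered output is identical, so the choice between spellings only affects how readable the source strings are for developers and translators.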

I'd like to suggest we do one of three things:

  • Switch completely to &rsquo; as it's easier to grok than &#8217;
  • Switch completely to &#8217; unless we have a complementary ‘ usage. (Note that we only currently have 1 ‘ and it's incorrectly used in a contraction.)
  • Remove our contraction usages completely. This results in a subtle tone change and removes some of WordPress's Texan personality, but also makes internationalization easier and potentially more inviting as a result.

There are likely other entities worth discussing, so I titled this ticket intentionally broad and highlighted one of the more obvious usages. Definitely feel free to retitle and modify this ticket for maximum traction, y’all.

Change History (4)

#1 @chriscct7
2 years ago

  • Keywords i18n-change removed

#3 follow-up: @GaryJ
17 months ago

  • Keywords dev-feedback added

Polyglot markup requires numbered entities and the W3C Recommendation is that numbered entities SHOULD use the hexadecimal form when it exists.
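
As a rough sketch of what normalizing to hexadecimal references could look like, the following Python snippet rewrites named and decimal references in a string. It is not part of WordPress or GlotPress; the function name `to_hex_references` and the choice to leave the five XML-predefined names untouched are assumptions made for illustration.

```python
import re
from html.entities import name2codepoint

# Names that XML itself predefines; left untouched in this sketch (an assumption).
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches decimal references (&#8217;) and named references (&rsquo;).
# Hexadecimal references (&#x2019;) do not match and pass through unchanged.
_REFERENCE = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def to_hex_references(text: str) -> str:
    """Rewrite named and decimal character references as hexadecimal ones."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body.startswith("#"):
            codepoint = int(body[1:])
        elif body in XML_PREDEFINED or body not in name2codepoint:
            return match.group(0)  # predefined or unknown name: keep as-is
        else:
            codepoint = name2codepoint[body]
        return f"&#x{codepoint:X};"

    return _REFERENCE.sub(replace, text)


print(to_hex_references("Don&rsquo;t panic, y&#8217;all"))
```

Both spellings of the apostrophe in the sample string come out as &#x2019;, while references such as &amp; are left alone.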

Is WP UTF-8 safe yet? Could we use the real ’ in the code?

Even with numbered or named entities, do translators ensure those entities are used in contractions in the translated strings? Does GlotPress do any sort of conversion? How about plugin authors? I feel some more answers are needed beyond just whether to change the source, to see what effect (good or bad) it will have for translators.

#4 in reply to: ↑ 3 @SergeyBiryukov
17 months ago

Replying to GaryJ:

Is WP UTF-8 safe yet? Could we use the real ’ in the code?

There's a precedent in [38359].

Even with numbered or named entities, do translators ensure those entities are used in contractions in the translated strings?

I think entities were originally used to make sure the code is ASCII-only, and prevent code editors from introducing issues like the one just fixed in [38517] :) That said, I guess most editors should be UTF-8 safe now.

Translators can use actual characters instead of entities; that should not cause any issues.

Does GlotPress do any sort of conversion?

Not that I know of.
