#33511 closed defect (bug) (fixed)
Bad value for attribute lang on element html
Reported by: | Chouby | Owned by: | ocean90 |
---|---|---|---|
Milestone: | 4.5 | Priority: | normal |
Severity: | normal | Version: | 4.3 |
Component: | I18N | Keywords: | has-patch commit |
Focuses: | accessibility | Cc: |
Description
An RFC 5646 language tag consists of hyphen-separated ASCII-alphanumeric subtags. There is a primary tag identifying a natural language by its shortest ISO 639 language code (e.g. en for English) and zero or more additional subtags adding precision. The most common additional subtag type is a region subtag which most commonly is a two-letter ISO 3166 country code (e.g. GB for the United Kingdom). IANA maintains a registry of permissible subtags.
WP 4.3 introduced two new language packs with locales which do not validate at https://validator.w3.org/:
'de-DE-formal' and 'oci'.
I have no suggestion for de-DE-formal, but we should use the ISO 639-1 language code 'oc' instead of 'oci'.
Then I reviewed all locales at https://translate.wordpress.org/ that use an ISO 639-2 code. Several of them do not validate because an ISO 639-1 code is available:
bel -> be
bre -> br
dzo -> dz
ido -> io
kin -> rw
mri -> mi
roh -> rm
srd -> sc
tuk -> tk
Finally 'bal' is totally misused as the code is for the Balochi language and not for Catalan (Balear).
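The substitutions listed above (plus 'oci' -> 'oc') can be written down as a lookup table. The following is only a minimal sketch in Python for illustration, not WordPress code; the names `ISO_639_2_TO_1` and `shortest_primary_subtag` are hypothetical.

```python
# Hypothetical lookup table: the ISO 639-2 codes named in this ticket,
# mapped to the ISO 639-1 codes that should be used instead.
ISO_639_2_TO_1 = {
    "oci": "oc",
    "bel": "be",
    "bre": "br",
    "dzo": "dz",
    "ido": "io",
    "kin": "rw",
    "mri": "mi",
    "roh": "rm",
    "srd": "sc",
    "tuk": "tk",
}


def shortest_primary_subtag(locale: str) -> str:
    """Swap a three-letter primary subtag for its two-letter code, if listed.

    Underscores are normalised to hyphens; any further subtags are kept.
    """
    primary, sep, rest = locale.replace("_", "-").partition("-")
    return ISO_639_2_TO_1.get(primary.lower(), primary) + sep + rest
```

Locales whose primary subtag is not in the table are returned unchanged apart from the underscore normalisation.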
Attachments (3)
Change History (25)
#2
@
9 years ago
This is... fun. :)
In general, I think we should continue to use the ISO 639-3 codes for our subdomains and internal tracking, but perhaps output lang attributes with ISO 639-2 codes if such a code exists for that locale. In the case of de-de-formal, perhaps that locale should output de-de as the lang attribute and not de-de-formal, but I am sure experts will disagree. :)
AFAICT, Catalan (Balear) has never been used and could probably be removed / recreated with the correct locale code.
(Note that the polyglots team handles locale codes and it's probably best to post over there with a link to this ticket.)
#3
@
9 years ago
It should be ok to just remove the Catalan (Balear) locale. It's not active and doesn't seem like it would be. The last activity is from the week of WordCamp Barcelona, but it doesn't seem those contributors will be back to translating it.
#4
in reply to:
↑ 1
@
9 years ago
IANA have a subtag registration process (section 3.5 of RFC 5646). Perhaps WordPress could submit a request to make formal a recognised "variant" type. Otherwise I am in favour of the private use extension for non-standard tags.
#5
@
9 years ago
Well, de-DE-formal is still exactly the same language as de-DE, so I think it should output de-DE. Unless you can think of any benefits you get from something like de-DE-x-formal.
#6
@
9 years ago
The same occurs for Dutch lang="nl-NL" and lang="nl-NL-formal".
nl-NL-formal gives a validation error: "Bad value nl-NL-formal for attribute lang on element html: Bad variant subtag formal."
These translations are significantly different in the way users are addressed, but both are very much Dutch.
So lang="nl-NL" would validate for both of them.
#8
follow-up:
↓ 10
@
9 years ago
- Milestone changed from Awaiting Review to 4.5
Seems like we should strip -formal in get_bloginfo(), see 33511.patch.
#10
in reply to:
↑ 8
@
9 years ago
- Focuses accessibility removed
- Milestone changed from 4.5 to Awaiting Review
Replying to SergeyBiryukov:
Seems like we should strip -formal in get_bloginfo(), see 33511.patch.
Then we should strip -informal too, even if there is currently no such language.
But still, this doesn't solve the real issue: Using the wp_locale for the lang attribute is just wrong.
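For illustration, the suffix-stripping idea discussed above could look roughly like the sketch below. It is written in Python rather than PHP, it is not the contents of 33511.patch, and the function name `lang_from_locale` is hypothetical. As noted above, it only treats the symptom.

```python
def lang_from_locale(locale: str) -> str:
    """Derive a lang attribute value from a locale by dropping a
    trailing -formal or -informal marker (hypothetical helper)."""
    tag = locale.replace("_", "-")  # accept de_DE_formal as well as de-DE-formal
    for suffix in ("-formal", "-informal"):
        if tag.lower().endswith(suffix):
            tag = tag[: -len(suffix)]
            break
    return tag
```

With this sketch, both de-DE-formal and nl-NL-formal reduce to the same value as their base locales, which is the behaviour proposed in the earlier comments.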
#11
@
9 years ago
- Focuses accessibility added
Since the lang attribute affects the way screen readers read out web pages, I'd recommend keeping the accessibility focus on this ticket. It does no harm, and helps the accessibility team to track this issue :) See: http://adrianroselli.com/2015/01/on-use-of-lang-attribute.html
This ticket was mentioned in Slack in #accessibility by rianrietveld. View the logs.
9 years ago
#13
@
9 years ago
- Keywords has-patch added
- Milestone changed from Awaiting Review to 4.5
Following the Slack discussion, seems like 33511.2.patch should work.
This ticket was mentioned in Slack in #core-i18n by ocean90. View the logs.
9 years ago
#15
@
9 years ago
- Keywords commit added
33511.3.patch includes a stricter check in case the string is (incorrectly) translated literally.
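One way such a check could work is sketched below in Python. This is an assumption made for illustration, not the contents of 33511.3.patch: it supposes that the lang value comes from a translatable string (the html_lang_attribute string mentioned later in this ticket), and that a value is rejected when it is left untranslated or contains anything other than ASCII letters, digits and hyphens. The name `resolve_html_lang` is hypothetical.

```python
import re

# Assumed placeholder: the untranslated source string.
PLACEHOLDER = "html_lang_attribute"


def resolve_html_lang(translated: str, locale: str) -> str:
    """Return the translated lang value, or fall back to the locale.

    The fallback is used when the string is empty, untranslated, or
    contains characters that cannot appear in a language tag.
    """
    invalid = (
        not translated
        or translated == PLACEHOLDER
        or re.search(r"[^A-Za-z0-9-]", translated) is not None
    )
    if invalid:
        return locale.replace("_", "-")
    return translated
```

A literal translation of the placeholder would normally contain spaces, underscores or non-ASCII characters, so it would be caught by the character check. A syntactically plausible but wrong value such as 'oci' would not be.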
#16
@
9 years ago
As I mentioned when this got implemented, the issue in my opinion is how it is implemented. It also broke BC in the admin, where the body class also uses get_locale and probably should use the language value as well. I still believe in storing the locale and the writing style/dialect etc. separately, so the information can be put to better use than by stripping values.
#17
@
9 years ago
- Owner set to ocean90
- Resolution set to fixed
- Status changed from new to closed
In 36802:
#18
@
9 years ago
For posterity: here's what happens using a screen reader when the language attribute is wrong:
https://www.youtube.com/watch?v=0uzxu9dQnuU
Video reported by @rianrietveld on Slack, courtesy of Mr. Steve Faulkner.
#19
follow-ups:
↓ 20
↓ 21
@
9 years ago
@SergeyBiryukov, @ocean90 In theory, your solution should work. I wonder how it will be handled by translators in practice.
Besides this new html_lang_attribute, we already have ltr, number_format_decimal_point and number_format_thousands_sep, which must not be translated in the usual way. I checked a few locales and this seems to be misunderstood by some translators (comments do not seem to be sufficient). Ex: bel, dzo
@afercia it's even worse than what I would have expected ;-)
#20
in reply to:
↑ 19
follow-up:
↓ 22
@
9 years ago
Replying to Chouby:
I checked a few locales and this seems to be misunderstood by some translators (comments do not seem to be sufficient). Ex: bel, dzo
I've checked the ltr, number_format_decimal_point, and number_format_thousands_sep strings in those locales. They're not translated in dzo, and only the last two are translated in bel, which doesn't surprise me as both locales are only ~78% complete. Is there any other issue I've missed?
#21
in reply to:
↑ 19
@
9 years ago
Replying to Chouby:
I rejected the wrong translations. I also have a script which I usually run a few days before a release which catches those cases.
#22
in reply to:
↑ 20
@
9 years ago
Replying to SergeyBiryukov:
They're not translated in dzo
I guess that @ocean90 acted meanwhile. Thanks for fixing the link.
Do you mean that some automatic check could be planned for this case too? It may be difficult to catch cases such as oci where the string was already translated to 'oci' instead of 'oc'.
I went on with my investigations. If I understood BCP 47 correctly, we could use private use subtags. Private use subtags are introduced by an 'x'.
As such, we could keep a different locale for formal German represented by:
de-DE-x-formal
If we need to keep a separate locale for Catalan (Balear), we could use something like:
ca-x-ES-IB
(ES-IB being the ISO 3166-2 code for the Balearic Islands.) Both new proposed codes validate at https://validator.w3.org/
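As a rough illustration of how such tags are put together: RFC 5646 defines the private use sequence as the singleton 'x' followed by one or more subtags of 1 to 8 ASCII alphanumerics. The Python sketch below builds the two proposed codes under that rule; the helper name `with_private_use` is hypothetical and nothing here is WordPress code.

```python
import re

# A private use subtag: 1 to 8 ASCII letters or digits (RFC 5646).
_PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def with_private_use(base: str, *private: str) -> str:
    """Append private use subtags to a base tag after the 'x' singleton."""
    for subtag in private:
        if _PRIVATE_SUBTAG.fullmatch(subtag) is None:
            raise ValueError(f"invalid private use subtag: {subtag!r}")
    if not private:
        return base
    return "-".join([base, "x", *private])
```

Here `with_private_use("de-DE", "formal")` gives de-DE-x-formal and `with_private_use("ca", "ES", "IB")` gives ca-x-ES-IB. Everything after the 'x' is opaque to validators and other consumers, which is why these tags pass validation while de-DE-formal does not.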