Make WordPress Core

Opened 14 years ago

Closed 12 years ago

#14225 closed defect (bug) (wontfix)

Use NCRs instead of HTML entities in Twenty Ten

Reported by: peaceablewhale Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.0
Component: Bundled Theme Keywords: has-patch
Focuses: Cc:

Description

The Twenty Ten theme is currently using HTML entities to represent some Unicode characters. This will break the page when the page is served as "application/xhtml+xml". To avoid that and make Twenty Ten compatible with both HTML and XHTML syntax of HTML5, numeric character references should be used instead.
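A minimal sketch of the failure described above, using Python's standard-library XML parser as a stand-in for a browser's XML parser. The helper name `parses_as_xml` and the sample markup are illustrative assumptions, not taken from the theme or the patch.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML (no DTD supplied)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # An undeclared named entity such as &raquo; ends up here.
        return False


# Hypothetical link markup, once with a named entity and once with an NCR.
named = '<a href="#">Older posts &raquo;</a>'
numeric = '<a href="#">Older posts &#187;</a>'

print(parses_as_xml(named))
print(parses_as_xml(numeric))
```

Without a DTD that declares it, the named entity is rejected, while the numeric character reference for the same character parses.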

Attachments (1)

14225.patch (11.7 KB) - added by peaceablewhale 14 years ago.

Change History (16)

#1 @westi
14 years ago

Why does this break the page when serving as XHTML?

AFAIK, XHTML supports the same named entities as HTML4, which would include &raquo;.

http://www.w3.org/TR/html401/sgml/entities.html

Changing this makes it less clear what is being done.

#2 @nacin
14 years ago

application/xhtml+xml != XHTML. See also #14224, #14226.

#3 @westi
14 years ago

In general we don't code for serving as application/xhtml+xml, but rather as text/html.

As far as entities are concerned the named entities are fine in XML as long as the parser used can cope with them:

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Entities_representing_special_characters_in_XHTML

#4 @peaceablewhale
14 years ago

Please see #14224 for comments.

#5 @nacin
14 years ago

  • Milestone changed from Awaiting Review to Future Release

#6 @nacin
14 years ago

  • Keywords 2nd-opinion added

#7 @GamajoTech
14 years ago

To reword what I think the OP meant: as the (expected) HTML5 doctype only supports five named entities, all (other?) named entities should be converted to numerical entities, for anyone who wants to create an offshoot of Twenty Ten that uses the HTML5 doctype.

However, the W3C say that Content SHOULD use the hexadecimal form of character escapes rather than the decimal form when there are both, so rather than converting &larr; to the decimal &#8592; as per the patch, it SHOULD be converted to the hexadecimal &#x2190;, with other entities converted accordingly.
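For reference, a small sketch of how the decimal and hexadecimal forms of a numeric character reference relate to the same code point. The function names are illustrative only and do not come from the patch.

```python
def decimal_ncr(char: str) -> str:
    """Decimal numeric character reference for a single character."""
    return f"&#{ord(char)};"


def hex_ncr(char: str) -> str:
    """Hexadecimal numeric character reference for a single character."""
    return f"&#x{ord(char):X};"


arrow = "\u2190"  # LEFTWARDS ARROW, the character behind &larr;
print(decimal_ncr(arrow))  # &#8592;
print(hex_ncr(arrow))      # &#x2190;
```

Both references resolve to the same character; the only difference is the base in which the code point is written.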

#8 @GaryJ
14 years ago

To correct myself:
In HTML5, parsed as text/html, all named entities are predefined and valid.

However, just as XHTML 1.0 Strict MAY be (and usually is) parsed as text/html rather than application/xhtml+xml, it's possible to write HTML5 in a polyglot form, so that if it is parsed with an XML parser (as application/xhtml+xml) it is still valid XHTML5.

There is no formal DTD for XHTML5, and although you could provide reference to an external DTD for adding named entities, browsers do not universally make use of them for their parsers, meaning it's basically not an option, as, say, &middot;, &hellip; or &raquo; may not be recognised.

The recommendation by the WHATWG for producing HTML5 documents capable of being parsed as XML for XHTML5, is to use numerical entities, except for the 5 implicit named entities that are safe.

Changing all named entities to their hexadecimal equivalents across all of core, not just Twenty Ten, has no negative impacts (save .po strings changing), as browsers back to at least IE5.5 (and maybe earlier) cope with hexadecimal character references fine. In the meantime, it's a future-proofing fix that will greatly aid those wanting to output their sites as application/xhtml+xml, without having to raise individual issues such as #16049!
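A rough sketch of the conversion described in this comment: rewrite named entities as hexadecimal references while leaving the five predefined XML entities alone. It is not the attached patch. The function name and regular expression are assumptions, and the lookup table is Python's `html.entities.name2codepoint`, which covers the HTML 4 entity set.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser recognises without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &raquo; but not numeric ones like &#187;.
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_hex_ncr(markup: str) -> str:
    """Replace known HTML 4 named entities with hexadecimal NCRs."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Leave the predefined five, and any name we cannot resolve, untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#x{name2codepoint[name]:X};"

    return _NAMED_ENTITY.sub(replace, markup)


print(named_to_hex_ncr("Older posts &raquo; &amp; more&hellip;"))
```

Unknown names are deliberately passed through unchanged so that the sketch never guesses at a code point.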

#9 @peaceablewhale
14 years ago

In short, the use of HTML entities is not recommended by the HTML Working Group.

#10 @peaceablewhale
14 years ago

Patch updated to make it work with Twenty Ten 1.2. Still using decimal form as WordPress has been using that form. I am willing to change to use hexadecimal form if a consensus can be reached (I personally like hexadecimal form more).

#11 @holizz
14 years ago

  • Cc tom@… added

#12 @SergeyBiryukov
12 years ago

  • Component changed from Themes to Bundled Theme

#14 @SergeyBiryukov
12 years ago

  • Milestone changed from Future Release to WordPress.org

#15 @lancewillett
12 years ago

  • Keywords 2nd-opinion removed
  • Milestone WordPress.org deleted
  • Resolution set to wontfix
  • Status changed from new to closed

Closing in the same vein as #9030.
