Make WordPress Core

Changes between Version 1 and Version 2 of Ticket #59883


Timestamp:
11/13/2023 06:28:36 PM (13 months ago)
Author:
westonruter
  • Ticket #59883 – Description

    v1 v2  
    11== Summary
    22
    3 WordPress still officially supports HTML4 and XHTML, but the browsers it serves and the broader web effectively don't. Let's remove support so that we can modernize the code we right and simplify Core's HTML-handling functionality.
     3WordPress still officially supports HTML4 and XHTML, but the browsers it serves and the broader web effectively don't. Let's remove support so that we can modernize the code we write and simplify Core's HTML-handling functionality.
    44
    55== Background
     
    99In various places WordPress maintains the appearance of supporting HTML4, for example:
    1010
    11  - `wp_kses_named_entities()` rejects valid named character references like `⇵` and in turn corrupts documents containing these entities.
    12  - script and style tags conditionally add `type` attributes that never need to be printed
    13  - widgets selectively render `<nav>` and strip tags out of the `$title` for a page when TITLE elements can contain no tags anyway. this leads to corruption in the page title for removing what WordPress thinks are tags but aren't.
    14  - various places run `kses` as if serving XHTML, adding needless invalid syntax like the self-closing flag on void elements, e.g. `<img />`, `<br />`, `<meta />`
     11- `wp_kses_named_entities()` rejects valid named character references like `&DownArrowUpArrow;` and in turn corrupts documents containing these entities.
     12- script and style tags conditionally add `type` attributes that never need to be printed
     13- widgets selectively render `<nav>` and strip tags out of the `$title` for a page when TITLE elements can contain no tags anyway. This corrupts the page title by removing what WordPress thinks are tags but aren't.
     14- various places run `kses` as if serving XHTML, adding needless invalid syntax like the self-closing flag on void elements, e.g. `<img />`, `<br />`, `<meta />`
    1515
    16 The //appearance// of serving HTML4 or XHTML stems from the fact that it's very rare to serve actual XHTML content, and perhaps impossible to serve HTML4 content, to any supported browser or enviornment.
     16The //appearance// of serving HTML4 or XHTML stems from the fact that it's very rare to serve actual XHTML content, and perhaps impossible to serve HTML4 content, to any supported browser or environment.
    1717
    18  - browsers ignore any `<xml>` or `<!DOCTYPE>` declaration specifying HTML4 or XHTML. they interpret a page as HTML5 regardless. you can confirm this by visiting a page with the `&lang;` named character reference. If interpreted as HTML4 it will transform into the U+2329 `〈` code point, but if interpreted as HTML5 will transform into the U+27E8 codepoint `⟨`.
    19  - the only way to serve a page as XHTML is to send the HTTP header `Content-type: application/xhtml+xml` or to serve the page with the `.xml` file extension in the URL (e.g. serve `index.xml` instead of `index.html` or `index.php` or `/index` or `/`). It's not enough to send a `<meta http-equiv="content-type" content="application/xhtml+xml">` tag; it //must// come through the HTTP headers.
     18- browsers ignore any `<?xml ?>` or `<!DOCTYPE>` declaration specifying HTML4 or XHTML. They interpret a page as HTML5 regardless. You can confirm this by visiting a page with the `&lang;` named character reference. If interpreted as HTML4 it will transform into the U+2329 `〈` code point, but if interpreted as HTML5 it will transform into the U+27E8 code point `⟨`.
     19- the only way to serve a page as XHTML is to send the HTTP header `Content-type: application/xhtml+xml` or to serve the page with the `.xml` file extension in the URL (e.g. serve `index.xml` instead of `index.html` or `index.php` or `/index` or `/`). It's not enough to send a `<meta http-equiv="content-type" content="application/xhtml+xml">` tag; it //must// come through the HTTP headers.
    2020
    2121Because of this behavior in browsers, WordPress sends content that it thinks is one thing but is received as another. Removing official support means that we can start to remove those places that purport to send HTML4 or XHTML content when that assumption is wrong and can lead to data corruption, let alone needless syntax noise.
     
    2828
    2929In future work it opens up opportunities to modernize WordPress:
    30  - we don't need to handle complicated corner cases where pre-HTML5 renders require special cases.
    31  - we can remove code meant for backwards compatability which no longer provides that support.
    32  - we can update Core functions such as `_wp_kses_named_entities()` to prevent them from corrupting data based on inaccurate parsing rules from the past.
    33  - we can define a body of support and scope for what WordPress will and won't attempt to clean up. functions like `force_balance_tags()` and encoding functions attempt to normalize and sanitize HTML but just as often further break that HTML when passing it through to the browser would have a deterministic and safe resolution.
     30- we don't need to handle complicated corner cases where pre-HTML5 renders require special cases.
     31- we can remove code meant for backwards compatibility which no longer provides that support.
     32- we can update Core functions such as `wp_kses_named_entities()` to prevent them from corrupting data based on inaccurate parsing rules from the past.
     33- we can define a body of support and scope for what WordPress will and won't attempt to clean up. Functions like `force_balance_tags()` and encoding functions attempt to normalize and sanitize HTML but just as often further break that HTML when passing it through to the browser would have a deterministic and safe resolution.
     34- we can eliminate wrapping script output with CDATA escaping which is only needed for XML compatibility.
     35- we can use HTML5 form validation by default in more places instead of requiring an opt-in.
    3436
    3537The HTML API is providing WordPress the ability to have a smarter Core HTML system that won't be confused by rare or unexpected inputs and leans heavily on a spec-compliant "garbage-in garbage-out" approach. This dramatically simplifies HTML processing code without opening unsafe avenues; this is because HTML5 defines how to handle abnormal inputs.
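
The two entity claims in the Background section (`&lang;` decoding differently under HTML4 and HTML5, and `&DownArrowUpArrow;` being a valid reference that an HTML4-era list rejects) can be checked outside a browser. The following is a minimal sketch, not WordPress code: it assumes Python's standard-library tables `html.entities.name2codepoint` (HTML 4.01 names) and `html.entities.html5` (HTML5 names) as stand-ins for the two rule sets, and the helper names `decode_html4` and `decode_html5` are hypothetical.

```python
import html
from html.entities import html5, name2codepoint


def decode_html4(name):
    """Return the character an HTML 4.01 entity name maps to, or None if unknown."""
    code_point = name2codepoint.get(name)
    return None if code_point is None else chr(code_point)


def decode_html5(name):
    """Return the character(s) an HTML5 named character reference maps to, or None."""
    # Keys in the html5 table include the trailing semicolon.
    return html5.get(name + ";")


def describe(text):
    """Format a decoded value as U+XXXX code points, or 'not defined'."""
    if text is None:
        return "not defined"
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for name in ("lang", "DownArrowUpArrow"):
        print(f"&{name}; HTML4: {describe(decode_html4(name))}")
        print(f"&{name}; HTML5: {describe(decode_html5(name))}")

    # html.unescape() follows the HTML5 rules, like a browser does.
    print("unescape('&lang;'):", describe(html.unescape("&lang;")))
```

Under these tables `&lang;` resolves to U+2329 with the HTML4 names and to U+27E8 with the HTML5 names, while `&DownArrowUpArrow;` exists only in the HTML5 table. A sanitizer that validates names against the older list will therefore treat a valid reference as invalid, which is the corruption described for `wp_kses_named_entities()`.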