Make WordPress Core

Changes between Initial Version and Version 10 of Ticket #61009


Timestamp: 05/22/2024 10:17:04 PM
Author: dmsnell

  • Ticket #61009

    • Property Summary changed from "HTML API: Preserve some additional invalid HTML comment syntaxes." to "HTML API: Fix some existing bugs in `kses` comment detection, enable Bits storage."
  • Ticket #61009 – Description

    Initial version vs. version 10 (lines marked "-" were removed, lines marked "+" were added, unmarked lines are unchanged):

- When `wp_kses_split` processes a document it attempts to leave HTML comments relatively alone. It makes minor adjustments, but leaves the comments in the document in its output.
+ When `wp_kses_split` processes a document, it attempts to leave HTML comments alone. It makes minor adjustments, but leaves the comments in the document in its output. Unfortunately, it only recognizes one kind of HTML comment and rejects many others.

- Unfortunately it only recognizes one kind of HTML comment and rejects many other kinds which appear as the result of various invalid HTML markup.
+ In HTML, there are many kinds of invalid markup that, according to the specification, are to be interpreted as an HTML comment. These include, but are not limited to:
+
+ - HTML comments with invalid syntax: `<!-->`, `<!-- --!>`, etc.
+ - HTML closing tags whose tag name is invalid: `</3>`, `</%happy>`, etc.
+ - Things that look like XML CDATA sections: `<![CDATA[…]]>`
+ - Things that look like XML processing instructions: `<?include "blarg">`

This patch makes a minor adjustment to the algorithm in `wp_kses_split` to allow two additional kinds of HTML comments:

- - HTML comments with the incorrect closer `--!>`.
- - Closing tags with an invalid tag name, e.g. `</%dolly>`.
+ - HTML comments with the incorrect closer `--!>`, because this one was a simple and easy change.
+ - Closing tags with an invalid tag name, e.g. `</%dolly>`, because these are required to open up explorations in Gutenberg on Bits, a new iteration of dynamic tokens for externally-sourced data, or "Shortcodes 2.0".

- In an HTML parser these all become comments, and so leaving them in the document should be a benign operation, improving the reliability of detecting comments in Core. These invalid closing tags, which in a browser are interpreted as comments, are one proposal for a placeholder mechanism in the HTML API unlocking HTML templating, a new kind of shortcode, and more. Having these persist in Core is a requirement for exploring and utilizing the new syntax.
+ These invalid closing tags, which in a browser are interpreted as comments, are one proposal for a placeholder mechanism in the HTML API unlocking HTML templating, a new kind of shortcode, and more. Having these persist in Core is a requirement for exploring and utilizing the new syntax, because as long as Core removes them, there is no way to load content from the database and experiment on the full life cycle of potential Bits systems.
+
+ On its own, however, this represents a kind of bug fix for Core, making the implementation of `wp_kses_split()` more closely align with its stated goal of leaving HTML comments as comments. It doesn't attempt to fully fix the mis-parsed comments (that is a much deeper issue involving many more questions about existing expectations), but it does propose a couple of fixes that are expected to be minor and hopefully won't break any existing code or projects.
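The ticket's list of markup that HTML parses as comments, including the `--!>` closer and invalid closing tags such as `</%dolly>`, can be made concrete with a small sketch. The following is a minimal Python model, written against the WHATWG HTML tokenizer rules, of how a browser decides whether a `<` in ordinary text starts a comment and where that comment ends. It is not the code in `wp_kses_split()` or the HTML API; the names `scan_comment_at`, `_bogus_comment`, and `CommentSpan` are hypothetical, and it assumes the `<` sits in the tokenizer's data state, i.e. not inside `<script>`, `<textarea>`, an attribute value, or SVG/MathML content (where `<![CDATA[` is a real CDATA section).

```python
import re
from typing import NamedTuple, Optional


class CommentSpan(NamedTuple):
    start: int  # index of the "<" that opened the comment
    end: int    # index just past the closing ">" (len(html) if unterminated)
    data: str   # text a browser would store as the comment's content


# A normal comment closes at "-->" or, as a parse error, at "--!>".
_COMMENT_CLOSER = re.compile(r"--!?>")


def scan_comment_at(html: str, pos: int) -> Optional[CommentSpan]:
    """Return the comment a browser would create for the "<" at `pos`, if any.

    Hypothetical helper. Assumes `pos` is in ordinary text (the tokenizer's
    data state). NULL-byte replacement and the trimming of trailing dashes
    from unterminated comments are not modeled.
    """
    if not html.startswith("<", pos):
        raise ValueError("pos must point at '<'")
    nxt = html[pos + 1 : pos + 2]

    if html.startswith("<!--", pos):
        body = pos + 4
        # "<!-->" and "<!--->" close abruptly as empty comments.
        if html.startswith(">", body):
            return CommentSpan(pos, body + 1, "")
        if html.startswith("->", body):
            return CommentSpan(pos, body + 2, "")
        closer = _COMMENT_CLOSER.search(html, body)
        if closer is None:
            return CommentSpan(pos, len(html), html[body:])
        return CommentSpan(pos, closer.end(), html[body : closer.start()])

    if nxt == "!":
        if html[pos + 2 : pos + 9].upper() == "DOCTYPE":
            return None  # a DOCTYPE, not a comment
        # "<!x" and "<![CDATA[" (outside SVG/MathML) become bogus comments;
        # their data starts right after "<!".
        return _bogus_comment(html, pos, pos + 2)

    if nxt == "?":
        # "<?" becomes a bogus comment whose data keeps the "?".
        return _bogus_comment(html, pos, pos + 1)

    if nxt == "/":
        first = html[pos + 2 : pos + 3]
        if first == "" or first == ">" or (first.isascii() and first.isalpha()):
            return None  # text at EOF, the ignored "</>", or a real end tag
        # "</" followed by a non-letter, e.g. "</3>" or "</%dolly>".
        return _bogus_comment(html, pos, pos + 2)

    return None  # a start tag, or a literal "<" in text


def _bogus_comment(html: str, start: int, data_start: int) -> CommentSpan:
    """A bogus comment runs to the first ">" (or to the end of input)."""
    gt = html.find(">", data_start)
    if gt == -1:
        return CommentSpan(start, len(html), html[data_start:])
    return CommentSpan(start, gt + 1, html[data_start:gt])
```

For example, `scan_comment_at("</%dolly>", 0)` yields a comment whose data is `%dolly`, so the name inside an invalid closer remains available as comment text after parsing, and `<!-- a --!>` closes at `--!>` just as `<!-- a -->` closes at `-->`. Because the sketch only inspects a single position, it sidesteps the harder problem of skipping over ordinary tags and their attribute values.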