Make WordPress Core

#60392 closed defect (bug) (wontfix)

HTML API: Tag processor no longer visits all closing tags

Reported by: westonruter's profile westonruter Owned by:
Milestone: Priority: normal
Severity: normal Version: 6.5
Component: HTML API Keywords:
Focuses: Cc:

Description (last modified by westonruter)

I just discovered that the HTML Tag Processor is no longer visiting all closing tags.

This issue was introduced in r57348 as part of #60170.

Here is a test case:

<?php

$html = '
        <html>
                <head>
                        <title>Greeting</title>
                </head>
                <body>
                        <p>Hello world!</p>
                </body>
        </html>
';

$p = new WP_HTML_Tag_Processor( $html );

$found_closing_title_tag = false;
while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
        if ( $p->is_tag_closer() && 'TITLE' === $p->get_tag() ) {
                $found_closing_head_tag = true;
        }
}

echo ( $found_closing_title_tag ? 'GOOD' : 'BAD' ) . PHP_EOL;

I've noticed this also happen for STYLE and SCRIPT tags in the HEAD, in addition to TITLE.

Before the commit, this outputs GOOD but after the commit it outputs BAD.

Change History (8)

#1 @westonruter
15 months ago

  • Description modified (diff)

#2 follow-up: @dmsnell
15 months ago

@westonruter we made a breaking change in this behavior in order to avoid mistakes dealing with the fact that no HTML exists inside the TITLE.

you can now treat TITLE and the other special tags as if they were void or self-closing. to get the contents, call $processor->get_modifiable_text() while matched on a TITLE tag.

you will also still visit closing TITLE tags if they appear unexpectedly, as in <div>This is not a div closer</title>

#3 @costdev
15 months ago

@dmsnell Two questions:

  1. Can you drop a link to where the breaking change was discussed and had consensus as an acceptable breaking change?
  2. Has there been discussion about a dev note for #60170 to inform extenders about the changes?

#4 @dmsnell
15 months ago

thanks @costdev.

#60170 has been open for four weeks and the underlying work has been open since November, and discussion has occurred in #core-html-api in Slack since at least then.

https://github.com/WordPress/wordpress-develop/pull/5683

The change is partially breaking, because it wasn't possible beforehand to read text content from the Tag Processor. The same behavior was embedded within the Tag Processor when an HTML document only contained the opening TITLE tag; it would not find the opening tag if it couldn't find the closing tag.

The dev notes highlight this as a change. I'll send you a DM with a link to the preview.

#5 @costdev
15 months ago

Thanks @dmsnell!

#6 in reply to: ↑ 2 @westonruter
15 months ago

  • Keywords close added; needs-patch removed

Replying to dmsnell:

@westonruter we made a breaking change in this behavior in order to avoid mistakes dealing with the fact that no HTML exists inside the TITLE.

you can now treat TITLE and the other special tags as if they were void or self-closing. to get the contents, call $processor->get_modifiable_text() while matched on a TITLE tag.

you will also still visit closing TITLE tags if they appear unexpectedly, as in <div>This is not a div closer</title>

OK, thanks. Please let me know if this change aligns with the changes in WP 5.6: https://github.com/WordPress/performance/pull/963

#7 @dmsnell
15 months ago

@westonruter it looks like you made appropriate changes there. Note that you may still encounter closing tags of these types, but only if they are unexpected. In your script though this should be fine as I don't think you'll have the elements on the stack to pop off, and I think you skip popping from the stack if the bottom element isn't the same as what you scan with the HTML API.

Soon I hope to be at the point where you can fully switch to the HTML Processor where none of this will have to be replicated in your own application-level code.

#8 @westonruter
15 months ago

  • Keywords close removed
  • Milestone 6.5 deleted
  • Resolution set to wontfix
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.