Make WordPress Core

Opened 4 months ago

Closed 4 months ago

Last modified 4 months ago

#61301 closed defect (bug) (fixed)

HTML API: Tag bookmark length is incorrect

Reported by: jonsurrell's profile jonsurrell Owned by: dmsnell's profile dmsnell
Milestone: 6.6 Priority: normal
Severity: normal Version: 6.5
Component: HTML API Keywords: has-patch has-unit-tests needs-dev-note
Focuses: Cc:

Description

Opening tag tokens in the HTML API are incorrect. I'd expect that substr( $html, $token_start, $token_length ); to produce the entire tag. Currently, this produces the tags up to (but not including) the final >. A tag like <div> has a length 4, so the result would be <div.

This is largely internal, although classes extending the tag processor observe this bug through bookmarks.

Change History (15)

This ticket was mentioned in PR #6625 on WordPress/wordpress-develop by @jonsurrell.


4 months ago
#1

  • Keywords has-patch has-unit-tests added

It seems like the open tags have incorrect length in the tag processor. I'd expect that substr( $html, $token_start, $token_length ); to produce the entire tag. This patch fixes that, before it would exclude the final > on tags.

This is mostly internal and it seems to have adjustments in several places to fix it, but I'd like to have a consistent internal representation. Other types of tokens behave as I'd expect (with substr( $html, $start, $length )).

@dmsnell thoughts?

<details>

<summary>I'd expect something like this to hold:</summary>

<?php

$t = '<div class="target">text</div>trailing text';
$start = 0;
$token_length = 20;
$text_start = $token_length - $start;
$text_length = 4;

$end_start = $text_start + $text_length;
$end_length = 6;

var_dump(
    substr( $t, $start, $token_length ),
    substr( $t, $text_start, $text_length ),
    substr( $t, $end_start, $end_length ),
);

</details>

Trac ticket: https://core.trac.wordpress.org/ticket/61301

@dmsnell commented on PR #6625:


4 months ago
#2

The test failure in #6651 makes me nervous.

https://github.com/WordPress/wordpress-develop/assets/5431237/26fba8d9-7848-45e3-864f-751063c5fe51

@dmsnell commented on PR #6625:


4 months ago
#3

@sirreal can you look into the failure in the linked PR? I think it's related to seeking with updates and getting the wrong $this->bytes_already_processed

@jonsurrell commented on PR #6625:


4 months ago
#4

The test failure in https://github.com/WordPress/wordpress-develop/pull/6651 makes me nervous.
Can you look into the failure in the linked PR? I think it's related to seeking with updates and getting the wrong $this->bytes_already_processed

68a3251d307c39467ab85bf739578d15da378d75 was missing from that branch. There was a test that used the broken offset. The offset adjustment was removed in this branch, I pushed to #6651.

@jonsurrell commented on PR #6625:


4 months ago
#5

@westonruter This is a bugfix for incorrect token lengths in tag bookmarks in the Tag Processor. Subclasses might see and be affected by the bug (and the fix) if they ever work directly with tag bookmarks lengths. I believe you have some plugins that may be doing advanced things with the Tag Processor so wanted to make sure you were aware.

Below is an example fix from this PR, plugins would be impacted in the same way and would need to remove the off-by-one correction on the bookmark lengths:

  • src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php

    diff --git a/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php b/src/wp-includes/interactivity-api/class-wp-interactivity-api-directives-processor.php
    index 3b2dcb123797..b12dcb4b3b15 100644
    a b public function append_content_after_template_tag_closer( string $new_content ): 
    107107
    108108                $bookmark = 'append_content_after_template_tag_closer';
    109109                $this->set_bookmark( $bookmark );
    110                 $after_closing_tag = $this->bookmarks[ $bookmark ]->start + $this->bookmarks[ $bookmark ]->length + 1;
     110                $after_closing_tag = $this->bookmarks[ $bookmark ]->start + $this->bookmarks[ $bookmark ]->length;
    111111                $this->release_bookmark( $bookmark );
    112112
    113113                // Appends the new content.
    private function get_after_opener_tag_and_before_closer_tag_positions( bool $rew 
    140140                }
    141141                list( $opener_tag, $closer_tag ) = $bookmarks;
    142142
    143                 $after_opener_tag  = $this->bookmarks[ $opener_tag ]->start + $this->bookmarks[ $opener_tag ]->length + 1;
     143                $after_opener_tag  = $this->bookmarks[ $opener_tag ]->start + $this->bookmarks[ $opener_tag ]->length;
    144144                $before_closer_tag = $this->bookmarks[ $closer_tag ]->start;
    145145
    146146                if ( $rewind ) {

#6 @gziolo
4 months ago

  • Keywords needs-dev-note added

It will need a small section in the miscellaneous dev note for developers that extend the WP_HTML_Tag_Processor.

#7 @gziolo
4 months ago

  • Milestone changed from Awaiting Review to 6.6

@westonruter commented on PR #6625:


4 months ago
#8

@sirreal See the following method in this subclass of `WP_HTML_Tag_Processor`:

/**
         * Appends HTML to the provided bookmark.
         *
         * @param string $bookmark Bookmark.
         * @param string $html     HTML to inject.
         * @return bool Whether the HTML was appended.
         */
        public function append_html( string $bookmark, string $html ): bool {
                if ( ! $this->has_bookmark( $bookmark ) ) {
                        return false;
                }

                $start = $this->bookmarks[ $bookmark ]->start;

                $this->lexical_updates[] = new WP_HTML_Text_Replacement(
                        $start,
                        $this->old_text_replacement_signature_needed ? $start : 0,
                        $html
                );
                return true;
        }

So since it's not dealing with length then is a patch needed?

@dmsnell commented on PR #6625:


4 months ago
#9

So since it's not dealing with length then is a patch needed?

this should be correct, @westonruter

#10 @jonsurrell
4 months ago

I don't think this needs a full dev note but should definitely have a mention in the release like the following:

A bug was fixed in the WP_HTML_Tag_Processor where tag token bookmarks. This bug affected classes that extend the WP_HTML_Tag_Processor or WP_HTML_Processor classes and access tag bookmark or token lengths directly. Affected classes would have needed to correct the token length by adding 1. The tag length has been fixed and this correction should be removed in WordPress 6.6.

Last edited 4 months ago by jonsurrell (previous) (diff)

#11 @dmsnell
4 months ago

  • Owner set to dmsnell
  • Resolution set to fixed
  • Status changed from new to closed

In 58233:

HTML API: Fix token length bug in Tag Processor.

The Tag Processor stores the byte-offsets into its HTML document where
the current token starts and ends, and also for every bookmark. In some
cases for tags, the end offset has been off by one.

In this patch the offset is fixed so that a bookmark always properly
refers to the full span of the token it's bookmarking. Also the current
token byte offsets are properly recorded.

While this is a defect in the Tag Processor, it hasn't been exposed
through the public interface and has not affected any of the working
of the processor. Only subclasses which rely on the length of a bookmark
have been potentially affected, and these are not supported environments
in the ongoing work.

This fix is important for future work and for ensuring that subclasses
performing custom behaviors remain as reliable as the public interface.

Developed in https://github.com/WordPress/wordpress-develop/pull/6625
Discussed in https://core.trac.wordpress.org/ticket/61301

Props dmsnell, gziolo, jonsurrell, westonruter.
Fixes #61301.

This ticket was mentioned in Slack in #core by juanmaguitar. View the logs.


4 months ago

This ticket was mentioned in Slack in #core by dmsnell. View the logs.


4 months ago

#15 @dmsnell
4 months ago

Dev note incorporated into the Updates to the HTML API in 6.6 post

Note: See TracTickets for help on using tickets.