Make WordPress Core

Opened 22 months ago

Closed 16 months ago

Last modified 16 months ago

#58119 closed defect (bug) (fixed)

HTML API: Remove all duplicate copies of an attribute when removing

Reported by: dmsnell's profile dmsnell Owned by: bernhard-reiter's profile Bernhard Reiter
Milestone: 6.3.2 Priority: normal
Severity: normal Version: 6.2
Component: HTML API Keywords: has-patch has-unit-tests fixed-major
Focuses: Cc:

Description

wordpress/wordpress-develop#4317

When encountering an HTML tag with duplicate copies of an attribute the tag processor ignores the duplicate values, according to the specification. However, when removing an attribute it must remove all copies of that attribute lest one of the duplicates becomes the primary and it appears as if no attributes were removed.

In this patch we're adding tests that will be used to ensure that all attribute copies are removed from a tag when one is request to be removed.

Before

<?php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br id="two" id='three' id>

After

<?php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br>

Previously we have been overlooking duplicate attributes since they don't have an impact on what parses into the DOM. However, as one unit test affirmed (asserting the presence of the bug in the tag processor) when removing an attribute where duplicates exist this meant we ended up changing the value of an attribute instead of removing it.

In this patch we're tracking the text spans of the parsed duplicate attributes so that _if_ we attempt to remove them then we'll have the appropriate information necessary to do so. When an attribute isn't removed we'll simply forget about the tracked duplicates. This involves some overhead for normal operation _when_ in fact there are duplicate attributes on a tag, but that overhead is minimal in the form of integer pairs of indices for each duplicated attribute.

Change History (11)

This ticket was mentioned in PR #4317 on WordPress/wordpress-develop by @dmsnell.


22 months ago
#1

## Description

Trac ticket #58119

When encountering an HTML tag with duplicate copies of an attribute the tag processor ignores the duplicate values, according to the specification. However, when removing an attribute it must remove all copies of that attribute lest one of the duplicates becomes the primary and it appears as if no attributes were removed.

In this patch we're adding tests that will be used to ensure that all attribute copies are removed from a tag when one is request to be removed.

Before

{{{php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
<br id="two" id='three' id>
}}}

After

{{{php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
<br>
}}}

Previously we have been overlooking duplicate attributes since they don't have an impact on what parses into the DOM. However, as one unit test affirmed (asserting the presence of the bug in the tag processor) when removing an attribute where duplicates exist this meant we ended up changing the value of an attribute instead of removing it.

In this patch we're tracking the text spans of the parsed duplicate attributes so that _if_ we attempt to remove them then we'll have the appropriate information necessary to do so. When an attribute isn't removed we'll simply forget about the tracked duplicates. This involves some overhead for normal operation _when_ in fact there are duplicate attributes on a tag, but that overhead is minimal in the form of integer pairs of indices for each duplicated attribute.

cc: @adamziel @ockham

@zieladam commented on PR #4317:


22 months ago
#2

The logic LGTM @dmsnell, thank you! I left a few nitpicks about formatting, but that's it.

#3 follow-up: @dmsnell
17 months ago

@bernhard-reiter @gziolo it looks like we missed this one, my bad.

I now have a question: this was a bug-fix initially for 6.2 but now I'm not sure where we should apply it or how to know what versions we are still supporting in Core. do we backport this to 6.2 and 6.3?

whatever goes, I'll be away from the computer for the next few days. feel free to modify the @since 6.3.1 annotations since I guessed at what they should be. originally they were @since 6.2.1. feel free to merge it if you are happy with it.

#4 in reply to: ↑ 3 @Bernhard Reiter
16 months ago

Replying to dmsnell:

@bernhard-reiter @gziolo it looks like we missed this one, my bad.

I now have a question: this was a bug-fix initially for 6.2 but now I'm not sure where we should apply it or how to know what versions we are still supporting in Core. do we backport this to 6.2 and 6.3?

I'm leaning towards 6.3.2 for now, as I know that's going to be happen in the near future (~next two weeks), whereas I'm not sure if/when a 6.2.3 is planned.

I guess if a 6.2.3 happens, we can still update the @since in later branches?

#5 @dmsnell
16 months ago

  * @since 6.3.2 Removes duplicate attributes
+ * @since 6.3.3 Sinced it to 6.2.3
  *

😄

thank you @bernhard-reiter. I'll update the @since version in the patch to 6.3.2

#6 @Bernhard Reiter
16 months ago

  • Owner set to Bernhard Reiter
  • Resolution set to fixed
  • Status changed from new to closed

In 56684:

HTML API: Remove all duplicate copies of an attribute when removing.

When encountering an HTML tag with duplicate copies of an attribute the tag processor ignores the duplicate values, according to the specification. However, when removing an attribute it must remove all copies of that attribute lest one of the duplicates becomes the primary and it appears as if no attributes were removed.

In this patch we're adding tests that will be used to ensure that all attribute copies are removed from a tag when one is request to be removed.

Before

<?php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br id="two" id='three' id>

After

<?php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br>

Previously we have been overlooking duplicate attributes since they don't have an impact on what parses into the DOM. However, as one unit test affirmed (asserting the presence of the bug in the tag processor) when removing an attribute where duplicates exist this meant we ended up changing the value of an attribute instead of removing it.

In this patch we're tracking the text spans of the parsed duplicate attributes so that if we attempt to remove them then we'll have the appropriate information necessary to do so. When an attribute isn't removed we'll simply forget about the tracked duplicates. This involves some overhead for normal operation when in fact there are duplicate attributes on a tag, but that overhead is minimal in the form of integer pairs of indices for each duplicated attribute.

Props dmsnell, zieladam.
Fixes #58119.

#7 @Bernhard Reiter
16 months ago

  • Milestone changed from Awaiting Review to 6.3.2

#9 @Bernhard Reiter
16 months ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening to backport to the 6.3 branch.

#10 @Bernhard Reiter
16 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 56685:

HTML API: Remove all duplicate copies of an attribute when removing.

When encountering an HTML tag with duplicate copies of an attribute the tag processor ignores the duplicate values, according to the specification. However, when removing an attribute it must remove all copies of that attribute lest one of the duplicates becomes the primary and it appears as if no attributes were removed.

In this patch we're adding tests that will be used to ensure that all attribute copies are removed from a tag when one is request to be removed.

Before

<?php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br id="two" id='three' id>

After

<?php
$p = new WP_HTML_Tag_Processor( '<br id=one id="two" id='three' id>' );
$p->next_tag();
$p->remove_attribute( 'id' );
$p->get_updated_html();
// <br>

Previously we have been overlooking duplicate attributes since they don't have an impact on what parses into the DOM. However, as one unit test affirmed (asserting the presence of the bug in the tag processor) when removing an attribute where duplicates exist this meant we ended up changing the value of an attribute instead of removing it.

In this patch we're tracking the text spans of the parsed duplicate attributes so that if we attempt to remove them then we'll have the appropriate information necessary to do so. When an attribute isn't removed we'll simply forget about the tracked duplicates. This involves some overhead for normal operation when in fact there are duplicate attributes on a tag, but that overhead is minimal in the form of integer pairs of indices for each duplicated attribute.

Props dmsnell, zieladam.
Merges [56684] to the 6.3 branch.
Fixes #58119.

Note: See TracTickets for help on using tickets.