Make WordPress Core

Opened 6 weeks ago

Last modified 38 hours ago

#63527 reopened enhancement

Tests: Add test assertion to compare HTML for equivalence

Reported by: bernhard-reiter's profile Bernhard Reiter Owned by: bernhard-reiter's profile Bernhard Reiter
Milestone: 6.9 Priority: normal
Severity: normal Version:
Component: Build/Test Tools Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

Comparing HTML markup can be error prone and fragile. Typically we're interested in semantically equivalent HTML; among other things, that means that the comparison should have built-in tolerance for different attribute and class name order.

Ideally, the assertion should also produce output which is easily human-readable when flagging differences between two HTML strings.

Finally, it should be able to understand block delimiters.

Change History (13)

This ticket was mentioned in PR #8882 on WordPress/wordpress-develop by @Bernhard Reiter.


6 weeks ago
#1

  • Keywords has-patch has-unit-tests added

## Proposed changes

Add assertEqualBlockMarkup method to WP_UnitTestClass for tests comparing HTML (potentially including block markup).

Comparing HTML markup can be error prone and fragile. Typically we're interested in _semantically equivalent_ HTML, which is what this assertion allows for; i.e. it has built-in tolerance for different attribute and class name order.

Internally, it builds a deterministic tree string representation of the HTML (using the HTML API) and compares the results.

The format of the tree is inspired by the HTML5lib-tests tree format.. It is extended with a special representation of block delimiters and their attributes.

This format also makes it easier to visually spot the differences between the two strings of HTML markup, as reported by the assertion. Quoting the inline documentation (PHPDoc) that's part of this PR:

For example, consider the following block markup:

    
        <hr class="wp-block-separator is-style-default has-custom-classname" style="margin-top: 50px; margin-bottom: 50px" />
    

This will be represented as:

    BLOCK["core/separator"]
      {
        "backgroundColor": "accent-1",
        "className": "has-custom-classname is-style-default",
        "style": {
          "spacing": {
            "margin": {
              "top": "50px",
              "bottom": "50px"
            }
          }
        }
      }
      <hr>
        class="has-custom-classname is-style-default wp-block-separator"
        style="margin-top:50px;margin-bottom:50px;"

## Note

This PR adds the new assertEqualBlockMarkup method to the WP_UnitTestCase class. Note that class Tests_Dependencies_Scripts, which is a subclass of WP_UnitTestCase, already contains a protected method named assertEqualMarkup. The latter is exclusively used for dependencies/scripts tests. The new assertEqualBlockMarkup cannot be used as a drop-in replacement, due to some differences in what it considers semantically equivalent HTML.

The existence of that protected assertEqualMarkup method was a consideration in naming the new method assertEqualBlockMarkup, so that it wouldn't collide. It seems like a reasonable choice, given that assertEqualBlockMarkup is indeed capable not only of parsing HTML, but also _block_ markup.

In a future iteration, we might choose to retire the existing assertEqualMarkup method (and use assertEqualBlockMarkup for dependencies/scripts tests). This would then also pave the way for a general-purpose assertEqualMarkup method (which is able to parse HTML but agnostic of block markup), should the need arise.

Props @sirreal @dmsnell
Trac ticket: https://core.trac.wordpress.org/ticket/63527

@Bernhard Reiter commented on PR #8882:


6 weeks ago
#2

Love the idea. Have you tried testing to see if this can be a replacement for assertEqualMarkup?

I would like it to be eventually, but I couldn't use it as a drop-in replacement. (There's more on that in the PR description, under "Note".) As to the nature of the discrepancies: AFAIR, it was mostly that assertEqualMarkup removed <[CDATA while assertEqualBlockMarkup (i.e. the HTML API) kept it; plus boolean attributes such as async and defer on <script> tag. (IIRC, WP's own util functions -- i.e. enqueue/register script -- use the old async="async" notation, whereas the HTML API uses the HTML5 style -- i.e. just async.)

My takeaway was that it would mean a bit more work to sort out the ~50 (?) test failures I was getting when I tried this, and I didn't want to block the PR by that. Eventually, I'm all for retiring scripts.php's assertEqualMarkup in favor of an HTML API-based assertion, though.

@jonsurrell commented on PR #8882:


6 weeks ago
#3

I've pushed some changes to completely remove the previous assertEqualMarkup with this implementation.

The work discovered at least one bug in the tests where the previous implementation "normalized" the output and forced the markup to _appear_ equivalent when, in face, it was not. In this case, it purported to test html5 script behavior, but did not set html5 theme support.

https://github.com/WordPress/wordpress-develop/blob/b0df96a0d6cec2bceb5b4e16817f71b05fee70e0/tests/phpunit/tests/dependencies/scripts.php#L1531-L1534

There are a number of changes where /* <![CDATA[ */ /* ]]> */ wrappers are introduced around script tag contents or attributes were produced like defer="defer" instead of booleans.

It's clear that this implementation of assertEqualMarkup provides a much more accurate representation of the markup.

---

There's a conversation to be had around HTML5 support and whether the non HTML5 behaviors even make sense today, but that's beyond the scope of this PR.

@jonsurrell commented on PR #8882:


6 weeks ago
#4

All of the tests here for IE conditional comments also seem quite out of place in today's internet.

#5 @jorbin
6 weeks ago

  • Component changed from General to Build/Test Tools
  • Milestone changed from Awaiting Review to 6.9

@Bernhard Reiter commented on PR #8882:


5 weeks ago
#6

It's worth mentioning that the tree builder functionality is largely duplicated from existing html5lib tests used for the HTML API. It's already seen significant usage, although the block parsing is new.

@sirreal Think it's worth doing a follow-up to share code between the two?

@jonsurrell commented on PR #8882:


5 weeks ago
#7

I only mentioned it because it seemed relevant for other reviewers. We could try out the sharing but I don't think it's necessary.

#8 @Bernhard Reiter
5 weeks ago

  • Owner set to Bernhard Reiter
  • Resolution set to fixed
  • Status changed from new to closed

In 60295:

Tests: Add new assertEqualHTML assertion.

Add a new assertEqualHTML method to WP_UnitTestClass for tests comparing HTML (potentially including block markup).

Internally, the assertion builds a deterministic tree string representation of the markup (using the HTML API) and compares the results. The format of the tree is inspired by the HTML5lib-tests tree format. It is extended with a special representation of block delimiters and their attributes. This format also makes it easier to visually spot the differences between the two strings if the assertion fails.

Finally, this changeset updates Tests_Dependencies_Scripts to remove its assertEqualMarkup and parse_markup_fragment methods, and to use the newly introduced assertEqualHTML instead.

Props bernhard-reiter, jonsurrell, dmsnell, jorbin, gziolo.
Fixes #63527.

#9 follow-up: @SirLouen
5 weeks ago

@bernhard-reiter
I wondered why CDATA wrapper couldn't be dealt as it was being in parse_markup_fragment for a cleaner markup?

Last edited 41 hours ago by jonsurrell (previous) (diff)

#10 in reply to: ↑ 9 @jonsurrell
41 hours ago

Replying to SirLouen:

I wondered why CDATA wrapper couldn't be dealt as it was being in parse_markup_fragment for a cleaner markup?

Are you asking about the behavior of assertEqualHTML in general or specifically for the script tests that were modified in [60295]?

Speaking generally, the goal with assertEqualHTML is to compare HTML inputs and present differences in a way that's easy for humans to parse. It should what the input actually mean when processed (like a browser would). To that end, it's important that it faithfully represent the inputs so it can be trusted in tests.

Specifically for the modified tests, this is stylistic to some extent, but even here I believe there's value in faithfully representing the expectations. Before, the input was parsed as XML and some string replacement was used to remove the CDATA comments before comparing. On its own, this isn't likely harmful but it does hide some of what's actually happening. This ties in nicely with #59883 and non-HTML5 support.

Last edited 41 hours ago by jonsurrell (previous) (diff)

This ticket was mentioned in PR #9273 on WordPress/wordpress-develop by @jonsurrell.


39 hours ago
#11

assertEqualHTML catches Exceptions and attempts to re-parse the HTML with the Dom\HtmlDocument class.

That does not always work correctly because the tree builder function throws Errors, for example:

https://github.com/WordPress/wordpress-develop/blob/b4418472120158fd9a802c7343f7dbffa31c9862/tests/phpunit/includes/build-visual-html-tree.php#L54

These do not appear to be caught where the assertion catches Exceptions:

https://github.com/WordPress/wordpress-develop/blob/b4418472120158fd9a802c7343f7dbffa31c9862/tests/phpunit/includes/abstract-testcase.php#L1219-L1221

Trac ticket: https://core.trac.wordpress.org/ticket/63527
Follow-up to [60295]

#12 @jonsurrell
39 hours ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I'm reopening this for PR 9273 that ensures the Dom\HtmlDocument fallback is triggered in case there are issues building the original tree representation.

@dmsnell commented on PR #9273:


38 hours ago
#13

Did you consider relaxing the check in assertEqualHTML() to catch Throwable or both kinds of errors/exceptions? I’m constantly confused in PHP and Python which failures inherit from which, which is the base, etc…

For what it’s worth, I can see some distinction to be made, specifically for incomplete support, because I had test failures when working on the wp_rel_ugc() refactor where I had mistakenly autocompleted WP_HTML_Processor instead of WP_HTML_Tag_Processor. The notice “Could not create a parser” led me right to the problem.

Note: See TracTickets for help on using tickets.