Opened 6 weeks ago
Last modified 38 hours ago
#63527 reopened enhancement
Tests: Add test assertion to compare HTML for equivalence
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 6.9 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Build/Test Tools | Keywords: | has-patch has-unit-tests |
Focuses: | Cc: |
Description
Comparing HTML markup can be error prone and fragile. Typically we're interested in semantically equivalent HTML; among other things, that means that the comparison should have built-in tolerance for different attribute and class name order.
Ideally, the assertion should also produce output which is easily human-readable when flagging differences between two HTML strings.
Finally, it should be able to understand block delimiters.
Change History (13)
This ticket was mentioned in PR #8882 on WordPress/wordpress-develop by @Bernhard Reiter.
6 weeks ago
#1
- Keywords has-patch has-unit-tests added
@Bernhard Reiter commented on PR #8882:
6 weeks ago
#2
Love the idea. Have you tried testing to see if this can be a replacement for
assertEqualMarkup
?
I would like it to be eventually, but I couldn't use it as a drop-in replacement. (There's more on that in the PR description, under "Note".) As to the nature of the discrepancies: AFAIR, it was mostly that assertEqualMarkup
removed <[CDATA
while assertEqualBlockMarkup
(i.e. the HTML API) kept it; plus boolean attributes such as async
and defer
on <script>
tag. (IIRC, WP's own util functions -- i.e. enqueue/register script -- use the old async="async"
notation, whereas the HTML API uses the HTML5 style -- i.e. just async
.)
My takeaway was that it would mean a bit more work to sort out the ~50 (?) test failures I was getting when I tried this, and I didn't want to block the PR by that. Eventually, I'm all for retiring scripts.php
's assertEqualMarkup
in favor of an HTML API-based assertion, though.
@jonsurrell commented on PR #8882:
6 weeks ago
#3
I've pushed some changes to completely remove the previous assertEqualMarkup
with this implementation.
The work discovered at least one bug in the tests where the previous implementation "normalized" the output and forced the markup to _appear_ equivalent when, in face, it was not. In this case, it purported to test html5 script behavior, but did not set html5 theme support.
There are a number of changes where /* <![CDATA[ */ /* ]]> */
wrappers are introduced around script tag contents or attributes were produced like defer="defer"
instead of booleans.
It's clear that this implementation of assertEqualMarkup
provides a much more accurate representation of the markup.
---
There's a conversation to be had around HTML5 support and whether the non HTML5 behaviors even make sense today, but that's beyond the scope of this PR.
@jonsurrell commented on PR #8882:
6 weeks ago
#4
All of the tests here for IE conditional comments also seem quite out of place in today's internet.
#5
@
6 weeks ago
- Component changed from General to Build/Test Tools
- Milestone changed from Awaiting Review to 6.9
@Bernhard Reiter commented on PR #8882:
5 weeks ago
#6
It's worth mentioning that the tree builder functionality is largely duplicated from existing html5lib tests used for the HTML API. It's already seen significant usage, although the block parsing is new.
@sirreal Think it's worth doing a follow-up to share code between the two?
@jonsurrell commented on PR #8882:
5 weeks ago
#7
I only mentioned it because it seemed relevant for other reviewers. We could try out the sharing but I don't think it's necessary.
#8
@
5 weeks ago
- Owner set to Bernhard Reiter
- Resolution set to fixed
- Status changed from new to closed
In 60295:
#9
follow-up:
↓ 10
@
5 weeks ago
@bernhard-reiter
I wondered why CDATA
wrapper couldn't be dealt as it was being in parse_markup_fragment
for a cleaner markup?
#10
in reply to:
↑ 9
@
41 hours ago
Replying to SirLouen:
I wondered why
CDATA
wrapper couldn't be dealt as it was being inparse_markup_fragment
for a cleaner markup?
Are you asking about the behavior of assertEqualHTML
in general or specifically for the script tests that were modified in [60295]?
Speaking generally, the goal with assertEqualHTML
is to compare HTML inputs and present differences in a way that's easy for humans to parse. It should what the input actually mean when processed (like a browser would). To that end, it's important that it faithfully represent the inputs so it can be trusted in tests.
Specifically for the modified tests, this is stylistic to some extent, but even here I believe there's value in faithfully representing the expectations. Before, the input was parsed as XML and some string replacement was used to remove the CDATA comments before comparing. On its own, this isn't likely harmful but it does hide some of what's actually happening. This ties in nicely with #59883 and non-HTML5 support.
This ticket was mentioned in PR #9273 on WordPress/wordpress-develop by @jonsurrell.
39 hours ago
#11
assertEqualHTML
catches Exceptions and attempts to re-parse the HTML with the Dom\HtmlDocument
class.
That does not always work correctly because the tree builder function throws Error
s, for example:
These do not appear to be caught where the assertion catches Exception
s:
Trac ticket: https://core.trac.wordpress.org/ticket/63527
Follow-up to [60295]
#12
@
39 hours ago
- Resolution fixed deleted
- Status changed from closed to reopened
I'm reopening this for PR 9273 that ensures the Dom\HtmlDocument
fallback is triggered in case there are issues building the original tree representation.
38 hours ago
#13
Did you consider relaxing the check in assertEqualHTML()
to catch Throwable
or both kinds of errors/exceptions? I’m constantly confused in PHP and Python which failures inherit from which, which is the base, etc…
For what it’s worth, I can see some distinction to be made, specifically for incomplete support, because I had test failures when working on the wp_rel_ugc()
refactor where I had mistakenly autocompleted WP_HTML_Processor
instead of WP_HTML_Tag_Processor
. The notice “Could not create a parser” led me right to the problem.
## Proposed changes
Add
assertEqualBlockMarkup
method toWP_UnitTestClass
for tests comparing HTML (potentially including block markup).Comparing HTML markup can be error prone and fragile. Typically we're interested in _semantically equivalent_ HTML, which is what this assertion allows for; i.e. it has built-in tolerance for different attribute and class name order.
Internally, it builds a deterministic tree string representation of the HTML (using the HTML API) and compares the results.
The format of the tree is inspired by the HTML5lib-tests tree format.. It is extended with a special representation of block delimiters and their attributes.
This format also makes it easier to visually spot the differences between the two strings of HTML markup, as reported by the assertion. Quoting the inline documentation (PHPDoc) that's part of this PR:
## Note
This PR adds the new
assertEqualBlockMarkup
method to theWP_UnitTestCase
class. Note thatclass Tests_Dependencies_Scripts
, which is a subclass ofWP_UnitTestCase
, already contains aprotected
method namedassertEqualMarkup
. The latter is exclusively used for dependencies/scripts tests. The newassertEqualBlockMarkup
cannot be used as a drop-in replacement, due to some differences in what it considers semantically equivalent HTML.The existence of that protected
assertEqualMarkup
method was a consideration in naming the new methodassertEqualBlockMarkup
, so that it wouldn't collide. It seems like a reasonable choice, given thatassertEqualBlockMarkup
is indeed capable not only of parsing HTML, but also _block_ markup.In a future iteration, we might choose to retire the existing
assertEqualMarkup
method (and useassertEqualBlockMarkup
for dependencies/scripts tests). This would then also pave the way for a general-purposeassertEqualMarkup
method (which is able to parse HTML but agnostic of block markup), should the need arise.Props @sirreal @dmsnell
Trac ticket: https://core.trac.wordpress.org/ticket/63527