Opened 5 months ago
Closed 2 months ago
#63527 closed enhancement (fixed)
Tests: Add test assertion to compare HTML for equivalence
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Milestone: | 6.9 | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Build/Test Tools | Keywords: | has-patch has-unit-tests |
| Focuses: | Cc: |
Description
Comparing HTML markup can be error prone and fragile. Typically we're interested in semantically equivalent HTML; among other things, that means that the comparison should have built-in tolerance for different attribute and class name order.
Ideally, the assertion should also produce output which is easily human-readable when flagging differences between two HTML strings.
Finally, it should be able to understand block delimiters.
Change History (16)
This ticket was mentioned in PR #8882 on WordPress/wordpress-develop by @Bernhard Reiter.
5 months ago
#1
- Keywords has-patch has-unit-tests added
@Bernhard Reiter commented on PR #8882:
5 months ago
#2
Love the idea. Have you tried testing to see if this can be a replacement for
assertEqualMarkup?
I would like it to be eventually, but I couldn't use it as a drop-in replacement. (There's more on that in the PR description, under "Note".) As to the nature of the discrepancies: AFAIR, it was mostly that assertEqualMarkup removed <[CDATA while assertEqualBlockMarkup (i.e. the HTML API) kept it; plus boolean attributes such as async and defer on <script> tag. (IIRC, WP's own util functions -- i.e. enqueue/register script -- use the old async="async" notation, whereas the HTML API uses the HTML5 style -- i.e. just async.)
My takeaway was that it would mean a bit more work to sort out the ~50 (?) test failures I was getting when I tried this, and I didn't want to block the PR by that. Eventually, I'm all for retiring scripts.php's assertEqualMarkup in favor of an HTML API-based assertion, though.
@jonsurrell commented on PR #8882:
5 months ago
#3
I've pushed some changes to completely remove the previous assertEqualMarkup with this implementation.
The work discovered at least one bug in the tests where the previous implementation "normalized" the output and forced the markup to _appear_ equivalent when, in face, it was not. In this case, it purported to test html5 script behavior, but did not set html5 theme support.
There are a number of changes where /* <![CDATA[ */ /* ]]> */ wrappers are introduced around script tag contents or attributes were produced like defer="defer" instead of booleans.
It's clear that this implementation of assertEqualMarkup provides a much more accurate representation of the markup.
---
There's a conversation to be had around HTML5 support and whether the non HTML5 behaviors even make sense today, but that's beyond the scope of this PR.
@jonsurrell commented on PR #8882:
5 months ago
#4
All of the tests here for IE conditional comments also seem quite out of place in today's internet.
#5
@
5 months ago
- Component changed from General to Build/Test Tools
- Milestone changed from Awaiting Review to 6.9
@Bernhard Reiter commented on PR #8882:
5 months ago
#6
It's worth mentioning that the tree builder functionality is largely duplicated from existing html5lib tests used for the HTML API. It's already seen significant usage, although the block parsing is new.
@sirreal Think it's worth doing a follow-up to share code between the two?
@jonsurrell commented on PR #8882:
5 months ago
#7
I only mentioned it because it seemed relevant for other reviewers. We could try out the sharing but I don't think it's necessary.
#8
@
5 months ago
- Owner set to Bernhard Reiter
- Resolution set to fixed
- Status changed from new to closed
In 60295:
#9
follow-up:
↓ 10
@
5 months ago
@bernhard-reiter
I wondered why CDATA wrapper couldn't be dealt as it was being in parse_markup_fragment for a cleaner markup?
#10
in reply to:
↑ 9
@
4 months ago
Replying to SirLouen:
I wondered why
CDATAwrapper couldn't be dealt as it was being inparse_markup_fragmentfor a cleaner markup?
Are you asking about the behavior of assertEqualHTML in general or specifically for the script tests that were modified in [60295]?
Speaking generally, the goal with assertEqualHTML is to compare HTML inputs and present differences in a way that's easy for humans to parse. It should what the input actually mean when processed (like a browser would). To that end, it's important that it faithfully represent the inputs so it can be trusted in tests.
Specifically for the modified tests, this is stylistic to some extent, but even here I believe there's value in faithfully representing the expectations. Before, the input was parsed as XML and some string replacement was used to remove the CDATA comments before comparing. On its own, this isn't likely harmful but it does hide some of what's actually happening. This ties in nicely with #59883 and non-HTML5 support.
This ticket was mentioned in PR #9273 on WordPress/wordpress-develop by @jonsurrell.
4 months ago
#11
assertEqualHTML catches Exceptions and attempts to re-parse the HTML with the Dom\HtmlDocument class.
That does not always work correctly because the tree builder function throws Errors, for example:
These do not appear to be caught where the assertion catches Exceptions:
Trac ticket: https://core.trac.wordpress.org/ticket/63527
Follow-up to [60295]
#12
@
4 months ago
- Resolution fixed deleted
- Status changed from closed to reopened
I'm reopening this for PR 9273 that ensures the Dom\HtmlDocument fallback is triggered in case there are issues building the original tree representation.
4 months ago
#13
Did you consider relaxing the check in assertEqualHTML() to catch Throwable or both kinds of errors/exceptions? I’m constantly confused in PHP and Python which failures inherit from which, which is the base, etc…
For what it’s worth, I can see some distinction to be made, specifically for incomplete support, because I had test failures when working on the wp_rel_ugc() refactor where I had mistakenly autocompleted WP_HTML_Processor instead of WP_HTML_Tag_Processor. The notice “Could not create a parser” led me right to the problem.
@Bernhard Reiter commented on PR #9273:
4 months ago
#14
Thank you for the fix!
For what it’s worth, I can see some distinction to be made, specifically for incomplete support, because I had test failures when working on the
wp_rel_ugc()refactor where I had mistakenly autocompletedWP_HTML_Processorinstead ofWP_HTML_Tag_Processor. The notice “Could not create a parser” led me right to the problem.
@dmsnell Are you saying you're in favor of continuing to throw Errors as we presently do in order to tell those cases apart from a different class of failures (including the unsupported exceptions)? That's also fine by me.
@sirreal What was the original reason to choose Error over Exception? Was it to denote the unrecoverable nature of those failures? If so, it might make sense to keep this semantic nuance.
In any case, I don't have a strong preference. I'm fine with this fix, but also fine with changing the assertion to catch ( Throwable ) instead.
@jonsurrell commented on PR #9273:
4 months ago
#15
I agree, there are likely cases that are interesting to differentiate between different types of errors and whether they're recoverable (like unsupported parsing) or not.
What was the original reason to choose
ErroroverException?
No reason, they probably should have been Exception. Documentation says:
Error is the base class for all internal PHP errors.
Exception is the base class for all user exceptions.
## Proposed changes
Add
assertEqualBlockMarkupmethod toWP_UnitTestClassfor tests comparing HTML (potentially including block markup).Comparing HTML markup can be error prone and fragile. Typically we're interested in _semantically equivalent_ HTML, which is what this assertion allows for; i.e. it has built-in tolerance for different attribute and class name order.
Internally, it builds a deterministic tree string representation of the HTML (using the HTML API) and compares the results.
The format of the tree is inspired by the HTML5lib-tests tree format.. It is extended with a special representation of block delimiters and their attributes.
This format also makes it easier to visually spot the differences between the two strings of HTML markup, as reported by the assertion. Quoting the inline documentation (PHPDoc) that's part of this PR:
For example, consider the following block markup: <hr class="wp-block-separator is-style-default has-custom-classname" style="margin-top: 50px; margin-bottom: 50px" /> This will be represented as: BLOCK["core/separator"] { "backgroundColor": "accent-1", "className": "has-custom-classname is-style-default", "style": { "spacing": { "margin": { "top": "50px", "bottom": "50px" } } } } <hr> class="has-custom-classname is-style-default wp-block-separator" style="margin-top:50px;margin-bottom:50px;"## Note
This PR adds the new
assertEqualBlockMarkupmethod to theWP_UnitTestCaseclass. Note thatclass Tests_Dependencies_Scripts, which is a subclass ofWP_UnitTestCase, already contains aprotectedmethod namedassertEqualMarkup. The latter is exclusively used for dependencies/scripts tests. The newassertEqualBlockMarkupcannot be used as a drop-in replacement, due to some differences in what it considers semantically equivalent HTML.The existence of that protected
assertEqualMarkupmethod was a consideration in naming the new methodassertEqualBlockMarkup, so that it wouldn't collide. It seems like a reasonable choice, given thatassertEqualBlockMarkupis indeed capable not only of parsing HTML, but also _block_ markup.In a future iteration, we might choose to retire the existing
assertEqualMarkupmethod (and useassertEqualBlockMarkupfor dependencies/scripts tests). This would then also pave the way for a general-purposeassertEqualMarkupmethod (which is able to parse HTML but agnostic of block markup), should the need arise.Props @sirreal @dmsnell
Trac ticket: https://core.trac.wordpress.org/ticket/63527