Make WordPress Core

Opened 9 months ago

Closed 6 months ago

Last modified 6 months ago

#60227 closed task (blessed) (fixed)

HTML API: Add external test suite

Reported by: jonsurrell Owned by: dmsnell
Milestone: 6.6 Priority: normal
Severity: normal Version:
Component: Build/Test Tools Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

Problem:

The HTML API is a complex and important feature being implemented by a small group of developers. The implementation follows the HTML Standard (https://html.spec.whatwg.org/).

The tests and the implementation of the HTML API both depend on the interpretation of the standard by the same small group of developers. This means there is a risk of misinterpretation or overlooked edge cases.

Needs:

A suite of tests for the HTML API that does not depend on our own interpretation of the HTML Standard and is implementation-agnostic.

Benefits:

Many software projects, such as browsers, implement the HTML Standard. We should be able to use a third-party, implementation-agnostic test suite to validate our implementation of the HTML API. This mitigates the risk of our implementation and tests all deriving from the same interpretation of the specification.

Change History (41)

This ticket was mentioned in Slack in #core by jonsurrell. View the logs.


9 months ago

This ticket was mentioned in PR #5794 on WordPress/wordpress-develop by @jonsurrell.


9 months ago
#2

  • Keywords has-patch has-unit-tests added

Add a suite of tests for the HTML API taken from https://github.com/html5lib/html5lib-tests/tree/a9f44960a9fedf265093d22b2aa3c7ca123727b9

Only the tree construction tests are taken. This suite of tests is also used by the servo project to test its html5ever package.

In this PR we extract the relevant test data from html5lib-tests and run it against the HTML Processor. We transform the HTML Processor output to match the expected tree shape from html5lib-tests. We also explicitly skip some tests based on their test data names. Test data names look like tests20/case39 - line 497, where the form is:

test_file "/case" test_index " - line " test_line_number

Tests can be skipped by adding them to the SKIP_TEST array. Tests are easy to filter by running e.g. phpunit --filter tests1/case1.
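For illustration only, here is a minimal sketch of how the skip list and runner could fit together. The array contents, data provider name, and skip message are assumptions for this sketch, not the actual patch; the class and method names follow the final naming discussed below.

class Tests_HtmlApi_Html5lib extends WP_UnitTestCase {
    // Hypothetical entries: keys follow the "{file}/case{index} - line {line}" scheme.
    const SKIP_TEST = array(
        'tests20/case39 - line 497' => 'Relies on markup the HTML Processor cannot yet parse.',
    );

    /**
     * @dataProvider data_external_html5lib_tests
     */
    public function test_parse( $test_name, $html, $expected_tree ) {
        if ( array_key_exists( $test_name, self::SKIP_TEST ) ) {
            $this->markTestSkipped( self::SKIP_TEST[ $test_name ] );
        }
        // ...run the HTML Processor over $html and compare the
        // serialized tree against $expected_tree...
    }
}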

TODO

  • [x] Move test data to tests/phpunit/data

Trac ticket: https://core.trac.wordpress.org/ticket/60227

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


9 months ago

#4 @jonsurrell
9 months ago

My proposed test suite addition adds roughly 60 files and 20,000+ lines of test fixtures. That type of change seems exceptional, so I'm seeking ample feedback.

This was mentioned on the devchat: https://make.wordpress.org/core/2024/01/10/dev-chat-agenda-january-10-2024/#comment-46068

I'll mention the Build/Test Tools component maintainers (https://make.wordpress.org/core/components/build-test-tools/) because I believe this falls under their purview:

@jorbin @desrosj @netweb @sergeybiryukov @johnbillion @hellofromtonya

If no concerns are raised, I hope to move ahead with this work at some point next week.

(test ping @dmsnell)

Version 2, edited 9 months ago by jonsurrell

#5 @jonsurrell
9 months ago

My proposed test suite addition adds roughly 60 files and 20,000+ lines of test fixtures. That type of change seems exceptional, so I'm seeking additional feedback.

This was mentioned on the devchat: https://make.wordpress.org/core/2024/01/10/dev-chat-agenda-january-10-2024/#comment-46068

I'll mention the Build/Test Tools component maintainers (https://make.wordpress.org/core/components/build-test-tools/) because I believe this falls under their purview:

@jorbin @desrosj @netweb @sergeybiryukov @johnbillion @hellofromtonya

If no concerns are raised, I hope to move ahead with this work at some point this week.

@jonsurrell commented on PR #5794:


9 months ago
#6

  • The suite is MIT licensed. Is that OK to use?

I'm confident at this point that this license is compatible. The GNU page on compatible licenses mentions it (although they call it Expat).

Additionally, jQuery and React are both MIT licensed, and WordPress uses them.

@jonsurrell commented on PR #5794:


9 months ago
#7

This is ready for review.

#8 follow-ups: @jorbin
9 months ago

This is an intriguing idea. I have a few large concerns that I think need to be resolved, and some smaller ones:

  1. How do we keep this updated? Could this be set as an SVN external or be included in some other way to ensure it stays updated besides a person manually checking if there are new commits?
  2. Do we have any idea what the long term plans are for this project? I see that the majority of the projects from this GitHub organization seem to be inactive or abandoned. Additionally, I don't see any sort of code of conduct; do we know if this project is one whose values align with WordPress?
  3. What's the reasoning behind skipping the tests that don't expect an empty head? This is leading to 500+ skipped tests.
  4. It would be good to get the tests passing and the coding standards issues resolved before doing a full review.
  5. nitpick, but the naming feels a bit cumbersome. Tests_HtmlApi_WpHtmlProcessorHtml5lib::test_external_html5lib just feels long.

#9 in reply to: ↑ 8 @jonsurrell
9 months ago

Replying to jorbin:

Thanks for the feedback!

  1. How do we keep this updated? Could this be set as an SVN external or be included in some other way to ensure it stays updated besides a person manually checking if there are new commits?

I thought about externals, but with html5lib-tests hosted on GitHub I don't know how that's possible with just basic tooling. GitHub used to support Subversion but recently sunset it.

Staying up to date by pulling in changes seems ideal, but it's not essential. These are test cases for a standard. The standard may evolve, but it shouldn't evolve in a way that invalidates the tests. The commits over the last two years are infrequent and seem to mostly add tests, add parse errors (which we ignore at this time), and fix things like test duplication and typos.

In short, I don't think it's problematic if we commit the current state of html5lib-tests and never update it. We still get a lot of value from this set of tests.

  2. Do we have any idea what the long term plans are for this project? I see that the majority of the projects from this GitHub organization seem to be inactive or abandoned.

No, I don't know. The abandoned projects were ports of html5lib-python to other languages. As far as I can tell html5lib-python is popular and still being maintained, but I'm not familiar enough with the Python ecosystem to say for sure.

Beautiful Soup is a project that uses html5lib and rings a bell: "Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility."

I do know that html5ever is a popular HTML5 parser in Rust (part of the servo project). html5ever uses html5lib-tests for testing.

Again, with a relatively stable standard and stable set of tests, we can still get a lot of value from the tests even if they're abandoned today.

Additionally, I don't see any sort of code of conduct; do we know if this project is one whose values align with WordPress?

I can't speak to that. As far as I can tell this is a focused software project without much of a community. The package is developed to provide a specific functionality and not much more.

  3. What's the reasoning behind skipping the tests that don't expect an empty head? This is leading to 500+ skipped tests.

The HTML API is not complete and spec-compliant yet. One of the motivations for adding these tests is to allow us to speed up development by increasing confidence.

The head tests specifically are skipped because the HTML API only works in the "in body" parsing mode. The provided HTML must appear in the context of the body tag to be supported, so we skip any tests that expect to find anything outside the <body>.
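As a rough illustration of that constraint, a body-only filter could inspect the expected tree before running a case. This sketch assumes the html5lib tree format, which prints one node per line as a "| " prefix plus two spaces of indentation per depth; the helper name is hypothetical:

// Hypothetical helper: true when the expected tree keeps <head> empty,
// i.e. when "<body>" immediately follows "<head>" in the printed tree.
function expects_empty_head( string $expected_tree ): bool {
    $lines = explode( "\n", $expected_tree );
    $head  = array_search( '|   <head>', $lines, true );
    if ( false === $head ) {
        return true; // No <head> printed at all, e.g. fragment cases.
    }
    return isset( $lines[ $head + 1 ] ) && '|   <body>' === $lines[ $head + 1 ];
}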

  4. It would be good to get the tests passing and the coding standards issues resolved before doing a full review.

I'll get all the code standards cleaned up. I believe I left some failing tests that detect existing bugs in the HTML API. Those should be addressed soon.

  5. nitpick, but the naming feels a bit cumbersome. Tests_HtmlApi_WpHtmlProcessorHtml5lib::test_external_html5lib just feels long.

Agreed :) I'm not attached to the naming and happy to change it.

@jonsurrell commented on PR #5794:


9 months ago
#10

naming feels a bit cumbersome. Tests_HtmlApi_WpHtmlProcessorHtml5lib::test_external_html5lib just feels long.

I've renamed the class and test method to Tests_HtmlApi_Html5lib::test_parse.

@jonsurrell commented on PR #5794:


9 months ago
#11

I've also skipped the remaining failing test and clearly marked it as a bug in its skip message.

#12 in reply to: ↑ 8 ; follow-up: @azaozz
9 months ago

  • Component changed from HTML API to Build/Test Tools
  • Milestone changed from Awaiting Review to 6.5

Replying to jorbin:

This is an intriguing idea.

Yep, I agree, really intriguing. It may open the door to using other external tests where possible.

  1. How do we keep this updated...
  2. Do we have any idea what the long term plans are for this project...

Imho it would probably be best to "fork" (copy/paste) the tests, even though that would require manual updating from time to time. That's not such a big problem; most of the external (PHP) code/libraries in WP are updated manually. (Using the term "fork" here as the tests don't seem to have been released independently since 2012.)

Adding to the 6.5 milestone for consideration. Also changing the component to Build/Test Tools as it seems perhaps more appropriate.

Last edited 9 months ago by azaozz

#13 in reply to: ↑ 12 @jonsurrell
9 months ago

Replying to azaozz:

Imho it would probably be best to "fork" (copy/paste) the tests, even though that would require manual updating from time to time.

That's the approach I've taken in PR 5794. It's nice and simple. I've included a README file that links to the original repo and mentions the commit SHA that was used.

Adding to the 6.5 milestone for consideration. Also changing the component to Build/Test Tools as it seems perhaps more appropriate.

๐Ÿ‘ Thanks!

#14 @jonsurrell
8 months ago

Are there remaining concerns about moving ahead with html5lib-tests as proposed in the linked PR?

If broad or general concerns about adding this third-party test suite are satisfied, I'd like to move into code review of the PR.

@jonsurrell commented on PR #5794:


8 months ago
#15

Currently failing tests with added scan-all-tokens testing:

  • [ ] tests1/line0024
  • [ ] tests1/line0720
  • [ ] tests1/line0833
  • [ ] tests21/line0025
  • [ ] webkit01/line0161
  • [ ] webkit01/line0174
  • [ ] webkit01/line0206
  • [ ] tests2/line0317
  • [ ] tests2/line0408
  • [ ] tests2/line0650
  • [ ] tests25/line0169
  • [ ] plain-text-unsafe/line0001
  • [ ] plain-text-unsafe/line0105
  • [ ] tests8/line0001
  • [ ] tests8/line0020
  • [ ] tests8/line0037
  • [ ] tests8/line0052

This ticket was mentioned in PR #5976 on WordPress/wordpress-develop by @jonsurrell.


8 months ago
#16

This only includes a failing test at the moment.

Trac ticket: https://core.trac.wordpress.org/ticket/60227

@jonsurrell commented on PR #5976:


8 months ago
#17

@dmsnell Please take a look at this.

@jonsurrell commented on PR #5976:


8 months ago
#19

I wanted to re-examine the rules and fall back to the existing "failure to find a tag" behavior, but I think having this up-front has its merit.

My first thought was that we should find the text node once we've found the start of the next node, but that seemed like a much larger refactor. What I don't like about this approach is that we're effectively identifying the start of the next node in multiple places.

I think it's fine for now; this is a smaller change with the 6.5 release on the horizon, and as test coverage increases we can explore larger changes before the next release.

Did you examine trapping this inside of next_tag() in the case that parse_next_tag() returns false?

I did not.

My one big remaining question is whether we expect this to be a normal case or a rare case.

I don't have data to answer that. My gut suggests this is unusual, but I have no idea.

That maybe would suggest whether to eagerly re-compute the next character, as we do in this patch, or put it at the end of the loop in parse_next_tag(). We can adjust as we see fit.

I think you're describing the approach I'd like, but again I think that would be a larger refactor and perhaps best left for the next release cycle.

@jonsurrell commented on PR #5794:


8 months ago
#20

@dmsnell and I looked at a test case where we fail to reach a text node at the end of the document in these tests. The text node seems to be reached, but its token type is null. I think if we get that fixed most of the test failures will be fixed 🎉

#21 @jonsurrell
8 months ago

@jorbin Will you give this another review when possible? I'd like to know if your concerns have been addressed.

#22 @hellofromTonya
8 months ago

What's the reasoning behind skipping the tests that don't expect an empty head? This is leading to 500+ skipped tests.

I share @jorbin's concern about the number of tests being skipped.

Copying my comment from the GitHub PR here, as inline/nested code comments do not get posted to the ticket:

I'm concerned about the number of tests being skipped.

IMO skipping a test is informative to denote where work is still needed or planned for the future (such as skipping until a dependency or bug fix lands).

Here <head> is not supported. Thus, these skips do not add value. Instead, they will consume time and resources, adding more messages to sift through and adding more time to the test suite (performance).

Instead of skipping over the <head> tests, is there a way to start the tests at the <body>, thus passing only the body's tests in the data provider's datasets?

While these tests add value to the HTML API, the skipped-test messages pollute the test suite's results, making it harder for contributors (of other tests and code beyond the HTML API) to scan through the results to find relevant information. Plus, these skips also consume resources such as adding processing time to the test suite.

Is there a way to start these tests at the <body>, thus serving supported datasets from the data provider?

#23 follow-ups: @dmsnell
8 months ago

One of the challenges here is that these tests cover the entire HTML5 specification. Now this is important because we don't want WordPress to miss something in the specification and then go down a trail of broken parsing: this is why we created the HTML API, so we can put behind us the days of ad-hoc HTML parsing.

The skipped tests in this case are very much a @todo comment that will shrink as we continue developing the HTML API, whereas full support is the trajectory with or without these tests.

One option we have is to manually sift through the thousands of tests and break them up into pieces, committing one at a time as we add support, but I hope everyone recognizes the undue hassle that would imply. This is an external library being vendored in; we should have the full set or none at all so that it remains simple to update from the source.

That being said, it sounds like the main complaint is that skipped tests imply people should be doing more work on something they are unfamiliar with? On this point, would it feel better if the skipped tests made a void assertion and returned from the test so that it looks like it passes when in fact it's being skipped?

For those of us working on the HTML API that's marginally less valuable because it removes the ease of gauging progress from test completion, but we can always comment out the lines or change them to see the more accurate picture; we already have to do this with all the @covers annotations which obscure the true coverage picture.

#24 in reply to: ↑ 23 @hellofromTonya
8 months ago

Replying to dmsnell:

That being said, it sounds like the main complaint is that skipped tests imply people should be doing more work on something they are unfamiliar with?

From me, no this isn't the case.

My concerns are:

  • Impacts to the test suite performance.

    these skips also consume resources such as adding processing time to the test suite.

Efforts have been and are underway to make the test suite faster and use fewer resources. The more non-actionable tasks it does (such as skipping tests), the slower it runs and the more resources it uses.

  • Harder for those not working on the HTML API to parse the results.

the skipped-test messages pollute the test suite's results, making it harder for contributors (of other tests and code beyond the HTML API) to scan through the results to find relevant information.

While I appreciate the intent and impact to those working on the HTML API, my concerns are on the impacts to Core's test suite and the CI jobs.

Thus, I'm curious:
What can be done to avoid skipping unsupported datasets and instead not run them at all?

manually sift through the thousands of tests and break them up into pieces

Hmm, this raises another concern:

What is the impact to Core's test suite runtime and resource consumption?

As I noted previously, an effort has been and is underway to improve the test suite's runtime and performance. While this effort should not block adding more and more code coverage, where possible, the performance of the test suite should also be considered.

So if hundreds or thousands of tests are unsupported and will be skipped, IMO these should not be yielded as test datasets, at least not without assessing the impact on the test suite itself.

#25 @dmsnell
8 months ago

Efforts have been and are underway to make the test suite faster and use fewer resources.

If this is your concern then we should be examining this with a few releases in the future in mind. These tests should all end up running, so hopefully the tests skipped today will be the non-skipped tests of a few months from now. That is to say, if you're unhappy with how much these tests measurably slow down the unit test suite, that's more likely a value statement about whether the tests are useful, because even if we prune these skipped tests today, we'll be having this same conversation over the next few months as we add them in.

If need be they could certainly be run in a separate step from the other test groups so as to avoid slowing the aggregate runtime.

#26 in reply to: ↑ 23 @jonsurrell
8 months ago

I'll try to justify the skipped tests and address the performance concerns.

dmsnell already mentioned that these tests are not expected to be skipped indefinitely. As more of the HTML API is completed, fewer tests should be skipped. I don't know whether we'll ever reach 0 skips, but if we do reach a point where we have skipped tests we never expect to pass, I'm sure we can find a mechanism to avoid running them.

There are a few reasons skipping makes sense at this time.

We do want them to pass eventually.

We can't avoid running the tests in most cases because the HTML API implementation itself informs the test that it should be skipped. If the HTML API does not support the encountered tags or detects what it determines to be an incomplete HTML fragment (e.g. it ends on an incomplete tag like <texta), we don't expect the tests to pass. These aren't failures; they're parts of the API that are incomplete. Because the test is already running when we get this information from the HTML API, the test must have a status. We don't want to fail the entire test suite for known and expected limitations. We could let the tests pass, but then we'd lose valuable information about the status of the HTML API overall.

Given this is an external suite of tests, I don't think it makes sense to modify the source in any way.

With all of that in mind, skipping (or marking incomplete) these tests seems like the best option to me.
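To make the runtime-skip point concrete, here is a sketch of the pattern described above. It is not the actual test code, though WP_HTML_Processor::create_fragment() and get_last_error() are real HTML API methods:

$processor = WP_HTML_Processor::create_fragment( $html );

while ( $processor->next_tag() ) {
    // ...record each node for the serialized tree...
}

// If the walk stopped because the parser bailed on unsupported markup,
// that is a known limitation of the HTML API today, not a test failure.
if ( null !== $processor->get_last_error() ) {
    $this->markTestSkipped( 'Markup is not yet supported by the HTML Processor.' );
}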


There were some performance concerns raised regarding the skipped tests, so I'd like to provide some data. I've isolated the html5lib-tests suite and done some benchmarking. Over 30 runs, the entire test suite (including all the skipped tests) runs in under 750 ms on an M2 MacBook Pro.

Time (mean ± σ):     722.2 ms ±  8.6 ms    [User: 673.6 ms, System: 38.2 ms]
Range (min … max):   706.1 ms … 741.5 ms    30 runs

I also compared the complete phpunit test suite on trunk and the html5lib-tests branch on the same machine. Trunk averaged ~202 seconds and the branch ~203 seconds.
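The output format above matches hyperfine. Assuming the excluded test group introduced later in this ticket, a comparable benchmark could be reproduced with something like:

hyperfine --warmup 3 --runs 30 'vendor/bin/phpunit --group html-api-html5lib-tests'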

#27 @jorbin
8 months ago

Thank you for running the speed tests. My concerns are more about the readability and understandability of the results than about speed. Adding 435 incomplete tests and 1109 skipped tests makes it much harder to understand the health of the rest of the tests.

What would folks think of setting these as not being loaded and then run separately, as is done with the HTTP remote and Ajax tests? I think since these tests will never interact with the database or work differently with multisite, we can also keep the matrix fairly small.

#28 @jorbin
8 months ago

One additional idea: put the checks for whether something should be in the data provider inside the generator and only yield it if it should be tested. This might be especially valuable for the HEAD tests if they are unlikely to be covered by the API soon.

#29 @dmsnell
8 months ago

What would folks think of setting these as not being loaded and then run separately, as is done with the HTTP remote and Ajax tests? I think since these tests will never interact with the database or work differently with multisite, we can also keep the matrix fairly small.

To me this seems like a perfectly reasonable solution. It would be great to have them in the code so we don't have to constantly merge and revert merges on our dev boxes, and having them there for anyone to use would be useful. Apart from relying on esc_attr() and _doing_it_wrong(), the HTML API is essentially independent code, so running in a separate runner is also fine.

put the checks for whether something should be in the data provider inside the generator and only yield it if it should be tested

Technically this should be possible, but it would imply running the tests at least twice. A hard part about this is that we don't know up-front whether a test should be run or not. Literally the determination is whether the HTML Processor gets through the input document without bailing because of unsupported markup. We could theoretically manually scan through the thousands of tests and skip ones we know can't run, but then we'd have to manually sort those skipped tests every time we make a change to the API code, and I'm sure we'd simply end up missing tests that need to be included.

The way we could do it is run the parser over the input in the data provider and, if it finishes, yield it for the test proper; otherwise omit it. I can predict some valid critique of doing this, but it's definitely an approach we could take.
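A sketch of that data-provider approach follows; parse_dat_files() is a hypothetical stand-in for reading the html5lib .dat fixtures:

public static function data_html5lib_cases() {
    foreach ( self::parse_dat_files() as $name => $case ) {
        // Pre-flight run: only yield cases the HTML Processor finishes today.
        $processor = WP_HTML_Processor::create_fragment( $case['html'] );
        while ( $processor->next_tag() ) {
            continue;
        }
        if ( null === $processor->get_last_error() ) {
            yield $name => array( $case['html'], $case['expected_tree'] );
        }
    }
}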

#30 @swissspidy
8 months ago

  • Type changed from enhancement to task (blessed)

This ticket was mentioned in Slack in #core by chaion07. View the logs.


7 months ago

#32 @costdev
7 months ago

What would folks think of setting these as not being loaded and then run separately, as is done with the HTTP remote and Ajax tests?

Yep, we can just add html-api to phpunit.groups.exclude like the other excluded groups, and work out something for the GitHub CI jobs.

Even if the group is only run in certain circumstances, I also dropped a comment on the PR which could, for the next few months as mentioned in an earlier comment, also silently skip certain datasets until further progress is made.

Overall, an excluded group and something akin to my comment on the PR could:

  • reduce the test results output
  • avoid a "scan through the thousands of tests and skip ones we know can't run", which no one wants to see happen
  • minimize "bloat" in the test/data provider methods

Alternatively, we could just replace the markTestSkipped() and markTestIncomplete() calls with expectNotToPerformAssertions() followed by return; for the time being.

A somewhat related changeset that could serve as a precedent mentions:

The @doesNotPerformAssertions annotation can be reconsidered in the future when the tests are either removed as unnecessary or updated to actually perform assertions related to their behavior.

(my emphasis)

In this case, the @doesNotPerformAssertions annotation isn't appropriate as we'd be conditionally marking a test as not performing assertions, so the method expectNotToPerformAssertions() makes more sense.
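In code, the proposed alternative would look roughly like this at the top of the test method, with should_skip_test() standing in for the extracted check:

if ( self::should_skip_test( $test_name ) ) {
    // Report a pass with no assertions instead of a skip, keeping
    // unsupported cases out of the skipped and incomplete counts.
    $this->expectNotToPerformAssertions();
    return;
}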

Last edited 7 months ago by costdev

@jonsurrell commented on PR #5794:


7 months ago
#33

Thanks for the review @costdev. I like the approach of extracting should_skip_test. I've pushed a change with that, and we have the same number of assertions (70) with many fewer skipped tests (1104 → 0):

Before:

OK, but incomplete, skipped, or risky tests!
Tests: 1609, Assertions: 70, Skipped: 1104, Incomplete: 435.

After:

OK, but incomplete, skipped, or risky tests!
Tests: 505, Assertions: 70, Incomplete: 435.

I've also switched the incomplete tests to be skipped.

#34 @jonsurrell
7 months ago

Excluding the html5lib-tests from "regular" test runs seems like a good plan to me.

Yep, we can just add html-api to phpunit.groups.exclude like the other excluded groups, and work out something for the GitHub CI jobs.

We don't want to exclude all of the html-api group tests, so we'll probably need another group.

I've pushed a change to the PR that adds a group and excludes it in phpunit.xml. This seems to work well: the tests don't run by default but can be run with phpunit --group=html-api-html5lib-tests.
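For reference, the exclusion looks roughly like this in phpunit.xml (a sketch; the surrounding configuration is elided):

<groups>
    <exclude>
        <group>html-api-html5lib-tests</group>
    </exclude>
</groups>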

#35 @swissspidy
7 months ago

  • Milestone changed from 6.5 to 6.6

Moving to 6.6 since 6.5 was branched already.

#36 @jonsurrell
6 months ago

I've rebased the PR, I'd appreciate reviews or any additional feedback folks have.

#37 @dmsnell
6 months ago

I'd like to gather any remaining feedback within the next seven days and target merging this next week, probably Tues., April 16, or Wed., April 17.

If there are any strong objections please re-raise them here, and indicate if there's any way we could alleviate your concerns. Otherwise, because this has been so useful as an external auditor to ensure the validity of our HTML parsing, I'll try to merge it and address feedback as it arises.

@costdev commented on PR #5794:


6 months ago
#38

A couple of the test data files are being detected as binary by Git. The html5lib repo uses this .gitattributes file to correct it.

To ensure the files are always human-readable whether locally or in PRs, can we add the following file to this PR?

tests/phpunit/data/html5lib-tests/.gitattributes

*.dat		-text diff
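Here -text disables end-of-line normalization for the .dat fixtures, and the diff attribute tells Git to treat them as text when producing diffs, so they are no longer shown as binary.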

@dmsnell commented on PR #5794:


6 months ago
#39

Thanks @costdev - good catch!

#40 @dmsnell
6 months ago

  • Owner set to dmsnell
  • Resolution set to fixed
  • Status changed from new to closed

In 58010:

HTML API: Validate HTML Processor against external test suite from html5lib.

In this patch, the test suite from html5lib validates the tree-construction
steps in the HTML Processor to ensure that they are behaving according to the
HTML specification. This suite of tests is also used by the servo project to
test its html5ever package.

A new test module in the HTML API transforms HTML Processor output to match
the expected tree shape from the external tests. For cases where there are
tests validating behaviors of unsupported HTML tags and constructs, the tests
are marked as skipped. As the HTML API continues to expand its own support,
the number of skipped tests will automatically shrink down towards zero.

Additional tests are skipped through the SKIP_TEST array in the test runner.

Fixes #60227.
See #58517.
Props azaozz, costdev, dmsnell, hellofromtonya, jonsurrell, jorbin, swissspidy.
