WordPress.org

Make WordPress Core

Opened 7 weeks ago

Last modified 7 weeks ago

#53119 new task (blessed)

Tests: introduce naming conventions for data providers and use named test cases

Reported by: jrf Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Build/Test Tools Keywords:
Focuses: docs Cc:

Description (last modified by jrf)

A number of tests use dataproviders - functions which return a multi-level array of test cases, where each sub-array is a set of parameters to pass to the actual test function -.

Also see: https://phpunit.readthedocs.io/en/9.5/writing-tests-for-phpunit.html?#data-providers

Currently there are:

  • No naming conventions for data providers - the functions are called whatever the author wanted to call them.
  • No placement conventions for data providers - sometimes the functions are at the top of a file, sometimes at the bottom, sometimes just before or after the test function using them.

I'd like to recommend streamlining this a little more to:

  1. Make it more obvious which functions are data providers.
  2. Make the data providers easier to find.

With that in mind, I'm proposing:

  1. That the function name of all data provider functions starts with data_, similar to how all test functions are prefixed with test_.
  2. That data providers when used by only one test are placed directly after that test.
  3. That data providers when used by multiple tests which directly follow each other, are placed directly after the last test using the data provider.
  4. That data providers used by multiple tests throughout a test file are placed at the bottom of the class. Updated: That when a data provider is used by multiple tests throughout a test file, that the tests using the data provider are grouped together and that the data provider is then placed directly after the last of the tests using it, as per 3.

Additionally, there is an awesome, but little known feature in PHPUnit which allows to "name" each test case in a data provider by adding an index key for each test data set.
This will make test failure information more descriptive and will make the test case which is causing the failure easier to find than having the default numeric indexes for test cases.

And when coupled with the --testdox feature, especially when selectively running a filtered set of tests will make it much more obvious what is being tested.

So I'd also like to recommend implementing the use of named test cases when using data providers.

To illustrate the difference:
Screenshot of a test run with a dataprovider and --testdox without using named test cases:
https://speakerdeck.com/jrf/my-top-10-phpunit-tips-and-tricks-e6ea54ce-2515-4ea9-aacf-9bf7ab3b3141?slide=26
Screenshot of a test run with a dataprovider and --testdox WITH named test cases:
https://speakerdeck.com/jrf/my-top-10-phpunit-tips-and-tricks-e6ea54ce-2515-4ea9-aacf-9bf7ab3b3141?slide=28

Code sample of how to implement this:
https://speakerdeck.com/jrf/my-top-10-phpunit-tips-and-tricks-e6ea54ce-2515-4ea9-aacf-9bf7ab3b3141?slide=27

Once this ticket has been actioned, I'd like to recommend that the conventions applied will be added to the Core contributors handbook.

/cc @hellofromTonya

Change History (13)

#1 follow-ups: @pbiron
7 weeks ago

As one who likes to write tests that use data providers I think this is a great idea!

I would, however, recommend one change to the proposal: put all data providers at the bottom of the class, regardless of which tests they are used for.

I think that simplification/consistency would be easier for contributors to "grasp".

#2 in reply to: ↑ 1 @hellofromTonya
7 weeks ago

Replying to pbiron:

I would, however, recommend one change to the proposal: put all data providers at the bottom of the class, regardless of which tests they are used for.

I think that simplification/consistency would be easier for contributors to "grasp".

IMO moving all the data providers to the bottom of the class can cause a "reading" and "understanding" disconnect between a test and its data. Why? The distance between the test and its data requires action to jump between the different locations in the class. This navigating "action" is a break in the reading which can cause cognitive effort and load.

Think of it in terms of reading a book. Imagine a chapter where the details of the story are moved to the end of the book rather than appearing in the next paragraph to support the flow of the story.

The test's data is the richness of the test. The test and its data are one unit.

For me, I find keeping the test and its data grouped together like this to flow better without the cognitive effort from jumping around:

test1
dataProvider1

test2
dataProvider2

test3
test4
dataProvider34

test5
test6

#3 follow-up: @pbiron
7 weeks ago

If any given data provider could only be used for 1 test, then I would completely agree that having the provider immediately follow it's test. But since the proposal recognizes that a given provider could be used for more than one test then it's not guaranteed that a test's provider will always immediately follow it, so folks will still have to "jump around" some times, which adds a certain amount of "cognitive effort" in figuring out "is this one of those times?".

Also, I can't remember if the following has come up in tests I've written/contributed to for core, but it has in test suites for plugins I've written:

  • At time T, I write a test and it's provider (in addition to some other tests that do not use this provider).
  • At time T + 1, I add a couple of other tests that are "related" to one another (after all the existing tests)
  • At time T + 2, I realize that the provider already in the class is appropriate for the "related" tests I added at time T + 1, so I remove those tests and add their "data" to the provider

Under the proposal as written, a contributor would have to remember to them move the time T + 1 tests to be after the time T, which can make for "messy" patches that require a high level of "cognitive effort" to realize exactly what is being changed.

So, for test suites for my own plugins, I adopted the convention to always put providers at the end of the class.

I'm not wedded to having that all data providers at the end, it's just a suggestion based on my own experience.

#4 in reply to: ↑ 3 @hellofromTonya
7 weeks ago

Replying to pbiron:

If any given data provider could only be used for 1 test, then I would completely agree that having the provider immediately follow it's test. But since the proposal recognizes that a given provider could be used for more than one test then it's not guaranteed that a test's provider will always immediately follow it, so folks will still have to "jump around" some times, which adds a certain amount of "cognitive effort" in figuring out "is this one of those times?".

In the case where a data provider is for multiple tests, the ticket proposes to group the test methods together with the data provider below the last test in the group. By doing so, the group of tests flow together with their data provider:

test1
test2
test3
dataProvider

test4
test5

#5 follow-up: @pbiron
7 weeks ago

I know...the point I was trying to make is the provider for test1 is not immediately following it. The provider is after test3, so a contributor would still have to "jump around" to find it's provider...just as they would for finding the provider for test3 if all providers were at the bottom of the class.

Again, I welcome the proposal as written, but personally would find it easier if I could know that all providers were at the bottom of the class.

#6 in reply to: ↑ 5 @hellofromTonya
7 weeks ago

Replying to pbiron:

I know...the point I was trying to make is the provider for test1 is not immediately following it. The provider is after test3, so a contributor would still have to "jump around" to find it's provider...just as they would for finding the provider for test3 if all providers were at the bottom of the class.

You're right. When there's a group of tests with a data provider (all grouped together), it does not eliminate needing to jump down from test1 to the data provider.

That said, (IMO) it does provide flow via grouping like tests together with their shared test data. That flow can aid in reducing cognitive complexity when reading the code.

Replying to pbiron:

Again, I welcome the proposal as written, but personally would find it easier if I could know that all providers were at the bottom of the class.

Thanks for sharing how you structure your tests.

Both the proposal and your way provide consistency when applied consistently. Both can aid in making the test suite more consistent. Consistency can help contributors to know how to structure tests. That's a win.

#7 follow-ups: @jrf
7 weeks ago

@pbiron I see your point. Having the data providers always at the bottom of the class makes it very predictable.

However, good test functions are short, a few lines at most, so even when the data provider would be for three tests, in most cases, having the data provider follow the tests it applies to, would still mean that both the tests as well as the (start of) the data provider function would fit onto one screen, making it easy to understand what the data sets mean as the parameter descriptions for the test functions are on that same screen.

This would rarely be the case when the data providers are at the bottom of the file, which implies that you'd always have to jump around the file to figure out what the various parts of each data set means, raising the cognitive complexity.

Now, you might say: "hang on, but there are a lot of test functions which aren't short"....
In that case, you would be correct, but I did say I was talking about good test functions.

As soon as a test function needs more than a few lines, it generally is a sign that the function under test is too complex and should be split up or even be a class instead of a function. So, complex tests in more a symptom of an underlying architectural problem and should not be guiding us in this.

Both the proposal and your way provide consistency when applied consistently. Both can aid in making the test suite more consistent. Consistency can help contributors to know how to structure tests. That's a win.

And on this I have to agree with @hellofromTonya: anything is better than the messy inconsistency we currently have... 🥴

Another thing which comes to my mind while reading though this discussion, is: what about test helper functions ? Should we set a convention for the placement of those ?

In my opinion, test helper functions should always be at the bottom of a test class (or in an abstract test case). What do you think ?

#8 in reply to: ↑ 1 @SergeyBiryukov
7 weeks ago

Replying to pbiron:

I would, however, recommend one change to the proposal: put all data providers at the bottom of the class, regardless of which tests they are used for.

I think that simplification/consistency would be easier for contributors to "grasp".

I see the point, but I have to agree with @hellofromTonya here. Personally, I find it much easier to grasp if the data provider directly follows the test it's used in, as proposed here. If the provider is used for multiple tests, it's still easier for me to see them grouped together and find the data provider after the last test in the group.

If there are multiple data providers in a class, having them all at the bottom of the class without a predictable order and having to jump back and forth while trying to keep the context would seem confusing to me.

#9 @jrf
7 weeks ago

  • Description modified (diff)

#10 in reply to: ↑ 7 @SergeyBiryukov
7 weeks ago

Replying to jrf:

Another thing which comes to my mind while reading though this discussion, is: what about test helper functions ? Should we set a convention for the placement of those ?

In my opinion, test helper functions should always be at the bottom of a test class (or in an abstract test case). What do you think ?

I think I would apply the same logic as for data providers here. If the helper function is only used in one or two tests, e.g. an add_filter() callback, I would expect to see it directly after the test to keep the context.

If the helper is used for multiple tests that do not necessarily belong together as a group, then I would expect to find the helper function after the ::setUp()/::tearDown() methods, or at the bottom.

Last edited 7 weeks ago by SergeyBiryukov (previous) (diff)

#11 in reply to: ↑ 7 @pbiron
7 weeks ago

Replying to jrf:

In my opinion, test helper functions should always be at the bottom of a test class (or in an abstract test case). What do you think ?

Whenever possible, I like them in an abstract test case class (e.g., WP_Test_XML_TestCase::loadXML()), which I wrote when XML Sitemaps were added to core. Note that that method contains an assertion, but it is basically a helper for other sitemaps tests (e.g., Test_WP_Sitemaps_Renderer::test_get_sitemap_index_xml_extra_elements()).

As @SergeyBiryukov mentioned, for things like add_{action|filter}() callbacks, near where they are used is my preference.

#12 @pbiron
7 weeks ago

One other thing that should be added to this proposal:

  1. for tests that use a data provider, always include @param tags in the test's DocBlock.

I'm not sure how many there are, but I know there are existing tests which use providers that do not do that (and I may have been responsible for a few of those).

#13 @jrf
7 weeks ago

  1. for tests that use a data provider, always include @param tags in the test's DocBlock.

I'm not sure how many there are, but I know there are existing tests which use providers that do not do that (and I may have been responsible for a few of those).

@pbiron Agreed. That's a given and is actually part of the WordPress Coding Standards, though Core doesn't run the WordPress-Docs standard yet over the code base. Can't say I blame them as the Docs standard still needs a lot of improving.

For the time being, we could very selectively just enable that specific error code (undocumented function parameters) for the Tests directory to safeguard this.

Note: See TracTickets for help on using tickets.