WordPress.org

Make WordPress Core

Opened 8 months ago

Last modified 5 weeks ago

#40180 assigned enhancement

Introduce `get_site_by()` function for multisite

Reported by: flixos90 Owned by: flixos90
Milestone: 5.0 Priority: normal
Severity: normal Version:
Component: Networks and Sites Keywords: has-patch has-unit-tests ms-roadmap needs-dev-note
Focuses: multisite Cc:

Description

As @stephdau mentioned on #40064, the new get_site() function from 4.6 is not a direct replacement for get_blog_details() which we are planning to deprecate.

Therefore I think it makes sense to introduce a function get_site_by() that is basically a wrapper for a WP_Site_Query (with LIMIT 1), and then runs array_shift() to only return the first result. This function would then become the direct replacement that people who previously relied on get_blog_details() with something other than an ID could use.

Parameter-wise, we should probably adjust it a little bit to match things like get_term_by(), which also makes the way that function works more explicit.

Attachments (6)

40180.diff (9.8 KB) - added by flixos90 8 months ago.
40180.2.diff (3.5 KB) - added by flixos90 3 months ago.
40180.3.diff (8.6 KB) - added by flixos90 3 months ago.
40180.4.diff (9.6 KB) - added by flixos90 7 weeks ago.
40180.5.diff (9.9 KB) - added by jeremyfelt 7 weeks ago.
40180.6.diff (11.2 KB) - added by jeremyfelt 7 weeks ago.

Download all attachments as: .zip

Change History (46)

#1 @stephdau
8 months ago

I'm liking the get_site_by() idea, from a processing standpoint.

I'm still not super fond of the idea of deprecating get_blog_details(), primarily because of the amount of person-hours that 3rd-parties will end up spending on moving to the new function, and users upgrading plugins/etc because of it (EG: I have 559 instances of the use of get_blog_details() outside of Core in the codebase I work on). But at least having a direct replacement means devs have more options (regex replacements, etc).

Keeping the full functionality around is ultimately what matters, so it's still a more viable option than the previous plan.

Thanks, @flixos90.

Last edited 8 months ago by stephdau (previous) (diff)

#2 @stephdau
8 months ago

There is one thing we'll need to contend with, for consistency, with other get_*_by() instances: get_user_by(), get_term_by(), etc all assume that the 1st parameter is a single $field, such as login (user) or slug (term).

In the case of get_blog_details(), that same equivalent parameter can accept an array of $fields, such as array( 'domain' => 'blah.com', 'path' => '/' );

#3 @stephdau
8 months ago

This will also have to be considered:

foreach ( get_site( 52723464 ) as $key => $value ) { echo "$key, "; }
blog_id, domain, path, site_id, registered, last_updated, public, archived, mature, spam, deleted, lang_id, ds_blog, ds_stats, 

foreach ( get_blog_details( 52723464 ) as $key => $value ) { echo "$key, "; }
blog_id, domain, path, site_id, registered, last_updated, public, archived, mature, spam, deleted, lang_id, ds_blog, ds_stats, blogname, siteurl, post_count, home, 

#4 @flixos90
8 months ago

@stephdau We can likely handle the parameters in the same way as a function like add_query_arg() works: You can either pass a key and a value as two separate parameters, or an array of key => value pairs (in that case the second parameter would be ignored). This makes the function flexible enough for these use-cases.

Thanks for bringing up the object iteration as well.

#5 @flixos90
8 months ago

  • Owner set to flixos90
  • Status changed from new to assigned

#6 @flixos90
8 months ago

In 40317:

Multisite: Add further unit tests for get_blog_details().

These tests verify that the returned site object is iterable and contains the expected properties.

See #40228, #40180.

@flixos90
8 months ago

#7 @flixos90
8 months ago

  • Keywords has-patch dev-feedback has-unit-tests added; 2nd-opinion removed

40180.diff is a possible implementation for get_site_by(), including unit tests. It accepts either a $field and its $value as parameters or a single array of fields and their values.

I decided to not make the function a one-to-one replacement for get_blog_details() as we're probably not going to deprecate that function. However, get_site_by() supports everything that get_blog_details() needs and does that in a much more efficient way, so I suggest we should use the new get_site_by() in get_blog_details() - all the latter needs to do then is tweak how the parameters are passed to it and modify the return value appropriately.

#8 @flixos90
8 months ago

I just added a patch on #40228 to demonstrate how this function could be used to improve get_blog_details() as well. My comment https://core.trac.wordpress.org/ticket/40228#comment:8 also goes a bit more into detail why I think that modifying get_blog_details() is not sufficient here and why it's worth introducing get_site_by().

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

#14 @flixos90
7 months ago

  • Milestone changed from 4.8 to Future Release

This needs more discussion and review. If we introduce such a function, it makes sense to bring it in alongside enhancements like #40364.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


4 months ago

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


3 months ago

#17 @flixos90
3 months ago

  • Keywords ms-roadmap added
  • Milestone changed from Future Release to 4.9

Per today's office hours, it was decided to introduce the new function and use it to improve get_blog_details().

Let's think about possibly finding a more suitable name. The function is not entirely equivalent to something like get_term_by(), since it also supports an array, so a different kind of name might make sense.

@flixos90
3 months ago

#18 @flixos90
3 months ago

40180.2.diff makes a few improvements:

  • It adds support for getting a site by only its path, if the current setup is a subdirectory install. Example: get_site_by( 'path', '/foo/' )
  • It introduces a new idea of simply passing a URL (domain-path combination), which is a lot quicker than passing an array of domain and path. Example: get_site_by( 'url', 'mysite.com/foo/' )
  • An additional tweak in the beginning of the functions ensures that get_site() is called as an easy shortcut when only an ID is passed.
  • Further unit tests have been added/adjusted.

I'm starting to like the url argument. Actually, that argument would allow us to completely get rid of the requirement of supporting an array, which would make the function much simpler. The only thing that people might wanna pass in addition is the network ID, but since that only applies to multi-network installs, it could as well be an extra third parameter, as it is in several other core functions. I will create another patch with that proposal.

@flixos90
3 months ago

#19 @flixos90
3 months ago

40180.3.diff implements a more straightforward version of get_site_by(), where it only accepts a single parameter (as mentioned in https://core.trac.wordpress.org/ticket/40180#comment:18). With this kind of functionality, it actually works similar like get_term_by(), thus the name now makes sense.

The idea is that the new url field is a combination of domain and path and thus covers that use-case in a simple way. Unit tests for the function are included.

I like this approach far better, and it clears up the naming issues we discussed on Tuesday. Furthermore get_site_by( 'url', 'mysite.com/foo/' ) is simpler than get_site_by( array( 'domain' => 'mysite.com', 'path' => '/foo/' ) ).

#20 @spacedmonkey
3 months ago

I really like 40180.3.diff. My problem with this function as been passing an array to it. It makes this function stick out compared to get_term_by. By the param url is a better work around for this.

Do we have a use for it core other than get_blog_details ? We replaced nearly all calls to the blogs table directly a while ago..

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


2 months ago

@flixos90
7 weeks ago

#22 @flixos90
7 weeks ago

  • Keywords needs-dev-note added

40180.4.diff fixes two issues with the previous patch:

  • If a domain starting with www. is given, the query is modified to be sorted by domain_length in DESC order: If you provide a www. URL and there actually is a www. and its variant without www. in the database, you don't want the non-prefixed site to be returned. See get_blog_details() where this currently happens to.
  • Paths given are now correctly parsed. In the previous patch it was possible for the path to end up as // which is invalid.

More tests have been added to verify the function works as expected.

This function is good to go in IMO, waiting for another review now. :)

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 weeks ago

#24 @flixos90
7 weeks ago

In addition to improving get_blog_details() (see #40228), get_site_by() can also be leveraged in the following functions:

  • get_id_from_blogname()
  • get_blog_id_from_url() (this will also get rid of the blog-id-cache cache group)
  • domain_exists()

These can be dealt with in separate tickets post-merge.

#25 follow-up: @jeremyfelt
7 weeks ago

Thanks for all the work on this, @flixos90, it's looking good!

  • I remember being a little wary of url as an argument, but it does seem to fit pretty well.
  • I'm a little unsure of the domain and path arguments as implemented. We don't have the same is_subdomain_install() restrictions in get_sites(). I would argue that if somebody is looking for protection like that, they should adjust based on their awareness of the environment rather than maybe go away thinking that a site doesn't exist. I may be biased because in the current form, I would not be able to use get_site_by( 'domain' ... ) in my configuration.
  • For the slug argument, I'm not sure we can assume that path will always be $network->path when is_subdomain_install(). I would go the other way and assume that we could leave path off entirely.

Everything else seems solid. I'll stare at the unit tests some more later as well.

#26 in reply to: ↑ 25 ; follow-up: @flixos90
7 weeks ago

Replying to jeremyfelt:

I'm a little unsure of the domain and path arguments as implemented. We don't have the same is_subdomain_install() restrictions in get_sites(). I would argue that if somebody is looking for protection like that, they should adjust based on their awareness of the environment rather than maybe go away thinking that a site doesn't exist. I may be biased because in the current form, I would not be able to use get_site_by( 'domain' ... ) in my configuration.

I think these restrictions make sense, as I see get_site_by() as a function that should generally only run a query which will produce a single result (even if it didn't have the limit argument). I don't think it should be a "here is some input, give me a random site for which this data applies" function. Using a full url will always fulfill this requirement (unless someone has similar www and non-www URL sites, which is edge-case though), using a slug will too on regular multisite setups. domain and path are basically more specific variants of slug, to be used when it's already clear that the setup is subdomain/subdirectory (get_blog_details() supports this only for subdomain installs for some reason, but it does support the general idea).
I may be a bit conservative here, but what I'm essentially saying that slug, domain and path only exist to account for the typical multisite (where no sub-site has an actually distinct non-subdomain domain name etc.). With anything custom, get_site_by() should mostly be used with url (and id which always works of course).

For the slug argument, I'm not sure we can assume that path will always be $network->path when is_subdomain_install(). I would go the other way and assume that we could leave path off entirely.

I took this bit of code from get_id_from_blogname() which works that way. I think it should stay that way since that is how a regular multisite network works. You either only use subdomains or subdirectories. Everything else is, I'd say, custom and the slug way alone cannot account for this.

@jeremyfelt
7 weeks ago

#27 in reply to: ↑ 26 ; follow-up: @jeremyfelt
7 weeks ago

Replying to flixos90:

I may be a bit conservative here, but what I'm essentially saying that slug, domain and path only exist to account for the typical multisite (where no sub-site has an actually distinct non-subdomain domain name etc.). With anything custom, get_site_by() should mostly be used with url (and id which always works of course).

Ok, that make sense. After poking around a bit more, I'm on the same page.

For the slug argument, I'm not sure we can assume that path will always be $network->path when is_subdomain_install(). I would go the other way and assume that we could leave path off entirely.

I took this bit of code from get_id_from_blogname() which works that way. I think it should stay that way since that is how a regular multisite network works. You either only use subdomains or subdirectories. Everything else is, I'd say, custom and the slug way alone cannot account for this.

Agreed. I did try setting a path before installing multisite and was reminded that you can't install in subdomain mode when that's the case, so if get_id_from_blogname() is doing it that way, then it may be useful for somebody and we should probably just let it be. :)

I tweaked the tests in 40180.5.diff:

  • Removed the assertions that the returns were WP_Site objects. We should be asserting one thing per test, and if it's not returning the object, then other things are going to fail.
  • Added 2 explicit tests for retrieving a site by slug on specific networks. I see that the other tests use different networks, but I think having the explicit behavior tested is good.

The tests for get_site_by( 'id' ...) gave me pause. I definitely don't think we need to account for somebody passing null as the ID value when they can just use get_site(). I think that test can be removed.

As for id itself, are we doing this just for parity with other functions? I can't imagine this being more useful than get_site( $id ).

#28 in reply to: ↑ 27 ; follow-up: @flixos90
7 weeks ago

Replying to jeremyfelt:

Removed the assertions that the returns were WP_Site objects. We should be asserting one thing per test, and if it's not returning the object, then other things are going to fail.

The reason I put these extra assertions in was that I wanted to have failures rather than errors (if get_site_by() returns null which I'd consider a test failure, it throws an error without the WP_Site assertion). But I see the one assertion per test argument, so I agree with that as well.

  • Added 2 explicit tests for retrieving a site by slug on specific networks. I see that the other tests use different networks, but I think having the explicit behavior tested is good.

Good catch, that makes sense.

The tests for get_site_by( 'id' ...) gave me pause. I definitely don't think we need to account for somebody passing null as the ID value when they can just use get_site(). I think that test can be removed.

As for id itself, are we doing this just for parity with other functions? I can't imagine this being more useful than get_site( $id ).

Right, I agree there's probably nobody who will use get_site_by( 'id', ... ) over get_site( ... ), but I did it for parity which I still think makes sense.
As far as putting null goes, I agree that this test can be removed. While it would still work because it just calls get_site(), trying to get a site by a specific ID, which the function name implies, and then passing null doesn't make sense. Should we actually catch that behavior and return null in such a case?

#29 in reply to: ↑ 28 @jeremyfelt
7 weeks ago

Replying to flixos90:

As far as putting null goes, I agree that this test can be removed. While it would still work because it just calls get_site(), trying to get a site by a specific ID, which the function name implies, and then passing null doesn't make sense. Should we actually catch that behavior and return null in such a case?

That's tough. We probably aren't consistent in core with how much protection we offer. The function signature implies through documentation that string|int is expected, which I'd expect to be enough. We could do something similar to get_term_by() and check for a string length greater than 0 if slug, domain, path, or url and for a positive int if id.

#30 @spacedmonkey
7 weeks ago

In 40180.5.diff get by id, calls get_site. But what does the network_id mean in this context. It is completely ignored. The network id means, get a site in this network. I think we shouldn't exist early here, but just pass site__in arg to get_sites.

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


7 weeks ago

@jeremyfelt
7 weeks ago

#32 @jeremyfelt
7 weeks ago

  • Owner changed from flixos90 to jeremyfelt
  • Status changed from assigned to reviewing

In 40180.6.diff:

  • Ensure a non-empty string is passed for string fields. Previously, if an empty string was passed for something like domain, get_sites() would return the results of a default query (get_sites( array( 'domain' => '' ) );).
  • Add tests for empty string values.
  • Ensure a numeric value is passed when id is used to avoid an unnecessary query attempt.
  • Use the site__in parameter when id is used as a parameter so that the network_id parameter continues to work as expected.

#33 @jeremyfelt
7 weeks ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 41698:

Multisite: Introduce get_site_by().

get_site_by() is a replacement for get_blog_details() that uses WP_Site_Query to retrieve specific sites based on a given field and value.

Props flixos90, spacedmonkey.
Fixes #40180.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 weeks ago

#36 @flixos90
5 weeks ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Due to issues that came up in https://core.trac.wordpress.org/ticket/40228#comment:32, we revisited the way caching in get_site_by() currently works.

We discussed the topic in detail today and came to the conclusion that get_site_by() should be reverted for now and rescheduled for 5.0: We need to generally work on improving caching for WP_Site_Query (and WP_Network_Query) to use special caches for common use-cases, which can be invalidated less aggressively (currently any change to any site/network invalidates every site/network query cache). We should furthermore consider storing not found values for the sites and networks caches too.

To be clear: get_site_by() itself still is probably ready, but WP_Site_Query that it uses needs to improve how it caches results for the function to be really beneficial.

#37 @flixos90
5 weeks ago

  • Owner changed from jeremyfelt to flixos90
  • Status changed from reopened to assigned

#38 @flixos90
5 weeks ago

In 41883:

Multisite: Revert [41719].

While get_site_by() makes sense as a more explicit and less complex replacement for get_blog_details(), it is not ready yet in terms of caching, where it currently falls short of the older function under specific circumstances.

See #40180, #40228.

#39 @flixos90
5 weeks ago

In 41884:

Multisite: Revert [41698] and [41743].

In order for get_site_by() to be truly beneficial, caching in WP_Site_Query needs to be improved to account for common use-cases and have them be invalidated less aggressively.

See #40180, #40228, #42091.

#40 @flixos90
5 weeks ago

  • Keywords dev-feedback removed
  • Milestone changed from 4.9 to 5.0
Note: See TracTickets for help on using tickets.