Opened 11 years ago
Closed 11 years ago
#27003 closed task (blessed) (fixed)
Introduce wp_get_network() and begin cleanup of multisite load
Reported by: | jeremyfelt | Owned by: | |
---|---|---|---|
Milestone: | 3.9 | Priority: | normal |
Severity: | normal | Version: | 3.9 |
Component: | Bootstrap/Load | Keywords: | needs-testing |
Focuses: | multisite | Cc: |
Description
During the multisite load process, we use wpmu_current_site()
to process the requested URL into a matching network. The access to this function is marked as private, and it really does perform the one task—setting the $current_site
global.
It wouldn't hurt to have a network retrieval function separate from this load process. Secondarily, the logic in wpmu_current_site()
could use some cleanup.
The attached patch introduces wp_get_network()
and reduces some of the cruft that is in wpmu_current_site()
.
Some notes:
- I dropped the
$sites = query for sites
that we use to check if only one network has been installed. I'm not convinced this saves any time, and from what I can tell we don't use the$sites
global for anything else in core. This would clear out #26942 - Unfortunately,
wp_get_network()
must remain inms-load.php
for now, asms-functions.php
is not available yet. Not sure if this is something we should think about reorganizing. - I have not had a chance to test this with multiple networks yet. A multisite, single network appears to work fine. More testing and feedback is very much necessary.
Attachments (25)
Change History (80)
#3
in reply to:
↑ 2
;
follow-up:
↓ 4
@
11 years ago
Related: #25316 (to this ticket, not the comment)
Replying to Denis-de-Bernardy:
Would it be desirable (in a separate ticket, of course) to make it so that adding the relevant multisite defines in advance, when installing WP, would make the installer just work and set things up with multisite already up and running?
I thought there was a ticket for this, but I can't find it now. I'd be open to cleaning that up.
#4
in reply to:
↑ 3
@
11 years ago
Replying to jeremyfelt:
I thought there was a ticket for this, but I can't find it now. I'd be open to cleaning that up.
I've created #27012.
This ticket was mentioned in IRC in #wordpress-dev by jeremyfelt. View the logs.
11 years ago
#6
@
11 years ago
Ok. 27003.2.diff takes another approach via get_network_by_path()
. This function accepts only $domain
and $path
as arguments and walks through some possibilities in an attempt to find a matching network.
On a request for http://foo.bar.network.com/path/page-title/
, the matching occurs like this:
foo.bar.network.com/path/
foo.bar.network.com/
bar.network.com/path/
bar.network.com/
network.com/path/
network.com/
If no matching network is found from those possibilities, false is returned. I've tested this a bit in a multinetwork setup and it seems to be doing the trick thus far. I got rid of my sunrise.php
and have been flipping between networks. My favorite part is that it ditches is_subdomain_install()
.
#7
@
11 years ago
Yes please. Let's unit test this function (I can write some tomorrow if you think fresh eyes will help) and drop this in.
This ticket was mentioned in IRC in #wordpress-dev by nacin. View the logs.
11 years ago
#9
@
11 years ago
27003.8.diff adds another foreach()
to walk through the possible domains from the originally passed domain. We were missing foo.bar.network.com
when bar.network.com
was the network.
I'll take a look at updating tests to match this scenario as well.
#10
@
11 years ago
- Milestone changed from Awaiting Review to 3.9
27003.9.diff adds additional assertions for subdomains.
site1.wordpress.net/
should match thewordpress.net/
domainsite1.wordpress.net/four/
should match thewordpress.net/four/
domain
This ticket was mentioned in IRC in #wordpress-dev by jeremyfelt. View the logs.
11 years ago
#13
follow-up:
↓ 14
@
11 years ago
27003-docs-fix.patch corrects inline docs param types and name.
Then, in this line, we allow $networks
to be an empty array (happens, in theory, if there are no rows in $wpdb->site
), but in the next line we access $networks[0]
. I'm not sure how likely that is to happen in practice, but we should probably avoid it to not trigger a notice.
#14
in reply to:
↑ 13
@
11 years ago
Replying to TobiasBg:
(happens, in theory, if there are no rows in
$wpdb->site
)
Would it make sense to use another wp_die()
here if the count is 0 to explain that no networks are setup?
@
11 years ago
Adds hypothetical filters for short-circuiting, for governing how many path segments to consider, etc.
#15
@
11 years ago
I like where 27003.13.diff is going. To sum up:
- If the network and site is defined, use it.
- If additional sites are paths of a primary network domain, lookup the network before finding the site.
- If additional sites are different domains (or subdomains), lookup the site before finding the network. (The site tells us what its parent network is)
So much cleaner.
get_site_by_path()
and the changes to get_network_by_path()
look great so far. I'm still parsing in detail. Seems it would be fairly straightforward to add caching to each of these based on the requested args. The filters are going to go a ways toward reducing the use of sunrise entirely for domain lookups.
I wonder if there's a place for a "every thing you're looking for" cache key after step 1 and before steps 2/3 that is able to retrieve site and network information based on the requested domain/path or if the processing required to extract the requested domain/path makes it not worth the effort.
The patch also addresses the no networks issue raised by @TobiasBg with a ms_not_installed()
if 0 rows are found.
#16
@
11 years ago
Yeah, so, 27003.13.diff is a pretty sharp departure from how multisite site detection works now, which starts with identifying the network. The patch tries hard to find the site first, then use the site's network ID to set the current network. To find a site, it takes the domain plus the first path segment. Now, that doesn't always work:
- If we're dealing with a "subdirectory" install, our network may have a path. (A "subdomain" install, which really is an "anything goes" install, ideally does not have a network path.) So we need to identify the network first, then use the first path segment following the network's path segment to identify the site.
- If constants are defined for the network (as they are per instructions), then we can simply use those first. Note the patch doesn't currently then restrict the resulting site to that network. In fact in 3.8 if the located site isn't on the declared network, then it resets the network object anyway. (There was actually no reason to find the network first in 3.8.) I've updated that in 27003.14.diff, otherwise the same as 13.diff.
It also tries to optimize for some common situations. Some notes:
- If it's a "subdirectory" situation where constants are not defined, it queries the DB to see if there are even two networks installed. If there's just one, then that's obviously our network, and it caches it (just as it did in 3.8). While constants are going to be in place almost always, when they're not, jeremyfelt and I had agreed in IRC that it's going to be more common for a single network to be used.
- If you're running an external object cache, it caches whether any network has a path. If none do, then it can bypass querying the networks table for paths all together, which results in a much cleaner (and much more easily cached) query.
- It's probable that these should be network transients.
- Filters are in place to not only short-circuit the site-finding code all together, but to also specifically declare how many path segments are in play. So an install could say that domain X actually has sites for two path segments, while domain Y has none. If we're dealing with a "subdomain" only install, then trying to query for every single path is a terrible query. To avoid that, a future patch should use the external object cache to see if any sites at all have a path. This is more or less necessary to maintain the same performance of subdomains pre-patch.
- It requires additional caching, and needs a review from pento for how to best make these queries. I went for fewest queries possible when in reality it might make more sense to add a few more queries but be sure they are being cached to a T.
- Because this code opens up the possibility that *any* initial path segment is actually a site, this can cause some performance issues. At least with subdomains you have DNS to prevent essentially random strings in a URL from causing a boatload of queries — and, yes, even flooding memcache. (If using a wildcard DNS, then the whole redirect-to-signup kicks in.) These queries are not the absolute cheapest and haven't been tested against a massive blogs table, yet.
There are likely further optimizations to make (not to mention a number of declared @todo's), there are likely some assumptions made that will need to be challenged (for example — can a subdomain install have a network path, as it exists now?), and it needs a lot of unit tests. Also, the *_by_path_segments_count
filter names aren't great. But it's a great step forward.
#17
@
11 years ago
At the time of these new filters, plugin.php and functions.php are loaded — and little else. jeremyfelt points out that these filters mean you won't need to do much domain/path/network/site manipulation in sunrise anymore, but you'll need to use a sunrise file to actually register your plugin callbacks. I find that to be OK. It ends up being much easier in some respects.
Say make.wordpress.org was our network and /core/ was our site, and then we wanted to add a /core/handbook/ site. (I had to do this at one point via a nice hack; though they're merged now.) It'd be easy for me to use the site_by_path_segments_count filter to change the 1 segment to a 2 when the requested path starts with /core, or more so /core/handbook. Or I could use the pre_* filter to actually return a site object (we'll need a wp_get_site() to match wp_get_network() and to replace basic usage of get_blog_details()) for exact matches.
Of course, you could also just do $current_site = wp_get_network( X ) and $current_blog = wp_get_site( X ). It really depends what kind of lifting you want to do and where you want it done. If you had a second level of path segments you wanted considered on a dynamic basis (so not simply "handbook") then passing this off to find_site_by_path() will make your life easier.
And then there's the question: why limit it to just one path segment by default? Mainly because it's so dang hard to predict and optimize for all of the setups out there. You might have a network with thousands of sites, all with a path of / (so, domains or subdomains), yet you'll need to take the entire URL (/2012/01/05/something/page/2/feed/whatever/) and check to see if any of those path segments add up to a site. That means a unique query for every single URL hitting the install, which is a nightmare to cache.
Now, you could query for all sites on that domain, cache those results, and then simply see if any of the paths match up, but how do I know if it's one site with a path of /, or if *every* one of those thousands of sites are on that particular domain?
There are only so many "large" networks out there (networks with > 10,000 sites) that would run into performance issues; it's possible we could have different optimization "tracks" depending on the kind and size of install you're running. That's kind of what the proposed filters start to get at, by letting you control the path segments.
I do think we can get smarter here, and these patches make some headway, but it's not straightforward.
#18
@
11 years ago
Some thoughts on function naming:
wp_get_network() without any arguments could replace get_current_site(). It would be lovely to ditch "$current_site" absolutely everywhere. Even in this file; we could possibly make it a reference to a new variable.
wp_get_site() is probably a bit too broadly named. Though we don't use it all that much, the "ms" prefix here could help. but, this function isn't yet a must-have. We could wait for some cleanups and deprecations for when "site" actually refers to "network". We already have is_main_site() but also get_current_blog_id()... we're getting there.
#19
@
11 years ago
We should take #26403 into consideration while determining the requested path. I don't think a strtolower()
would hurt anything here.
This ticket was mentioned in IRC in #wordpress-dev by jeremyfelt. View the logs.
11 years ago
#21
follow-up:
↓ 22
@
11 years ago
27003.15.diff fixes the barrage of debug notices currently fired off from trunk when multisite is enabled and the possibility of multiple networks exists.
#22
in reply to:
↑ 21
@
11 years ago
Replying to johnjamesjacoby:
27003.15.diff fixes the barrage of debug notices currently fired off from trunk when multisite is enabled and the possibility of multiple networks exists.
This is needed against current trunk. The changes introduced in 13.diff and 14.diff will effectively deprecate wpmu_current_site()
if/when merged.
#23
@
11 years ago
27003.16.diff builds on Nacin's 27003.14.diff and adds caching for requests made to get_network_by_path()
. Keys are an md5 hash of the requested domain and path, along the same lines of the key creation in WP_Theme
.
$request_cache_hash = md5( $domain . '/' . $path ); $network_id = wp_cache_get( 'network_' . $request_cache_hash, 'networks' );
I'm not sure where invalidation would be needed here. This is a cache of a specific request and for it to become invalid, the network ID matched with the domain and path would have to change. This doesn't seem too common, though we can clear or reset the cache value way down the road when some sort of network management appears in the admin. Maybe?
#24
follow-up:
↓ 25
@
11 years ago
I think everything (even the old stuff) should actually be wp_cache_add(). There's no reason for an unconditional set. If it already exists somewhere, we should bail. (set() forces a cache update, even across multiple servers, which isn't necessary when we are simply caching a value that wasn't already cached, versus updating an invalid value.)
While a 'networks' bucket is good to think about, it's not a declared global bucket. Let's keep using site-options to keep it clean.
The one problem with this is it'll result in a new cache key for every unique URI request. That's a lot of cache writes. The second thing to think about is that get_network_by_path() is actually the cheaper function. The networks table will without a doubt smaller than the blogs table, and it's often sped up or bypassed via a constant, or there being only one network (this is a cached state, though populate_network() should clear that cache), or by their being no networks with paths. The function is also only used in one situation, and only because we're only searching one path level (past the network's path, which is why we need it). Caching the get_site_by_path() request is going to be more useful.
If networks_have_paths is false, then we only need the domain in the cache key. Or even better, this is when networks_have_paths comes in useful. Now that I think about it, if no networks have a path, then we don't actually need to run get_network_by_path(), and can instead go right to get_site_by_path().
We can actually leverage networks_have_paths also outside of get_network_by_path() as a way to replace the is_subdomain_install() check. As it is now, if you're running a "subdomain" install and add a network with a path, you'd have to add your own handling (well, adjust one of the new filters) to make sure any deeper-pathed sites off that network are handled. Instead, we can simply use networks_have_paths as a barometer. No current subdomain install would be adversely affected, as none of them have a network with a path. Presumably.
Not going to lie, I kind of just want to load all networks into memory and bypass that whole mess.
#25
in reply to:
↑ 24
@
11 years ago
Replying to nacin:
The one problem with this is it'll result in a new cache key for every unique URI request.
Not going to lie, I kind of just want to load all networks into memory and bypass that whole mess.
Good point(s). There's something in this combo. I was looking at adding caching to wp_get_network()
and the path stuff was making less sense. My guess is that the number of multi network installs where the number of networks is larger than 10, 20, 50 (?) is very low. Storing that into memory with the ability to handle larger setups via sunrise makes sense.
#26
@
11 years ago
27003.14.diff is having trouble with an initial single network install with one site.
src.wordpress-develop.dev
loads fine, as doessrc.wordpress-develop.dev/wp-admin/
src.wordpress-develop.dev/2014/01/07/hello-world/
initiates aget_site_path()
call with a segment count of 2, only passing/2014/01/
and/2014/
as arguments to the site query. The/
path is not found and results inms_not_installed()
.
#27
follow-up:
↓ 28
@
11 years ago
Can we get 27003.15.diff committed while we think this through, so trunk stops throwing notices?
#28
in reply to:
↑ 27
@
11 years ago
Related #21143 - a hook for when no site is found.
Replying to johnjamesjacoby:
Can we get 27003.15.diff committed while we think this through, so trunk stops throwing notices?
+1
#30
@
11 years ago
Not going to lie, I kind of just want to load all networks into memory and bypass that whole mess.
Considering that having a huge number of networks is rare, this might be a good optimization.
#31
@
11 years ago
Things before commit:
- When testing this on WordPress.org I found a few issues related to the use of get_site_option() prior to setting $wpdb->siteid, so site_id = 0 was getting queried. So we're now going to pass the network ID to wp_load_core_site_options() and wait to set the site_name property until after $wpdb is init'd.
- If path segments are > 2, only the longest two are used. There needs to be a loop to query /a/b/c/d/, , /a/b/c/, /a/b/, /a/, and /.
- get_site_by_path() needs unit tests.
Hope to commit this in the next few hours or so, once I'm stationary and can fix inevitable WP.org breakage.
#33
@
11 years ago
- Keywords has-patch dev-feedback removed
- Type changed from enhancement to task (blessed)
Leaving this to jeremyfelt for any additional tweaks or cleanups.
#34
@
11 years ago
- Keywords has-patch added
27003-docs-cleanup.patch is a refresh of 27003-docs-fix.patch plus some white space cleanup.
#36
follow-up:
↓ 37
@
11 years ago
- Keywords has-patch removed
General caveat...
The default path segments number for both networks and sites is 1. If you have a network of foo.bar/network1/
and a site of foo.bar/network1/site1/
, use of the site path segments filter is required. If 2 is not specified via filter, then get_site_by_path()
will try to match network1
for your site rather than /network1/site1/
.
Some extra web server config for wp-* to work and likely a sanity check is required as well.
We *could* think about automatically upping the site path count if networks have paths, but this seems like something to let lie for a bit.
#37
in reply to:
↑ 36
;
follow-up:
↓ 41
@
11 years ago
Replying to jeremyfelt:
The default path segments number for both networks and sites is 1. ... We *could* think about automatically upping the site path count if networks have paths, but this seems like something to let lie for a bit.
Actually, the default is all paths, but the functions receive 1 in most cases. If a network has a path (which only happens for "subdirectory" installs), then get_network_by_path() is run first with a segment number of 1. Then get_site_by_path() is run with a segment number of 1 + however many segments are in the network's path. Look for the is_subdomain_install() call in [27359]. So, your caveat was accounted for. :-)
#38
in reply to:
↑ 32
@
11 years ago
Replying to nacin:
In 27359:
This commit introduced a bug in wp-includes/ms-settings.php
:
if ( ! $current_site = wp_cache_get( 'current_network', 'site-options' ) ) { // Are there even two networks installed? $one_network = $wpdb->get_row( "SELECT * FROM $wpdb->site LIMIT 2" ); // [sic] if ( 1 === $wpdb->num_rows ) { $current_site = wp_get_network( $one_network ); wp_cache_set( 'current_network', 'site-options' ); } elseif ( 0 === $wpdb->num_rows ) { ms_not_installed(); } }
The call to wp_cache_set
on line 78 is missing the $data
argument.
It should be:
wp_cache_set( 'current_network', $current_site, 'site-options' );
#39
@
11 years ago
Via @gradyetc's comment, 27003.19.diff adds the missing data to wp_cache_set()
.
#40
@
11 years ago
27003.20.diff deprecates get_current_site_name()
now that it is no longer used (and is private). wp_get_network()
provides some replacement functionality, though we may be able to get by without specifying a replacement.
#41
in reply to:
↑ 37
@
11 years ago
Replying to nacin:
Replying to jeremyfelt:
The default path segments number for both networks and sites is 1. ... We *could* think about automatically upping the site path count if networks have paths, but this seems like something to let lie for a bit.
Actually, the default is all paths, but the functions receive 1 in most cases. If a network has a path (which only happens for "subdirectory" installs), then get_network_by_path() is run first with a segment number of 1.
Ahh, I was mixing subdomains and subdirectories. Caveat Empty. :)
#42
@
11 years ago
In 27003.21.diff, if we have found a site and a network and the site's domain differs from the network domain, use the site domain for cookies.
This is a little ugly because we're setting the network's cookie domain with site information, but that's how we set COOKIE_DOMAIN
.
27003.19.diff and 27003.20.diff are separate from this.
#44
follow-up:
↓ 46
@
11 years ago
Deprecating something in load.php or ms-load.php is tough because these files are available earlier than deprecated.php or ms-deprecated.php would be. If someone is calling get_current_site_name() in sunrise.php this would turn into a fatal error.
get_current_site_name() is for populating the object with the site_name property. While it accepts a $current_site, it is before $wpdb->siteid is actually set, which means get_site_option() doesn't actually work as a replacement. That's why it does these weird direct DB queries and cache calls, and that's why I side-stepped it and chose to set site_name after the fact.
If wp_get_network() returned a WP_Network object we could lazy-load site_name and it wouldn't be a problem. But we have no proper way to "switch" to a network so it's not like we could make get_site_option() do what we wanted it to do.
get_current_site()->site_name is used in a decent number of places. However, the changes I referenced earlier also mean that site_name will be set even if sunrise had set $current_site and $current_blog (it got pulled out of the conditional), thus it doesn't matter if it doesn't do anything.
Seems like our default handling of cookie domains was always such that we assumed the current network would completely encompass all domains on that respective network. That, obviously, is wrong. I imagine if a site doesn't match a network now, with no special handling (a plugin, etc.), COOKIE_DOMAIN would be set to the network, and the site wouldn't have functioning cookies? If so, then we could aim to think outside the box a bit here.
#46
in reply to:
↑ 44
@
11 years ago
Replying to nacin:
I imagine if a site doesn't match a network now, with no special handling (a plugin, etc.), COOKIE_DOMAIN would be set to the network, and the site wouldn't have functioning cookies?
Exactly. 27003.21.diff (very weak) forced the site domain rather than the network domain, but that only works when the paths on each are the same.
We need to determine the best way to handle COOKIE_DOMAIN
, COOKIEPATH
, SITECOOKIEPATH
, and ADMIN_COOKIE_PATH
, all which are set in ms_cookie_constants()
if not previously set elsewhere.
One option is to continue with each page load resulting in one set of cookies. This would be relatively straight forward and requires that ms_cookie_constants()
have access to current site and network information. Cookie constants would then be set based on the site that loaded, not the network.
Another option is to consider cases when setting cookies for multiple domains might be desirable. Though a user with access to one site doesn't necessarily need (or want) cookies for the network's primary admin site. This is probably best left to developers with sunrise or elsewhere. We could add filters in ms_cookie_constant()
to provide a nice way in.
Also possible alongside either approach—wrapping setcookie()
.
#47
@
11 years ago
Some thoughts.
- Should a requested path be considered case sensitive for finding site/network? #26403 has a
strtolower()
patch. - Some sort of
site_not_found
hook (see #21143) can be useful as a redirect opportunity. - Some sort of hook in
ms_not_installed()
may also be useful. This is less of a redirect opportunity and more of an "oh, crap. now what." opportunity. ms_not_installed()
docs are misleading. We mostly call on it when a network cannot be found, but the errors seem centered around missing sites. The errors are also (purposely?) vague. We could probably be a bit more explicit about what has happened whenis_admin()
. This is probably worth a new ticket for 4.0.- I think the current cookie domain stuff works for current configurations. Everything can already be (and is being) overridden in sunrise. It is worth looking at in 4.0 to try and support more arbitrary domains and paths.
27003.22.diff moves the site/network not found block up a bit and joins it with the empty $current_site
check. This can all happen before the cookie domain is set and before we try to find a network's main site.
The main goal for this patch is to add documentation to some of the redirect behavior and to identify where we can hook in for customization.
#49
follow-up:
↓ 54
@
11 years ago
- In
get_network_by_path()
, we usewp_get_network()
to retrieve the same object we just retrieved. It may be worth dropping that additional call for now until we know it's needed. - Actions for
found_site_by_path
andfound_network_by_path
when sites and networks are being returned by their respective functions could be useful as a way for sunrise to hook in and set custom caching (or other functionality) for repeat requests. - nacin made mention of using
wp_cache_add()
in a previous comment. We're still usingwp_cache_set()
during the bootstrap in ms-settings.
#50
@
11 years ago
- 27003.23.diff caches
$current_site->blog_id
when found the first time under the'network_' . $current_site->id . '_blog_id'
key. - 27003.22.diff is still on the table.
- Thinking through a filter for setting the cookie domain to allow sunrise to hook in there as well in advance of any future cookie handling that we do.
#54
in reply to:
↑ 49
@
11 years ago
I'm closing this out. New tickets for any other issues. Great work, jeremyfelt, I'm loving this. :-)
Replying to jeremyfelt:
- In
get_network_by_path()
, we usewp_get_network()
to retrieve the same object we just retrieved. It may be worth dropping that additional call for now until we know it's needed.
I figure that wp_get_network() will eventually start returning a proper object, so I wanted to go ahead with this abstraction now. Also, originally, there was some decoration that occurred in wp_get_network(). It doesn't matter for the moment. That said, emphasizing that a DB row should be passed through to wp_get_network() sets a good standard for others to do so, which means the migration to a proper object might be smoother.
- Actions for
found_site_by_path
andfound_network_by_path
when sites and networks are being returned by their respective functions could be useful as a way for sunrise to hook in and set custom caching (or other functionality) for repeat requests.
Seems OK; I imagine you're referring to then leveraging that information in a pre_* hook. Let's spin off a separate ticket for this, though.
- Thinking through a filter for setting the cookie domain to allow sunrise to hook in there as well in advance of any future cookie handling that we do.
Since cookie handling is a separate can of worms, let's also make that a separate ticket.
Slightly off topic, but since the ticket mentions "cleanup of multisite load" in its title: there's a lot of code in wp-includes/ms-*.php that runs very early and assumes that WP is already installed. When it isn't and you're hoping to get redirected to wp-admin/install.php, it ultimately fails with an extremely misleading database connection error.
Would it be desirable (in a separate ticket, of course) to make it so that adding the relevant multisite defines in advance, when installing WP, would make the installer just work and set things up with multisite already up and running?
Desirable or not, I raised the question here because a few
isset()
calls here and there are currently needed to avoid E_NOTICE errors when loadingwp-admin/install.php
with these defines around, so might as well add them in advance during the cleanup.