WordPress.org

Make WordPress Core

Opened 12 months ago

Last modified 2 days ago

#37923 assigned feature request

Introduce shared wp_blogmeta database table for multisite installation

Reported by: johnjamesjacoby Owned by: spacedmonkey
Milestone: 4.9 Priority: normal
Severity: normal Version:
Component: Networks and Sites Keywords: has-patch has-unit-tests needs-testing ms-roadmap
Focuses: multisite Cc:

Description

In a multisite installation, it would be nice to be able to query for sites using some kind of other arbitrary method. This currently is not possible because each site has it's own _options database table. (This is totally fine, and makes sense considering the huge number of options each site will usually have.)

However, with the introduction of WP_Site_Query and now having the ability to query for sites in a multitude of ways, a dedicated blogmeta database table would pair nicely with the WP_Site object.

This may not be necessary for core right away, but eventually, some multisite-specific options could get moved here to avoid polluting the per-site options table. For example, the new_user_ options when inviting existing users to an existing site, are a multisite specific option that is auto-loaded, causing a potentially costly recalculation of alloptions just to invite a user to your site.

A few other neat potential core usages off-the-top-of-my-head:

  • Store for the total number of users in each site, and query by most popular
  • Store for the total number of published posts in each site, and query by most active
  • Store the Blog Name, and query for sites alphabetically
  • Store a site_owner user ID, and query for sites the user "owns" relatively efficiently (compared to the LIKE query on usermeta)

Attachments (6)

sitemeta.php (2.7 KB) - added by spacedmonkey 6 months ago.
sitemeta1.php (3.6 KB) - added by spacedmonkey 6 months ago.
37923-phase1.diff (39.9 KB) - added by flixos90 5 months ago.
37923.diff (35.1 KB) - added by flixos90 5 months ago.
37923.2.diff (19.8 KB) - added by spacedmonkey 3 weeks ago.
37923.3.diff (25.9 KB) - added by spacedmonkey 3 weeks ago.

Download all attachments as: .zip

Change History (50)

#1 @johnjamesjacoby
12 months ago

A first-pass proof of concept on GitHub: https://github.com/stuttter/wp-blog-meta

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


7 months ago

#3 @iamfriendly
7 months ago

A list of all options created, by default, (4.7.1) when a new site is added to a network. Additionally, suggested candidates for what might move from wp_#_options to wp_blogmeta are indented and in bold.

https://docs.google.com/document/d/1bMajucj7vVLP2xFIoTC5BdOTzF5jEfnYcrxrVEdVH5I/edit?usp=sharing

Namely:

admin_email - getting a list of all site's default admins seems useful (we do it regularly, for example)
blog_public - finding out which sites are public more easily seems like something that would speed up other queries for the REST API for example
blogdescription - seems to go in parallel with other info currently in my-sites
blogname - used in my-sites
db_version - I could be dissuaded on this, but seems useful to check if all sites on a network are up-to-date
Edit: Ignore. finished_splitting_shared_terms - same logic as above, can be dissuaded (I dissuaded myself)
home - my-sites
siteurl - my-sites

Whilst I can think of ways in which many of the other options would be useful, I've taken a somewhat minimal approach here to try and prevent a huge amount of data being transferred during the initial migration as well as simply polluting this new table.

Last edited 7 months ago by iamfriendly (previous) (diff)

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


7 months ago

#5 @mnelson4
7 months ago

So to clarify, the main problem right now is that if you want to query for all sites' names, you would need to do an SQL join to ALL site option tables (or loop through them all). Yeah, that makes it pretty well impossible doesn't it (especially once you have a network with over 100 sites or so).

So this site_meta table would only be used for options you know you want to query, right?

Admittedly, it would be a little confusing having two places for this site-specific data (is it in the options table or in the site meta table? Depends on how much we want to do a multi-site query involving it). This could probably be alleviated by having get_site_option magically check in the blog meta table for entries it knows are stored there and not in the options table. Eh?

#6 @flixos90
7 months ago

@mnelson4 Indeed that's what this table would be for. The name blogmeta is only since sitemeta is unfortunately already taken for network meta. We'd love to find a better name, but currently we don't have a better idea.

The data would probably be stored in one or the other table only. Our plan is to have the *_option() functions check whether the requested option is in the whitelist for blogmeta (only when multisite is activated of course) and choose the table depending on that. We'll have to be careful how we approach this though as we of course will also have a migration path to go, and it might happen that a blogmeta option is not actually in that table just because it has not been migrated yet.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

#8 @MaximeCulea
7 months ago

I was also thinking about dates, for querying and cache purpose, by adding to this list the registration_date and last_modification_date.
What do you think about this ?

#9 @mnelson4
7 months ago

We'd love to find a better name, but currently we don't have a better idea.

I like blogmeta. If it were up to me the two terms in use would just be blogs (entries in the wp_blogs table) and networks (entries in the wp_sites table, which would ideally be renamed wp_networks), and we'd entirely avoid the now-rather-ambiguous term sites (I always need to double-check what that means, especially seeing how I only deal with multisite stuff periodically). But that's a bit off-topic.

So that plan sounds good to me!

Last edited 7 months ago by mnelson4 (previous) (diff)

#10 @spacedmonkey
7 months ago

  • Summary changed from Introduce shared wp_blogmeta database table for multisite installations to Introduce shared wp_blogmeta database table for multisite installation

Blogmeta has to be the name, as the meta api needs that naming conversation to work. It is also a better naming, make everything be meta. The name is really unimportant for a table.

So I don't think that it should be a copy of options table. Storing transient and rubbish options plugins add in this table seems a little pointless. We should have an array of key names (option names) that is filter able that are copied over. We can only copy over useful data that way, but plugin creates add data where desired. The list of option should be heavily vetted, but home URL, site URL and blog title are a sure thing for me.

Keeping the data betweeen the data two tablessynced is going to be interesting. But we if hook in on the add_option / edit_option / delete_option actions in the option functions on multisite, you can use get_current_blog_id and also save data blog meta table. This means that unless there are plugins that are touching options table directly (which I am sure there are), if the correct functions are used, everything is fine.

Roll out for this functionality is going to be hard. We can create the table on upgrade, but the first time sync would be pretty painful on a massive multisite. Maybe it would have to work the term split. Creating a single cron event for each site to sync the data over. The roll out will need a lot of thought. I willing to help with this ticket..

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


7 months ago

#12 @flixos90
7 months ago

As a reminder of Tuesday's multisite conversation and also to possibly discuss this more: The general consent on Tuesday was to call the database table blogmeta, but name all the functions interacting with it *_site_meta(). This would allow for better parity if we at some point handle network meta to (the table is sitemeta, and the functions should then be called *_network_meta()).

While far from optimal, I think it might be an acceptable pattern to call DB stuff the old way (since there's no easy way to rename) while trying to use new naming conventions in code.

#13 @spacedmonkey
6 months ago

Looking into this again, I noticed if we have the follow options in the blogmeta table

blogname
siteurl
post_count
home
allowedthemes

we could remove a tonne of switch_to_blog in core. Switch_to_blog is a pig, so removing it in some key places (specially in the WP_Site class) would a performance win.

#14 @spacedmonkey
6 months ago

  • Keywords has-patch added

I have put together a patch of my ideas for the blog meta functions. This patch is very simple. It does the following

  • add the *_site_meta functions
  • Creates a new function, which returns a white list of options that will be synced with the options
  • hook into the add / update / delete actions in the options functions and copy the data into the blog meta table.

This patch is missing a create table and migration of the data. But it is good first start and is a simple clean implementation.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


6 months ago

#16 @spacedmonkey
6 months ago

I have updated my "patch" . Here are the changes

  • Better type checking of an array
  • Fallback to options if database isn't updated
  • Add db_version to list of options
  • Added a start to the migration of data

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


6 months ago

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


5 months ago

#19 @flixos90
5 months ago

  • Keywords dev-feedback added

I have spent some time on a first patch ready to apply to trunk, based on our discussions.

As discussed, the site meta project will have 3 (or 4) stages:

  1. Introduce wp_blogmeta database table, its surrounding site meta API and related functionality like meta query support in WP_Site_Query, however none of this should be used by core at this point. The list of core site meta keys should be used as a blacklist so that developers cannot store their own site meta values for these keys.
  2. Migrate the core site meta keys from their options tables to the wp_blogmeta table and keep them in sync (*_site_meta() should also update *_site_option() and vice-versa). There's still some discussion to do regarding whether we should sync at all or only migrate (delete options immediately and only provide a fallback so that get_option() calls get_site_meta()). If we end up syncing, it will only be a temporary solution to make sure things work, before we actually remove the options a few versions later.
  3. Start making use of the site meta query functionality in core.
  4. If in 2. it was decided to sync the core keys instead of only moving them from options to site meta, we should at this point stop syncing and actually perform the migration.

For the next version we'll likely only need stage 1, and that is what I implemented in https://github.com/jeremyfelt/wordpress-develop/pull/3. I also provided it as a regular patch with 37923-phase1.diff, so that you can easily apply it to trunk.

The stages 2-3 (or 4) will likely end up in follow-up tickets later.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


5 months ago

#21 @spacedmonkey
5 months ago

Good first patch. First I think we should break out the meta query into another ticket. I think it is to much to put in on this. Debug will become a mess.

I noticed that the reserved keys array is not filterable. I think this is a good move, but we should have an filter to add more keys to the end.

There is nothing in there to sync. I think the best option is to hook into the add/update/delete option action and sync data on option save. A one way sync to option -> blogmeta. If users write to blogmeta, it is there responsibility.

I think we should also be using the meta function in place of many of the switch_to_blog in core. I will create another ticket for this.

#22 @flixos90
5 months ago

@spacedmonkey

Regarding meta query, let's continue to work within the GitHub repo. The separate ticket #40229 can be used later when we get closer to commit to break this feature into smaller pieces.

Making the array filterable might make sense, but I tend to say let's not add a filter to keep this function core-only. If a filter is introduced, it should only be able to add keys, not remove.

As stated in my comment above, this patch is only for our planned stage 1 of site meta, which does not have any sync or does not even migrate options. It only introduces the functionality, which I think is what we should focus on for now code-wise.

As above, actually using site meta in core will be part of a later stage that is likely going to be introduced 1 or 2 releases after the initial functionality.

#23 @flixos90
5 months ago

Bookmarking #37928, #37929 and #37930 here which would need to be implemented in one way or the other for the sync/fallback to blogmeta table part. Not important yet, but once we start getting into detail with migration and sync implementation.

#24 follow-up: @andy
5 months ago

it would be nice to be able to query for sites using some kind of other arbitrary method

There is no need to move the primary source of truth into a global table. What you need is secondary index that covers all of the data you are interested in and supports an acceptable level of consistency. You can build that using filters and any data store you like: mysql, elasticsearch, etc.

There is prior art in multisite search via external indexes. Proponents of a core patch should explore the needs, solutions and problems already understood by those who have done it. Use cases might differ too much to make a unified solution viable.

I have browsed 37923-phase1.diff and it looks like quality code but I'm not convinced any of this is better than adding or improving the API for plugins.

#25 in reply to: ↑ 24 @flixos90
5 months ago

Replying to andy:

it would be nice to be able to query for sites using some kind of other arbitrary method

There is no need to move the primary source of truth into a global table. What you need is secondary index that covers all of the data you are interested in and supports an acceptable level of consistency.

From my POV there is a difference to be made between a site's metadata and site options:

  • I'd describe metadata as two things: first, it's miscellaneous data that is associated with the site (for example post_count or db_version); second, it's what we'd put into the main database table if we had the possibility to always adjust it to our needs (which we obviously don't) (for example blogname and blogdescription) - only the first is what I would actually consider meta even if we had the possibility to put it into the main table. The latter is actually data that describes the site in a network context (core also treats these in a special way, see the 4 extra "properties" in WP_Site or get_blog_details()).
  • Options however are what their name implies, they define how a piece of functionality on the site works rather than providing some kind of identification for it.

My approach would therefore be to migrate (i.e.) move the "almost-site" data and site metadata into blogmeta as the source of truth. Everything else is options and thus can remain as is. Proxying *_option() calls to blogmeta for the moved values would only need to happen for backward-compatibility. So I don't consider that table an implementation of a secondary index, although I have to admit we started discussing that because of the lack of a possibility for querying by these values. But I assume the reason this has not come up before is that we rather not touch things if we don't actually need a new functionality from them.

There is prior art in multisite search via external indexes. Proponents of a core patch should explore the needs, solutions and problems already understood by those who have done it. Use cases might differ too much to make a unified solution viable.

I have browsed 37923-phase1.diff and it looks like quality code but I'm not convinced any of this is better than adding or improving the API for plugins.

What I stated above is obviously my opinion, maybe the general concensus ends up in this being a means to a secondary index only. I'm certainly open to introducing the feature in a flexible manner, for example by supporting override via a drop-in. I think however the underlying functionality will be essential for parts of multisite, therefore core should provide a default solution to the problem.

#26 @spacedmonkey
5 months ago

Even if the blogmeta isn't used to sync data from the options table doesn't mean we shouldn't add it. Adding blogmeta, would be extremely useful. For two reasons.

  1. Currently many multisite plugins (domain mapping being a good example) add tables to store data. Blog meta will be a wordpressy way of extending the wp_site object.
  2. It means we can removing the blog_versions table and replace it with a much more useful table.

We should go ahead with adding this new table and api. Syncing the data from options is another issue.

#27 @andy
5 months ago

There is nothing unWordPressy about plugins creating tables. WordPress never claimed to have a one-size-fits-all database schema.

Another question came up in IRL chat last night: what about export/import?

#28 @johnjamesjacoby
5 months ago

Another caveat: core's existing _site_transient() functions make _site_meta() function names confusing.

Yes... we all pretty much understand functions with _site_ in their name could mean blogs or networks, not pointing at the same object or database table. But transients, meta, and options traditionally are saved in the same place, and if we introduce a _blogmeta database table, it's plausible we might want _blog_transient() functions also.

  • Do we want _blog_meta() functions instead, to match the _blogmeta table?
  • Do we want to easily enable transients to exist in _blogmeta vs (or in addition to) _options? If yes, does this finally more easily enable multisite cron handling without resorting to building an entire external job queue?
  • My plugin uses _blog_ for function names instead of _site_ and I'd like to update it to whatever direction we think is best for WordPress core going forward, before adoption rises and backwards compatibility becomes a concern.
  • Should we add a wp_ prefix to any new transient functions, to allow both site & network transients? Then we could deprecate the old _site_transient() functions with new wp_*_network_transient() ones?
Last edited 5 months ago by johnjamesjacoby (previous) (diff)

#29 @flixos90
5 months ago

@johnjamesjacoby Good point. I think we shouldn't introduce new blog functions unless it would be totally confusing to call something site (for parity regarding related functionality).

In this case I think:

  • Use site_meta and network_meta (if we do this) since those are completely new APIs.
  • We should introduce network_transient functions to match the network_option functions.
  • We should introduce blog_transient functions to match blog_option functions. We can't call it site_transient because of the naming collision. I don't think it's necessarily expected that this function name matches the new site_meta since it is not apparent where blog_transient data is stored, sometimes in object cache if available. I actually would expect blog_transient to work similarly to blog_option since this is where those transients have been stored. I think you're talking about storing transients in the new table, but blog_transient would be confusing as a name for that functionality. I don't have a better name, but I also don't know yet whether this is necessary to have.
  • Prefixing with wp_ should be the final step only if we really can't figure this out.

@flixos90
5 months ago

#30 follow-up: @flixos90
5 months ago

As discussed during today's office-hours, we're probably going for introducing the blogmeta table unrelated to our original plans for an options index, which needs to be investigated independently, since blogmeta is probably not the right approach, as determined during recent discussions.

Main benefits:

  • Enables developers to store arbitrary data about a site in the multisite scope.
  • Opens up the future /sites REST API endpoint for developers so that they can register_meta() for sites that is handled by the API (both read and write access, will be implemented in the actual endpoint for which no ticket has yet been created).
  • Allows more flexible querying in WP_Site_Query.
  • Brings parity with other core objects, all of which have metadata tables.

Related: We are likely going to introduce *_network_meta() API functions as well (see #37181), the table is already in place.

37923.diff is an updated patch that only contains the blogmeta table and its surrounding API. The PR at https://github.com/jeremyfelt/wordpress-develop/pull/3 has been updated as well; however, since the introduction of this table is a much more straightforward process and since there were some merge conflicts with the previous patch that I already resolved in 37923.diff, I'd prefer if we continued to work on this through patches from now on.

#31 in reply to: ↑ 30 @jeremyfelt
5 months ago

Replying to flixos90:

I'd prefer if we continued to work on this through patches from now on.

+1

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


5 months ago

This ticket was mentioned in Slack in #core-multisite by streetlamp. View the logs.


3 months ago

#34 @jeremyfelt
3 months ago

  • Keywords has-unit-tests needs-testing added; 2nd-opinion removed
  • Milestone changed from Awaiting Review to Future Release

Moving this to future release so that we can keep progress going. The objectives that @flixos90 lays out above are good.

I'm not sure about _site_meta() vs _blog_meta() since the table name in this case is wp_blogmeta and a wp_sitemeta table exists for network meta.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


3 months ago

#36 @flixos90
3 months ago

Pasting my comments from Slack here:

<slack>
I think we should be talking about site and network meta together. While they don't need to happen at a similar time (maybe not even in the same release), we should have a clear fixed path going forward for both before we start finalizing/merging one of them.

My opinion is that both should be introduced. Site meta for the reasons I outlined in #37923, and network meta because its API is more structured, standardized throughout core, and the db table was made for meta after all.

Also I think their functions should be *_site_meta() and *_network_meta(). We have discussed that a couple times, and as far as I can remember we agreed on using current naming conventions for new functionality. The mixture is obviously terrible, and introducing new functions with the old naming will make it even harder to work our way away from it. The database, options etc can and should continue to have their legacy names, but all APIs should reflect what at some point was decided to be the terminology for... well, sites and networks.
</slack>

Also adding @jeremyfelt's comments from Slack:

I've been treating network meta as lower priority behind the /sites and /users endpoints and work to support that. I think we need to land those features before we start introducing other big changes.

I agree with that statement, site meta should be introduced in the same release or soon after /sites as the endpoint will be another argument for introducing the table. Network meta could follow a bit later, as the change would also be more justifiable with a site meta API in place.

Let's put this on our list of things to discuss in office-hours next week. From my POV the goal would be to also put #25344 in the "Future Release" milestone, but the primary objective should be determining a roadmap for both projects considered as a whole.

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


2 months ago

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


6 weeks ago

#39 @spacedmonkey
6 weeks ago

  • Milestone changed from Future Release to 4.9
  • Owner set to spacedmonkey
  • Status changed from new to assigned

#40 @flixos90
5 weeks ago

  • Keywords ms-roadmap added

These tickets belong to our planned roadmap (a few of them not per final decision), so flagging with a keyword for better overview.

@spacedmonkey
3 weeks ago

#41 follow-up: @spacedmonkey
3 weeks ago

  • Keywords changed from has-patch, dev-feedback, has-unit-tests, needs-testing, ms-roadmap to has-patch dev-feedback has-unit-tests needs-testing ms-roadmap

In 37923.2.diff I made some changes to @flixos90 patch.

  • Removed references to meta_query. Works on this functionality should continue here #40229
  • Added a new network option when site meta is installed. This is similar to how ms_files_rewriting was handled in core.
  • Moved *_site_meta function into the option.php file. This means plugin developers can use the functions on none multisite sites with getting function exists errors. We may think about falling back to options table on non multisite.
  • Added update_site_meta_cache to wp_site_cache
  • Moved formatting.php and meta.php earlier in bootstrap. See #37181
  • Changed _prime_site_caches to work similar to other prime functions.
  • Updated doc to 4.9.

@spacedmonkey
3 weeks ago

#42 @spacedmonkey
3 weeks ago

37923.3.diff adds in missing tests.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


12 days ago

#44 in reply to: ↑ 41 @flixos90
2 days ago

  • Keywords dev-feedback removed

Replying to spacedmonkey:

I reviewed 37923.3.diff. It looks good for the most part, I just have a few notes about fixes/suggestions:

Removed references to meta_query. Works on this functionality should continue here #40229

The changes from blog_id to $wpdb->blogs.blog_id can be removed too, plus there's some incorrect indentation in that file.

Added a new network option when site meta is installed. This is similar to how ms_files_rewriting was handled in core.

Let's further think about this. It's a good initial approach, but this is the most complex problem that we need to solve. One thing that should be changed is that the *_site_meta() functions should use get_network_option() instead of get_site_option(). get_site_option() is outdated and confusing especially being used there, where site_meta means site meta and site_option actually means network option. If we're really precise, we actually also need to pass the correct network ID for the site for which metadata is requested, but I don't see a reasonable way to do that. I'm wondering whether we should put a higher priority on our global storage discussion from last week. It would help a lot with this too. Let's bring it up in next office-hours.

Moved *_site_meta function into the option.php file. This means plugin developers can use the functions on none multisite sites with getting function exists errors. We may think about falling back to options table on non multisite.

I'm strongly against this. What benefit does it have to move the functions to option.php? They are multisite-only, so they should reside in a multisite file. For now it should be located in ms-blogs.php, and later be moved into ms-site.php, as suggested in #40647, which would bring the location in line with the other meta functions (all of which are located in the file specific to their object type).

Moved formatting.php and meta.php earlier in bootstrap. See #37181

I'm fine with this being included in the patch, since it's needed, however as a reminder, this should be committed separately in advance, as part of #40948. Why is formatting.php moved too? Is that required somewhere?

Another thing that needs to be fixed is that get_site_meta() must not return false, but an empty array or string, depending on the $single parameter. This is weird, but get_metadata() works like that, so it should use that same return type.

A random question, not related to the patch: Where do the database versions actually come from? Is that a changeset number? Just highlighting since we're randomly using 40001 in here.

Last edited 2 days ago by flixos90 (previous) (diff)
Note: See TracTickets for help on using tickets.