WordPress.org

Make WordPress Core

Opened 7 months ago

Last modified 7 days ago

#40228 reopened enhancement

Use get_sites in get_blog_details

Reported by: spacedmonkey Owned by: flixos90
Milestone: 5.0 Priority: normal
Severity: normal Version:
Component: Networks and Sites Keywords: has-patch has-unit-tests ms-roadmap
Focuses: multisite Cc:

Description

Currently get_blog_details performs raw sql queries to get data out of the blogs tables. This data is then cached. However how the caching is done in the function is horrible. It generates three different caches that store the whole wp_site object in cache.

wp_cache_delete( $blog_id , 'blog-details' );
wp_cache_delete( $blog_id . 'short' , 'blog-details' );
wp_cache_delete(  $domain_path_key, 'blog-lookup' );    

This function should be refactor to use get_sites and relay on the caching built into wp_site_query

Attachments (6)

40228-tests.diff (1.6 KB) - added by flixos90 7 months ago.
40228.diff (11.8 KB) - added by flixos90 7 months ago.
40228.2.diff (12.1 KB) - added by flixos90 4 weeks ago.
40228.3.diff (12.7 KB) - added by flixos90 3 weeks ago.
40228.improved-caching.diff (1.3 KB) - added by spacedmonkey 7 days ago.
40228.improved-caching.2.diff (1.3 KB) - added by spacedmonkey 7 days ago.

Download all attachments as: .zip

Change History (47)

#1 @spacedmonkey
7 months ago

Here is what a get_blog_details function would look like using get_sites. It basically creates different args and passes it to get_sites and returns 1 site.


function get_blog_details( $fields = null, $get_all = true ) {
        $args = array();
        if ( is_array( $fields ) ) {
                if ( isset($fields['blog_id']) ) {
                        $args['site__in'][] = $fields['blog_id'];
                } 
                if ( isset($fields['domain']) ){
                        $domains = array( $fields['domain'] );
                        if ( substr( $fields['domain'], 0, 4 ) == 'www.' ) {
                                $nowww = substr( $fields['domain'], 4 );
                                $domains[] = $nowww;
                        }
                        $args['domain__in'] = $domains;
                }
                if( isset($fields['path']) ) {
                        $args['path'] = $fields['path'];
                }

        } elseif ( is_numeric( $fields ) ){
                $args['site__in'][] = $fields;
        } elseif ( $fields && is_string( $fields ) ) {
                $current_network = get_network();
                $slug = trim( $fields, '/' );

                if ( is_subdomain_install() ) {
                        $domain = $slug . '.' . preg_replace( '|^www\.|', '', $current_network->domain );
                        $path = $current_network->path;
                } else {
                        $domain = $current_network->domain;
                        $path = $current_network->path . $slug . '/';
                }
                $args['path'] = $path;
                $args['domain'] = $domain;
        } else {
                $args['site__in'][] = get_current_blog_id();
        } 
        
        if( empty( $args ) ){
                return false;
        }
        
        $args['number'] = 1;
        $sites = get_sites( $args );
        
        if( empty( $sites ) ){
                return false;
        }
        
        $details = array_shift( $sites );
        $blog_id = $details->blog_id; 

        if ( $get_all ) {
                switch_to_blog( $blog_id );
                $details->blogname   = get_option( 'blogname' );
                $details->siteurl    = get_option( 'siteurl' );
                $details->post_count = get_option( 'post_count' );
                $details->home       = get_option( 'home' );
                restore_current_blog();
        
        /**
         * Filters a blog's details.
         *
         * @since MU
         * @deprecated 4.7.0 Use site_details
         *
         * @param object $details The blog details.
         */
        $details = apply_filters_deprecated( 'blog_details', array( $details ), '4.7.0', 'site_details' );
        }
        return $details;
}

#2 @flixos90
7 months ago

  • Owner set to flixos90
  • Status changed from new to assigned

Thanks for the code snippet @spacedmonkey!

I think we should rather direct our efforts to get_site_by() (#40180) and then wrap that function with get_blog_details(). See https://core.trac.wordpress.org/ticket/40064#comment:19

I'll probably work on a patch later that takes your above proposal into account.

#3 @spacedmonkey
7 months ago

I like the idea of get_site_by however it is not a full replacement for all the params that get_blog_details. get_term_by only supports getting a term by one field, such as slug. However get_blog_details allows an array to passed and lookup is done by multiple fields. This array maps pretty well to the args of get_sites.

Looking at this code again, I might skip get_sitrs if a number is passed and just use get_site.

#4 @flixos90
7 months ago

I think we should also support an array in get_site_by(). We can definitely wrap it in get_blog_details(). I have written a patch for that, which I'll upload in a bit.

#5 @spacedmonkey
7 months ago

Here is a much less tidy version of the function but does things a much more quicker way. If you pass nothing or a number, just use get_site. This means that we don't need to spin up a while wp_site_query, which will use a lot lesss memory.


function get_blog_details( $fields = null, $get_all = true ) {
        $args = array();
        $details = null;
        if ( is_numeric( $fields ) ){
                $details = get_site( $fields );
        } elseif ( is_array( $fields ) ) {
                if ( isset($fields['blog_id']) ) {
                        $args['site__in'][] = $fields['blog_id'];
                } 
                if ( isset($fields['domain']) ){
                        $domains = array( $fields['domain'] );
                        if ( substr( $fields['domain'], 0, 4 ) == 'www.' ) {
                                $nowww = substr( $fields['domain'], 4 );
                                $domains[] = $nowww;
                        }
                        $args['domain__in'] = $domains;
                }
                if( isset($fields['path']) ) {
                        $args['path'] = $fields['path'];
                }

        } elseif ( $fields && is_string( $fields ) ) {
                $current_network = get_network();
                $slug = trim( $fields, '/' );

                if ( is_subdomain_install() ) {
                        $domain = $slug . '.' . preg_replace( '|^www\.|', '', $current_network->domain );
                        $path = $current_network->path;
                } else {
                        $domain = $current_network->domain;
                        $path = $current_network->path . $slug . '/';
                }
                $args['path'] = $path;
                $args['domain'] = $domain;
        } else {
                $details = get_site();
        } 
        if( null === $details ){
              if( empty( $args ) ){
                    return false;
               }
        
               $args['number'] = 1;
               $sites = get_sites( $args );
        
               if( empty( $sites ) ){
                     return false;
               }
        
               $details = array_shift( $sites );
        }
        $blog_id = $details->blog_id; 

        if ( $get_all ) {
                switch_to_blog( $blog_id );
                $details->blogname   = get_option( 'blogname' );
                $details->siteurl    = get_option( 'siteurl' );
                $details->post_count = get_option( 'post_count' );
                $details->home       = get_option( 'home' );
                restore_current_blog();
        
        /**
         * Filters a blog's details.
         *
         * @since MU
         * @deprecated 4.7.0 Use site_details
         *
         * @param object $details The blog details.
         */
        $details = apply_filters_deprecated( 'blog_details', array( $details ), '4.7.0', 'site_details' );
        }
        return $details;
}

#6 @flixos90
7 months ago

40228-tests.diff are two additional unit tests that verify the behavior described by @stephdau in https://core.trac.wordpress.org/ticket/40180#comment:3 works as expected.

#7 @flixos90
7 months ago

In 40317:

Multisite: Add further unit tests for get_blog_details().

These tests verify that the returned site object is iterable and contains the expected properties.

See #40228, #40180.

@flixos90
7 months ago

#8 @flixos90
7 months ago

  • Keywords has-patch has-unit-tests dev-feedback added
  • Milestone changed from Awaiting Review to 4.8
  • Version 3.0 deleted

40228.diff shows a possible implementation of get_blog_details() using get_site_by() (see #40180). Since the patch relies on that function, it is required that you apply the patch on #40180 as well in order to test.

All regular get_blog_details() tests pass after the change, however I had to adjust / partly remove some tests related to the blog-details and blog-lookup caches, since these now don't exist any longer. There are two tests failing related to the site-details cache, however that is due to a bug in setting that cache, for which I'll open a ticket shortly.

The code in get_blog_details() has obviously been drastically reduced, and by using get_site_by() the patch fulfills the original requirement of using the more efficient get_sites() / get_site() functions. Admittedly there is one really weird line in the new code ($site->detail = $site->detail;), but it actually does what it is supposed to do (PHP magic method weirdness) - however this code is one of the reasons that I think we should recommend using get_site_by() over get_blog_details() going forward (see more below).

I think this approach makes more sense than refactoring get_blog_details() itself. get_site_by() is more explicit in its usage and does not explicitly set the "site details" on the object (which is a bad practice since they're now lazy-loaded, but we have to keep it in get_blog_details() for BC. We could use get_site_by() in other core functions, like get_id_from_blogname() for example. These are some of the reasons why I think get_site_by() is viable to have in core although it serves a rather similar use-case like get_blog_details(). Let's not deprecate get_blog_details(), but let's provide a better alternative for new development going forward.

Last edited 7 months ago by flixos90 (previous) (diff)

#9 @flixos90
7 months ago

Regarding the above bug, I opened #40247 and added a patch. So in order to have the unit tests pass, the patches from #40247, #40180 and this very ticket need to be applied.

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


6 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


6 months ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


6 months ago

#17 @flixos90
6 months ago

  • Milestone changed from 4.8 to Future Release

Let's get back to this soon, once we figured out what to do about #40180 (introduce a new function and use it vs adjusting get_blog_details() directly).

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


3 months ago

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


3 months ago

#20 @flixos90
3 months ago

  • Keywords ms-roadmap added
  • Milestone changed from Future Release to 4.9

Per today's office hours, it was decided to introduce the new function and use it to improve get_blog_details().

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


8 weeks ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


5 weeks ago

@flixos90
4 weeks ago

#23 @flixos90
4 weeks ago

40228.2.diff is a reworked version of get_blog_details() to work with the latest patch in #40180.

The removal of all the custom logic in get_blog_details() causes the cache groups blog-details and blog-lookup to no longer be used, so we can probably remove them completely within the next releases. Therefore these groups have been removed from related unit tests and replaced with the newer cache groups.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


3 weeks ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


3 weeks ago

#26 @jeremyfelt
3 weeks ago

40228.2.diff is looking good. It applies cleanly and tests pass after [41698]. I'll spend some more time with it tomorrow morning.

I do think we need to keep the deprecated blog_details filter as part of get_blog_details(). Its deprecation is a recommendation to use another filter, but we'd be breaking back-compat by removing it.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


3 weeks ago

@flixos90
3 weeks ago

#28 @flixos90
3 weeks ago

40228.3.diff adds in the missing documentation for the old blog_details filter and adjusts documentation for the back-compat code to be more verbose.

#29 @flixos90
3 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 41719:

Multisite: Improve get_blog_details() by using get_site_by().

get_site_by() is now the preferred way to retrieve a site object by lookup for identifying data. By using a coherent structure and get_sites() internally, it has several advantages over the direct database queries and complex code in get_blog_details(). Therefore get_blog_details() is now a wrapper for get_site_by(), providing backward compatibility fixes where necessary.

Unit tests have been adjusted to account for the blog-details and blog-lookup cache groups, which are no longer needed.

Props spacedmonkey, jeremyfelt, flixos90.
Fixes #40228.

#30 follow-up: @westi
13 days ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

These changes look like they introduce sever regressions in the actual caching of data for calls to get_blog_details as far as I can tell from a quick review.

The issue is that by switching to WP_Site_Query and using get_sites() as our source of caching to avoid queries we effectively remove most of the caching behind get_blog_details.

Previously the cached data for a blog would only been invalidated if there were changes to that blogs details or they were explicitly cleared.

Now we have caching into WP_Site_Query which is invalidated everytime we change the last_changed value for sites.

We change the last_changed value for sites everytime:

  • A new site is created
  • clean_blog_cache is called (aka refresh_blog_details is called)

So invalidating the data for one site invalidates them all, which on any multisite install with a reasonable number of sites makes the caching pointless - every time update_blog_status is called because a site has new content all the caches are invalidated.

We also have added a significant amount of extra code which has to be run to fetch the info about a single site by calling get_site_by and then get_sites and then WP_Site_Query instead of a simple $wpdb query.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


13 days ago

#32 in reply to: ↑ 30 @flixos90
13 days ago

Replying to westi:

The issue is that by switching to WP_Site_Query and using get_sites() as our source of caching to avoid queries we effectively remove most of the caching behind get_blog_details.

Previously the cached data for a blog would only been invalidated if there were changes to that blogs details or they were explicitly cleared.

Now we have caching into WP_Site_Query which is invalidated everytime we change the last_changed value for sites.

Do I understand you correctly that you are talking particularly about the removal of the blog-lookup cache (used for domain/path combination lookups before)? We faced a similar issue before with #40362, and your comment confirms our approach to hold off of that one.
Given the outlined problems, my suggestion would be to add usage of that cache group back in cases where it applies (domain+path), and do something similar to a lookup by ID (the individual sites cache is also more granularly invalidated than last_changed used by WP_Site_Query). So we'd keep using get_sites() in case of a cache miss for the original caches.

Does that sound good to you?

#33 @westi
7 days ago

  • Severity changed from normal to blocker

I think it is more nuanced than that but lets compare in detail to see.

The current release version of get_blog_details leverages caching as follows:

1) blog-lookup cache for converting domain + path => blog_id
2) blog-details cache for converting a blog_id => The response of get_blog_details
3) sites cache (inside WP_Site) to populate the WP_Site object on construction via get_instance
4) site-details cache - This is potentially used for the extended properties on the WP_Site object but not directly by get_blog_details as it sets the extended properties itself by switching and calling get_option

This gives us up to four different cache groups being used, which are all global groups keyed by blog_id and invalidation only happens to cache items when the object they reference has changed - i.e. invalidation is tied to mutation of the state of a blog_id.

The code in trunk leverages caching as follows:

get_blog_details calls get_site_by calls get_sites calls WP_Site_Query::query calls WP_Site_Query::get_sites this path leverages caching as follows:

1) sites cache inside WP_Site_Query::get_sites which is keyed off a cache key based on the query and last_changed timestamp for sites.
2) _prime_site_caches is called with the list of found IDs to prime the sites cache group with the wp_blogs data for each found ID
3) get_site is called for each id to get the WP_Site object which uses the existing or primed data in the sites cache group.

So yes, maybe bringing back the blog-lookup cache would make this change safer, but also I think we need to not call get_site_by if we have a blog_id and we should review and document the change in query count and complexity when the caches are all empty.

Side note: is WP_Site_Query safe for large multi-site installs aka does it self neuter bad queries?

Also, I think this ticket is now a blocker for 4.9?

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 days ago

#35 @spacedmonkey
7 days ago

40228.improved-caching.2.diff

Addes some improves to the get_site_by function.

  • If the lookup is by id, don't use get_sites, just use get_site and do the network_id compare using the WP_Site object for better performance.
  • Implement the blog-lookup cache in get_site_by to save lookup on domain + path, as this cache key is invalidated in clean_blog_cache and get_sites is invalidated a lot in high traffic sites.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


7 days ago

#37 @flixos90
7 days ago

We discussed this in detail in today's meeting and came to the conclusion that the commit should be reverted for now:

  • The original idea was to bring back the blog-lookup cache, since it's less aggressively invalidated (per site instead of per any site). We wanted to leverage it in get_site_by() first, however it caches full site objects including their site details which is not what's needed in get_site_by(). So we then considered bringing it back to get_blog_details() itself.
  • However, there's also the blog-details cache, which is in a way more stable than either of the sites or site-details cache since it also caches sites not found (as -1).

The above with all the other complexity of get_blog_details() involved brought us to the conclusion to revert this change for now.

In addition, get_site_by() itself should be reverted as well for now and rescheduled for 5.0: We need to generally work on improving caching for WP_Site_Query (and WP_Network_Query) as well as consider storing not found values for the sites and networks caches too. While get_site_by() makes sense as an explicit and less complex replacement for get_blog_details(), it is not ready yet in terms of caching, where it currently falls short of get_blog_details() under specific circumstances. When we have the new code's caches work as well as the old ones, we can introduce get_site_by() as a proper replacement for get_blog_details() and reconsider using it in the old function too.

#38 @westi
7 days ago

Thanks for all your hard work on this, reading the logs it does sound like moving this change to 5.0 is a better idea than trying to resolve all the issues this close to the release of 4.9.

#39 @flixos90
7 days ago

In 41883:

Multisite: Revert [41719].

While get_site_by() makes sense as a more explicit and less complex replacement for get_blog_details(), it is not ready yet in terms of caching, where it currently falls short of the older function under specific circumstances.

See #40180, #40228.

#40 @flixos90
7 days ago

In 41884:

Multisite: Revert [41698] and [41743].

In order for get_site_by() to be truly beneficial, caching in WP_Site_Query needs to be improved to account for common use-cases and have them be invalidated less aggressively.

See #40180, #40228, #42091.

#41 @flixos90
7 days ago

  • Keywords dev-feedback removed
  • Milestone changed from 4.9 to 5.0
  • Severity changed from blocker to normal

This can be revisited when caching in get_site_by() works as well as (or better than) that in get_blog_details().

Note: See TracTickets for help on using tickets.