Make WordPress Core

Opened 2 years ago

Last modified 3 months ago

#54346 accepted enhancement

Slow SQL queries fetching posts from specific categories

Reported by: krstarica's profile Krstarica Owned by: johnbillion's profile johnbillion
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Taxonomy Keywords: needs-testing has-patch has-unit-tests
Focuses: rest-api, performance Cc:

Description

Noticed that REST API queries used in mobile app are very slow and found that such SQL queries can be optimized to be 10 times faster (1.8134 seconds vs. 0.1804 seconds) for wp_posts table having 800k records, see below.

Very similar queries are used when displaying posts from specific categories on the web, meaning optimizing this could lead to significant speed up everywhere.

The one responsible for this is WP_Tax_Query->get_sql_for_clause function.

REST API to fetch posts from specific categories, e.g. /wp-json/wp/v2/posts/?per_page=10&_embed=1&categories=63545,63546,63547,63548,63549,63552,76287&page=1 executes the following SQL query:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
  LEFT JOIN wp_term_relationships ON (
    wp_posts.ID = wp_term_relationships.object_id
  ) 
WHERE 
  1 = 1 
  AND (
    wp_term_relationships.term_taxonomy_id IN (
      63545, 63546, 63547, 63548, 63549, 63552, 76287
    )
  ) 
  AND wp_posts.post_type = 'post' 
  AND(
    (wp_posts.post_status = 'publish')
  ) 
GROUP BY 
  wp_posts.ID 
ORDER BY 
  wp_posts.post_date DESC 
LIMIT 
  0, 10

Query took 1.8134 seconds.

This query can be optimized by using subquery:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
WHERE 
  wp_posts.ID IN (
    SELECT 
      object_id 
    FROM 
      wp_term_relationships 
    WHERE 
      wp_term_relationships.term_taxonomy_id IN (
        63545, 63546, 63547, 63548, 63549, 63552, 76287
      )
  ) 
  AND wp_posts.post_type = 'post' 
  AND wp_posts.post_status = 'publish' 
ORDER BY 
  wp_posts.post_date DESC 
LIMIT 
  0, 10

Query took 0.1804 seconds seconds.

Attachments (2)

Screenshot 2023-06-21 at 20.39.01.png (149.7 KB) - added by spacedmonkey 3 months ago.
Screenshot 2023-06-21 at 20.38.48.png (151.9 KB) - added by spacedmonkey 3 months ago.

Download all attachments as: .zip

Change History (18)

#1 @Krstarica
2 years ago

Here is the solution.

In wp-includes/class-wp-tax-query.php replace:

$alias = $this->find_compatible_table_alias( $clause, $parent_query );
if ( false === $alias ) {
	$i     = count( $this->table_aliases );
	$alias = $i ? 'tt' . $i : $wpdb->term_relationships;

	// Store the alias as part of a flat array to build future iterators.
	$this->table_aliases[] = $alias;

	// Store the alias with this clause, so later siblings can use it.
	$clause['alias'] = $alias;

	$join .= " LEFT JOIN $wpdb->term_relationships";
	$join .= $i ? " AS $alias" : '';
	$join .= " ON ($this->primary_table.$this->primary_id_column = $alias.object_id)";
}

$where = "$alias.term_taxonomy_id $operator ($terms)";

with:

$where = "$this->primary_table.$this->primary_id_column IN (
	SELECT object_id
	FROM $wpdb->term_relationships
	WHERE term_taxonomy_id IN ($terms)
)";

Such query takes 1.3058 seconds. That saves 0.5 seconds.

To further speed it up, in wp-includes/class-wp-query.php replace:

if ( ! empty( $this->tax_query->queries ) || ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

with:

if ( ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

Such query takes 0.1804 seconds.

Important: since we removed "GROUP BY wp_posts.ID" for tax queries, we need to make sure that all other cases in WP_Tax_Query->get_sql_for_clause use subqueries, too.

#2 @Krstarica
23 months ago

Running this for several days in production and speed improvement is impressive.

Definitely should use subqueries in get_sql_for_clause functions (class-wp-meta-query.php and class-wp-tax-query.php) and then completely remove this code from class-wp-query.php:

if ( ! empty( $this->tax_query->queries ) || ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

#3 @johnbillion
23 months ago

  • Keywords needs-patch needs-testing added
  • Version 5.8.1 deleted

@Krstarica Thanks for working on this! Are you able to put together a pull request on GitHub so we can get the test suite run with these changes in place please? Docs here: https://make.wordpress.org/core/handbook/contribute/git/github-pull-requests-for-code-review/

#4 @joseaneto
21 months ago

Like @Krstarica we have had identical SQL performance problems with the LEFT JOIN query from the "get_sql_for_clause" function in some WordPress installations with +500,000 records in the posts table.

We have been using their solution in production and works perfectly, returning optimal performance on SQL queries, many going from +150 seconds to milliseconds.

Hopefully, this change can be translated into an upcoming update.

This ticket was mentioned in PR #3644 on WordPress/wordpress-develop by ivptr.


11 months ago
#5

  • Keywords has-patch added; needs-patch removed

#6 @Krstarica
11 months ago

@johnbillion Just created pull request.

Some unit tests failed because they expect JOIN, which is removed from the query.

Version 0, edited 11 months ago by Krstarica (next)

#7 @johnbillion
7 months ago

  • Milestone changed from Awaiting Review to 6.3
  • Owner set to johnbillion
  • Status changed from new to accepted

#8 @oglekler
4 months ago

  • Keywords needs-unit-tests added

If Unit tests are failing because there are new rules in place, unit tests needs to be changed too.

I want to point that we are in 9 days before Beta 1 and changes such this needs proper testing and can have side effects, so, it will be more prudent to plan this to 6.4 already.

This ticket was mentioned in PR #4665 on WordPress/wordpress-develop by @spacedmonkey.


3 months ago
#9

  • Keywords has-unit-tests added; needs-unit-tests removed

#10 @spacedmonkey
3 months ago

@johnbillion I have updated the PR so that unit test pass.
https://github.com/WordPress/wordpress-develop/pull/4665

I am not sure what unit tests to add to this, there are some good existing ones. This is not new functionality this is testing existing functionality.

#11 @spacedmonkey
3 months ago

@johnbillion @flixos90 I did some testing with 20K posts. I am seeing a net benefit of this change, but is extemely small. It is hard to test with out more posts.

#12 @spacedmonkey
3 months ago

Performance benchmarks, 35k posts, on a category page. 2k requests.

Trunk PR
Response Time (median) 124.23 140.82
wp-load-alloptions-query (median) 0.7 0.78
wp-before-template (median) 82.21 93.67
wp-before-template-db-queries (median) 18.71 20.53
wp-template (median) 36.53 40.57
wp-total (median) 119.19 134.82
wp-template-db-queries (median) 1.89 2.11

#13 @johnbillion
3 months ago

Are the trunk/PR columns the correct way around in that table? It's showing a reduction in performance.

#14 @spacedmonkey
3 months ago

@johnbillion I did some more testing with 35k posts and I am seeing a increase in server response time.

#15 @johnbillion
3 months ago

  • Milestone changed from 6.3 to Future Release

Thanks, then this needs further investigation.

#16 @spacedmonkey
3 months ago

@johnbillion If it helps, I used a plugin called fakerpress to generate 35k work of posts and categories.

I had around 500 categories as well. I think the nest query is not fast if you have lots of categories or posts. If you have 2k post of posts assigned to one category, the nested query will be slow.

Note: See TracTickets for help on using tickets.