Make WordPress Core

Opened 3 years ago

Last modified 7 months ago

#54346 accepted enhancement

Slow SQL queries fetching posts from specific categories

Reported by: krstarica's profile Krstarica Owned by: craigfrancis's profile craigfrancis
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Database Keywords: needs-testing has-patch has-unit-tests
Focuses: rest-api, performance Cc:

Description

Noticed that REST API queries used in mobile app are very slow and found that such SQL queries can be optimized to be 10 times faster (1.8134 seconds vs. 0.1804 seconds) for wp_posts table having 800k records, see below.

Very similar queries are used when displaying posts from specific categories on the web, meaning optimizing this could lead to significant speed up everywhere.

The one responsible for this is WP_Tax_Query->get_sql_for_clause function.

REST API to fetch posts from specific categories, e.g. /wp-json/wp/v2/posts/?per_page=10&_embed=1&categories=63545,63546,63547,63548,63549,63552,76287&page=1 executes the following SQL query:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
  LEFT JOIN wp_term_relationships ON (
    wp_posts.ID = wp_term_relationships.object_id
  ) 
WHERE 
  1 = 1 
  AND (
    wp_term_relationships.term_taxonomy_id IN (
      63545, 63546, 63547, 63548, 63549, 63552, 76287
    )
  ) 
  AND wp_posts.post_type = 'post' 
  AND(
    (wp_posts.post_status = 'publish')
  ) 
GROUP BY 
  wp_posts.ID 
ORDER BY 
  wp_posts.post_date DESC 
LIMIT 
  0, 10

Query took 1.8134 seconds.

This query can be optimized by using subquery:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
WHERE 
  wp_posts.ID IN (
    SELECT 
      object_id 
    FROM 
      wp_term_relationships 
    WHERE 
      wp_term_relationships.term_taxonomy_id IN (
        63545, 63546, 63547, 63548, 63549, 63552, 76287
      )
  ) 
  AND wp_posts.post_type = 'post' 
  AND wp_posts.post_status = 'publish' 
ORDER BY 
  wp_posts.post_date DESC 
LIMIT 
  0, 10

Query took 0.1804 seconds seconds.

Attachments (2)

Screenshot 2023-06-21 at 20.39.01.png (149.7 KB) - added by spacedmonkey 13 months ago.
Screenshot 2023-06-21 at 20.38.48.png (151.9 KB) - added by spacedmonkey 13 months ago.

Download all attachments as: .zip

Change History (21)

#1 @Krstarica
3 years ago

Here is the solution.

In wp-includes/class-wp-tax-query.php replace:

$alias = $this->find_compatible_table_alias( $clause, $parent_query );
if ( false === $alias ) {
	$i     = count( $this->table_aliases );
	$alias = $i ? 'tt' . $i : $wpdb->term_relationships;

	// Store the alias as part of a flat array to build future iterators.
	$this->table_aliases[] = $alias;

	// Store the alias with this clause, so later siblings can use it.
	$clause['alias'] = $alias;

	$join .= " LEFT JOIN $wpdb->term_relationships";
	$join .= $i ? " AS $alias" : '';
	$join .= " ON ($this->primary_table.$this->primary_id_column = $alias.object_id)";
}

$where = "$alias.term_taxonomy_id $operator ($terms)";

with:

$where = "$this->primary_table.$this->primary_id_column IN (
	SELECT object_id
	FROM $wpdb->term_relationships
	WHERE term_taxonomy_id IN ($terms)
)";

Such query takes 1.3058 seconds. That saves 0.5 seconds.

To further speed it up, in wp-includes/class-wp-query.php replace:

if ( ! empty( $this->tax_query->queries ) || ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

with:

if ( ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

Such query takes 0.1804 seconds.

Important: since we removed "GROUP BY wp_posts.ID" for tax queries, we need to make sure that all other cases in WP_Tax_Query->get_sql_for_clause use subqueries, too.

#2 @Krstarica
3 years ago

Running this for several days in production and speed improvement is impressive.

Definitely should use subqueries in get_sql_for_clause functions (class-wp-meta-query.php and class-wp-tax-query.php) and then completely remove this code from class-wp-query.php:

if ( ! empty( $this->tax_query->queries ) || ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

#3 @johnbillion
3 years ago

  • Keywords needs-patch needs-testing added
  • Version 5.8.1 deleted

@Krstarica Thanks for working on this! Are you able to put together a pull request on GitHub so we can get the test suite run with these changes in place please? Docs here: https://make.wordpress.org/core/handbook/contribute/git/github-pull-requests-for-code-review/

#4 @joseaneto
3 years ago

Like @Krstarica we have had identical SQL performance problems with the LEFT JOIN query from the "get_sql_for_clause" function in some WordPress installations with +500,000 records in the posts table.

We have been using their solution in production and works perfectly, returning optimal performance on SQL queries, many going from +150 seconds to milliseconds.

Hopefully, this change can be translated into an upcoming update.

This ticket was mentioned in PR #3644 on WordPress/wordpress-develop by ivptr.


20 months ago
#5

  • Keywords has-patch added; needs-patch removed

#6 @Krstarica
20 months ago

@johnbillion Just created pull request.

Some unit tests failed because they expect JOIN, which is removed from the query.

Version 0, edited 20 months ago by Krstarica (next)

#7 @johnbillion
16 months ago

  • Milestone changed from Awaiting Review to 6.3
  • Owner set to johnbillion
  • Status changed from new to accepted

#8 @oglekler
13 months ago

  • Keywords needs-unit-tests added

If Unit tests are failing because there are new rules in place, unit tests needs to be changed too.

I want to point that we are in 9 days before Beta 1 and changes such this needs proper testing and can have side effects, so, it will be more prudent to plan this to 6.4 already.

This ticket was mentioned in PR #4665 on WordPress/wordpress-develop by @spacedmonkey.


13 months ago
#9

  • Keywords has-unit-tests added; needs-unit-tests removed

#10 @spacedmonkey
13 months ago

@johnbillion I have updated the PR so that unit test pass.
https://github.com/WordPress/wordpress-develop/pull/4665

I am not sure what unit tests to add to this, there are some good existing ones. This is not new functionality this is testing existing functionality.

#11 @spacedmonkey
13 months ago

@johnbillion @flixos90 I did some testing with 20K posts. I am seeing a net benefit of this change, but is extemely small. It is hard to test with out more posts.

#12 @spacedmonkey
13 months ago

Performance benchmarks, 35k posts, on a category page. 2k requests.

Trunk PR
Response Time (median) 124.23 140.82
wp-load-alloptions-query (median) 0.7 0.78
wp-before-template (median) 82.21 93.67
wp-before-template-db-queries (median) 18.71 20.53
wp-template (median) 36.53 40.57
wp-total (median) 119.19 134.82
wp-template-db-queries (median) 1.89 2.11

#13 @johnbillion
13 months ago

Are the trunk/PR columns the correct way around in that table? It's showing a reduction in performance.

#14 @spacedmonkey
13 months ago

@johnbillion I did some more testing with 35k posts and I am seeing a increase in server response time.

#15 @johnbillion
13 months ago

  • Milestone changed from 6.3 to Future Release

Thanks, then this needs further investigation.

#16 @spacedmonkey
13 months ago

@johnbillion If it helps, I used a plugin called fakerpress to generate 35k work of posts and categories.

I had around 500 categories as well. I think the nest query is not fast if you have lots of categories or posts. If you have 2k post of posts assigned to one category, the nested query will be slow.

#17 @johnbillion
8 months ago

  • Component changed from Taxonomy to Database
  • Owner johnbillion deleted
  • Status changed from accepted to assigned

@craigfrancis might this be something you can review?

#18 @craigfrancis
8 months ago

  • Owner set to craigfrancis
  • Status changed from assigned to accepted

Yep, will try this weekend.

@Krstarica or @joseaneto, would you be able to email the following to me (craig at craigfrancis dot co dot uk):

  1. The EXPLAIN output for this query, i.e. EXPLAIN SELECT wp_posts.ID ...;
  2. The output from SHOW CREATE TABLE wp_term_relationships;
  3. The output from SHOW CREATE TABLE wp_posts;
  4. Number of records you have in these tables (joseaneto, I believe you had "+500,000 records in the posts table" in December 2021).

I'll try to work out what's happening, and report back here.

#19 @craigfrancis
7 months ago

I haven't forgotten, I was able to replicate the issue, but need a bit more time to look into this.

For Ivan, with 793,227 Posts, and 53,391 Terms, where each Term is used on average by 22 Posts, and the relationship table has 1,198,905 records... it seems that, if the sub-query gets a small number of results, it's faster to join to the Posts table... but, using a sub-query can be a slower for websites with a different ratio of Posts to Terms (what I'll be looking at next).

Note: See TracTickets for help on using tickets.