Make WordPress Core

Opened 19 months ago

Last modified 3 months ago

#54346 accepted enhancement

Slow SQL queries fetching posts from specific categories

Reported by: krstarica's profile Krstarica Owned by: johnbillion's profile johnbillion
Milestone: 6.3 Priority: normal
Severity: normal Version:
Component: Taxonomy Keywords: needs-testing has-patch
Focuses: rest-api, performance Cc:

Description

Noticed that REST API queries used in mobile app are very slow and found that such SQL queries can be optimized to be 10 times faster (1.8134 seconds vs. 0.1804 seconds) for wp_posts table having 800k records, see below.

Very similar queries are used when displaying posts from specific categories on the web, meaning optimizing this could lead to significant speed up everywhere.

The one responsible for this is WP_Tax_Query->get_sql_for_clause function.

REST API to fetch posts from specific categories, e.g. /wp-json/wp/v2/posts/?per_page=10&_embed=1&categories=63545,63546,63547,63548,63549,63552,76287&page=1 executes the following SQL query:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
  LEFT JOIN wp_term_relationships ON (
    wp_posts.ID = wp_term_relationships.object_id
  ) 
WHERE 
  1 = 1 
  AND (
    wp_term_relationships.term_taxonomy_id IN (
      63545, 63546, 63547, 63548, 63549, 63552, 76287
    )
  ) 
  AND wp_posts.post_type = 'post' 
  AND(
    (wp_posts.post_status = 'publish')
  ) 
GROUP BY 
  wp_posts.ID 
ORDER BY 
  wp_posts.post_date DESC 
LIMIT 
  0, 10

Query took 1.8134 seconds.

This query can be optimized by using subquery:

SELECT 
  wp_posts.ID 
FROM 
  wp_posts 
WHERE 
  wp_posts.ID IN (
    SELECT 
      object_id 
    FROM 
      wp_term_relationships 
    WHERE 
      wp_term_relationships.term_taxonomy_id IN (
        63545, 63546, 63547, 63548, 63549, 63552, 76287
      )
  ) 
  AND wp_posts.post_type = 'post' 
  AND wp_posts.post_status = 'publish' 
ORDER BY 
  wp_posts.post_date DESC 
LIMIT 
  0, 10

Query took 0.1804 seconds seconds.

Change History (7)

#1 @Krstarica
19 months ago

Here is the solution.

In wp-includes/class-wp-tax-query.php replace:

$alias = $this->find_compatible_table_alias( $clause, $parent_query );
if ( false === $alias ) {
	$i     = count( $this->table_aliases );
	$alias = $i ? 'tt' . $i : $wpdb->term_relationships;

	// Store the alias as part of a flat array to build future iterators.
	$this->table_aliases[] = $alias;

	// Store the alias with this clause, so later siblings can use it.
	$clause['alias'] = $alias;

	$join .= " LEFT JOIN $wpdb->term_relationships";
	$join .= $i ? " AS $alias" : '';
	$join .= " ON ($this->primary_table.$this->primary_id_column = $alias.object_id)";
}

$where = "$alias.term_taxonomy_id $operator ($terms)";

with:

$where = "$this->primary_table.$this->primary_id_column IN (
	SELECT object_id
	FROM $wpdb->term_relationships
	WHERE term_taxonomy_id IN ($terms)
)";

Such query takes 1.3058 seconds. That saves 0.5 seconds.

To further speed it up, in wp-includes/class-wp-query.php replace:

if ( ! empty( $this->tax_query->queries ) || ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

with:

if ( ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

Such query takes 0.1804 seconds.

Important: since we removed "GROUP BY wp_posts.ID" for tax queries, we need to make sure that all other cases in WP_Tax_Query->get_sql_for_clause use subqueries, too.

#2 @Krstarica
19 months ago

Running this for several days in production and speed improvement is impressive.

Definitely should use subqueries in get_sql_for_clause functions (class-wp-meta-query.php and class-wp-tax-query.php) and then completely remove this code from class-wp-query.php:

if ( ! empty( $this->tax_query->queries ) || ! empty( $this->meta_query->queries ) ) {
	$groupby = "{$wpdb->posts}.ID";
}

#3 @johnbillion
19 months ago

  • Keywords needs-patch needs-testing added
  • Version 5.8.1 deleted

@Krstarica Thanks for working on this! Are you able to put together a pull request on GitHub so we can get the test suite run with these changes in place please? Docs here: https://make.wordpress.org/core/handbook/contribute/git/github-pull-requests-for-code-review/

#4 @joseaneto
18 months ago

Like @Krstarica we have had identical SQL performance problems with the LEFT JOIN query from the "get_sql_for_clause" function in some WordPress installations with +500,000 records in the posts table.

We have been using their solution in production and works perfectly, returning optimal performance on SQL queries, many going from +150 seconds to milliseconds.

Hopefully, this change can be translated into an upcoming update.

This ticket was mentioned in PR #3644 on WordPress/wordpress-develop by ivptr.


7 months ago
#5

  • Keywords has-patch added; needs-patch removed

#6 @Krstarica
7 months ago

@johnbillion I've created pull request.

Some unit tests failed because they expect JOIN, which is removed from the query.

Last edited 7 months ago by Krstarica (previous) (diff)

#7 @johnbillion
3 months ago

  • Milestone changed from Awaiting Review to 6.3
  • Owner set to johnbillion
  • Status changed from new to accepted
Note: See TracTickets for help on using tickets.