WordPress.org

Make WordPress Core

Opened 3 months ago

Closed 5 weeks ago

#41796 closed defect (bug) (fixed)

WP_Term_Query problem using object_ids and number attributes

Reported by: elvishp2006 Owned by: boonebgorges
Milestone: 4.9 Priority: normal
Severity: normal Version: 4.8.1
Component: Taxonomy Keywords: needs-patch
Focuses: Cc:

Description

Hi, how are you? hope so :)
I'm having a problem with the query generated by the WP_Term_Query class when using the "object_ids" and "number" attribute.

When the same term is associated in more than 1 post, the generated "request" repeats this term for each occurrence in "object_ids", so if I pass the "number" attribute with the value 8, it may happen that a number of terms lower than expected.

My suggestion is that when you have the "object_ids" attribute, force the distinct clause in the query.

Attachments (3)

class-wp-term-query.php.patch (583 bytes) - added by elvishp2006 3 months ago.
Patch
request-results.png (48.8 KB) - added by elvishp2006 3 months ago.
The request and results
41796.diff (3.2 KB) - added by boonebgorges 2 months ago.

Download all attachments as: .zip

Change History (20)

This ticket was mentioned in Slack in #core by elvishp2006. View the logs.


3 months ago

@elvishp2006
3 months ago

The request and results

#2 @elvishp2006
3 months ago

  • Keywords has-patch added

This ticket was mentioned in Slack in #core by elvishp2006. View the logs.


3 months ago

This ticket was mentioned in Slack in #core by sergey. View the logs.


3 months ago

@boonebgorges
2 months ago

#5 @boonebgorges
2 months ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 4.9

Hi @elvishp2006 - Thanks for the ticket and for the patch. I've reproduced the bug.

The DISTINCT keyword can have big performance implications, so ideally we would avoid it when possible. In addition, there are cases where duplicate term IDs should be preserved - specifically, when fields=all_with_object_id. See https://core.trac.wordpress.org/browser/tags/4.8.1/src/wp-includes/class-wp-term-query.php?marks=747-765#L747. We use PHP to filter out duplicates based on this criterion.

As such, I'd recommend a more modest approach: Only add the DISTINCT keyword when a lack of distinctness affects 'number' or 'offset', and when fields != all_with_object_id.

Have a look at 41796.diff and see if it makes sense to you.

#6 @elvishp2006
2 months ago

@boonebgorges I applied the patch and it worked perfectly, the bug was solved.
Thank you very much for your attention :)

Last edited 2 months ago by elvishp2006 (previous) (diff)

#7 @boonebgorges
2 months ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 41377:

Taxonomy: Force a DISTINCT term query when result count matters.

Generally, duplicate terms returned by a term query are eliminated in PHP,
after the database query takes place. This technique doesn't work properly
when the query parameters specify the number of results, since the results
of a SELECT ... LIMIT x... query may be deduplicated to a count less than
x. In these cases, we force the original query to be DISTINCT.

Props elvishp2006.
Fixes #41796.

#8 @Otto42
2 months ago

Just a note that this change appears to have broken the "tag cloud" on the WordPress.org support forums. We had to disable them because the queries generated with this new DISTINCT in place were running too long.

https://meta.trac.wordpress.org/changeset/5927

For reference, the taxonomy involved here is "topic-tag" which is used for the support forum topics in bbPress. The terms table and term_taxonomy table each contain approximately 500000 entries.

The resulting query looks like the following:

SELECT DISTINCT t.*, tt.* FROM wp_terms AS t 
INNER JOIN wp_term_taxonomy AS tt 
ON t.term_id = tt.term_id 
WHERE tt.taxonomy IN ('topic-tag') 
AND tt.count > 0 
ORDER BY tt.count DESC 
LIMIT 22

Remove the DISTINCT keyword and the query runs more or less fine.

So, systems with lots of terms may be negatively impacted, possibly broken, by this change. Might want to rethink it, and test on large installations.

#9 @boonebgorges
2 months ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Thanks for the note, @Otto42. I knew that it would have performance implications, which is why I argued for only using DISTINCT when absolutely necessary. I hadn't thought of the tag cloud as a place that would be affected, but I knew that there would be places affected.

I see three options:

  1. Leave [41377] in. Other sites will face the same fate as wordpress.org. Should probably be avoided.
  2. Revert [41377], and also add a note to the inline docs that says that number is simply incompatible with object_ids. This is weird and inconsistent (we *do* add the DISTINCT clause for meta queries) but would return us to the status quo.
  3. Try to solve this in PHP.

The last option seems like the best, but it is not straightforward. The naive solution: When number accompanies object_ids, don't add the LIMIT clause; Fetch all results, and trim down to number in PHP. This is going to kill sites like wordpress.org in a different way (instead of DISTINCT, it'll be the 500,000 tags in memory).

A more complicated PHP solution is to use the LIMIT logic, and then to detect in PHP when the returned and deduped results are less than number. If so, recursively requery for $number - count( $results ). Eg: number=10; deduping from object_ids gives you 8; WP_Term_Query requeries number=2&offset=8 and appends the results; etc until we have 10. This is likely to be very complicated.

If anyone has good ideas, please share them. Otherwise I think we might go with option 2 (the status quo).

Last edited 2 months ago by boonebgorges (previous) (diff)

#10 @Otto42
2 months ago

Since we only saw this with the tag cloud function, maybe making that function specify the fields it needs is an option.

#11 @elvishp2006
2 months ago

I do not know if I'm talking bullshit, but can caching the results be a solution?

#12 @Otto42
2 months ago

can caching the results be a solution?

Not if there are no results to cache. The query with DISTINCT simply never returns, because it's killed before it can provide any results. Caching doesn't fix slow queries, it's more useful for fixing queries that run too often.

#13 @elvishp2006
2 months ago

Does it make any sense to just force the distinct in this situation?

<?php
if ( ! empty( $limits ) && 'all_with_object_id' !== $args['fields'] && $args['object_ids'] ) {
        $do_distinct = true;
}

This ticket was mentioned in Slack in #core by melchoyce. View the logs.


5 weeks ago

#15 @boonebgorges
5 weeks ago

Since we only saw this with the tag cloud function, maybe making that function specify the fields it needs is an option.

Maybe, but it's likely that others will run into similar issues in other use cases.

Does it make any sense to just force the distinct in this situation?

@elvishp2006 This is essentially what is already being done. Checking to see whether $object_ids is empty won't prevent issues like the one that @Otto42 has described.

Given the late date, I'm going to revert and make a documentation note. If @elvishp2006 or others have ideas about how to fix the issue properly, we can reopen the ticket. If not, this'll have to be a documented quirk of the API.

#16 @boonebgorges
5 weeks ago

In 41880:

Don't force distinct term queries when specifying number and object_ids.

This reverts [41377], which caused performance problems on sites with a large
number of terms.

See #41796.

#17 @boonebgorges
5 weeks ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 41881:

Taxonomy: Add note about $number inconsistency to WP_Term_Query docs.

Fixes #41796.

Note: See TracTickets for help on using tickets.