WordPress.org

Make WordPress Core

Opened 2 weeks ago

Last modified 5 days ago

#41796 reopened defect (bug)

WP_Term_Query problem using object_ids and number attributes

Reported by: elvishp2006 Owned by: boonebgorges
Milestone: 4.9 Priority: normal
Severity: normal Version: 4.8.1
Component: Taxonomy Keywords: needs-patch
Focuses: Cc:

Description

Hi, how are you? hope so :)
I'm having a problem with the query generated by the WP_Term_Query class when using the "object_ids" and "number" attribute.

When the same term is associated in more than 1 post, the generated "request" repeats this term for each occurrence in "object_ids", so if I pass the "number" attribute with the value 8, it may happen that a number of terms lower than expected.

My suggestion is that when you have the "object_ids" attribute, force the distinct clause in the query.

Attachments (3)

class-wp-term-query.php.patch (583 bytes) - added by elvishp2006 2 weeks ago.
Patch
request-results.png (48.8 KB) - added by elvishp2006 2 weeks ago.
The request and results
41796.diff (3.2 KB) - added by boonebgorges 8 days ago.

Download all attachments as: .zip

Change History (16)

This ticket was mentioned in Slack in #core by elvishp2006. View the logs.


2 weeks ago

@elvishp2006
2 weeks ago

The request and results

#2 @elvishp2006
2 weeks ago

  • Keywords has-patch added

This ticket was mentioned in Slack in #core by elvishp2006. View the logs.


2 weeks ago

This ticket was mentioned in Slack in #core by sergey. View the logs.


2 weeks ago

@boonebgorges
8 days ago

#5 @boonebgorges
8 days ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 4.9

Hi @elvishp2006 - Thanks for the ticket and for the patch. I've reproduced the bug.

The DISTINCT keyword can have big performance implications, so ideally we would avoid it when possible. In addition, there are cases where duplicate term IDs should be preserved - specifically, when fields=all_with_object_id. See https://core.trac.wordpress.org/browser/tags/4.8.1/src/wp-includes/class-wp-term-query.php?marks=747-765#L747. We use PHP to filter out duplicates based on this criterion.

As such, I'd recommend a more modest approach: Only add the DISTINCT keyword when a lack of distinctness affects 'number' or 'offset', and when fields != all_with_object_id.

Have a look at 41796.diff and see if it makes sense to you.

#6 @elvishp2006
8 days ago

@boonebgorges I applied the patch and it worked perfectly, the bug was solved.
Thank you very much for your attention :)

Last edited 8 days ago by elvishp2006 (previous) (diff)

#7 @boonebgorges
7 days ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 41377:

Taxonomy: Force a DISTINCT term query when result count matters.

Generally, duplicate terms returned by a term query are eliminated in PHP,
after the database query takes place. This technique doesn't work properly
when the query parameters specify the number of results, since the results
of a SELECT ... LIMIT x... query may be deduplicated to a count less than
x. In these cases, we force the original query to be DISTINCT.

Props elvishp2006.
Fixes #41796.

#8 @Otto42
6 days ago

Just a note that this change appears to have broken the "tag cloud" on the WordPress.org support forums. We had to disable them because the queries generated with this new DISTINCT in place were running too long.

https://meta.trac.wordpress.org/changeset/5927

For reference, the taxonomy involved here is "topic-tag" which is used for the support forum topics in bbPress. The terms table and term_taxonomy table each contain approximately 500000 entries.

The resulting query looks like the following:

SELECT DISTINCT t.*, tt.* FROM wp_terms AS t 
INNER JOIN wp_term_taxonomy AS tt 
ON t.term_id = tt.term_id 
WHERE tt.taxonomy IN ('topic-tag') 
AND tt.count > 0 
ORDER BY tt.count DESC 
LIMIT 22

Remove the DISTINCT keyword and the query runs more or less fine.

So, systems with lots of terms may be negatively impacted, possibly broken, by this change. Might want to rethink it, and test on large installations.

#9 @boonebgorges
6 days ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Thanks for the note, @Otto42. I knew that it would have performance implications, which is why I argued for only using DISTINCT when absolutely necessary. I hadn't thought of the tag cloud as a place that would be affected, but I knew that there would be places affected.

I see three options:

  1. Leave [41377] in. Other sites will face the same fate as wordpress.org. Should probably be avoided.
  2. Revert [41377], and also add a note to the inline docs that says that number is simply incompatible with object_ids. This is weird and inconsistent (we *do* add the DISTINCT clause for meta queries) but would return us to the status quo.
  3. Try to solve this in PHP.

The last option seems like the best, but it is not straightforward. The naive solution: When number accompanies object_ids, don't add the LIMIT clause; Fetch all results, and trim down to number in PHP. This is going to kill sites like wordpress.org in a different way (instead of DISTINCT, it'll be the 500,000 tags in memory).

A more complicated PHP solution is to use the LIMIT logic, and then to detect in PHP when the returned and deduped results are less than number. If so, recursively requery for $number - count( $results ). Eg: number=10; deduping from object_ids gives you 8; WP_Term_Query requeries number=2&offset=8 and appends the results; etc until we have 10. This is likely to be very complicated.

If anyone has good ideas, please share them. Otherwise I think we might go with option 2 (the status quo).

Last edited 6 days ago by boonebgorges (previous) (diff)

#10 @Otto42
6 days ago

Since we only saw this with the tag cloud function, maybe making that function specify the fields it needs is an option.

#11 @elvishp2006
5 days ago

I do not know if I'm talking bullshit, but can caching the results be a solution?

#12 @Otto42
5 days ago

can caching the results be a solution?

Not if there are no results to cache. The query with DISTINCT simply never returns, because it's killed before it can provide any results. Caching doesn't fix slow queries, it's more useful for fixing queries that run too often.

#13 @elvishp2006
5 days ago

Does it make any sense to just force the distinct in this situation?

<?php
if ( ! empty( $limits ) && 'all_with_object_id' !== $args['fields'] && $args['object_ids'] ) {
        $do_distinct = true;
}
Note: See TracTickets for help on using tickets.