Make WordPress Core

Opened 7 years ago

Closed 7 years ago

Last modified 3 years ago

#41796 closed defect (bug) (fixed)

WP_Term_Query problem using object_ids and number attributes

Reported by: elvishp2006's profile elvishp2006 Owned by: boonebgorges's profile boonebgorges
Milestone: 4.9 Priority: normal
Severity: normal Version: 4.8.1
Component: Taxonomy Keywords: needs-patch
Focuses: Cc:

Description

Hi, how are you? hope so :)
I'm having a problem with the query generated by the WP_Term_Query class when using the "object_ids" and "number" attribute.

When the same term is associated in more than 1 post, the generated "request" repeats this term for each occurrence in "object_ids", so if I pass the "number" attribute with the value 8, it may happen that a number of terms lower than expected.

My suggestion is that when you have the "object_ids" attribute, force the distinct clause in the query.

Attachments (3)

class-wp-term-query.php.patch (583 bytes) - added by elvishp2006 7 years ago.
Patch
request-results.png (48.8 KB) - added by elvishp2006 7 years ago.
The request and results
41796.diff (3.2 KB) - added by boonebgorges 7 years ago.

Download all attachments as: .zip

Change History (27)

This ticket was mentioned in Slack in #core by elvishp2006. View the logs.


7 years ago

@elvishp2006
7 years ago

The request and results

#2 @elvishp2006
7 years ago

  • Keywords has-patch added

This ticket was mentioned in Slack in #core by elvishp2006. View the logs.


7 years ago

This ticket was mentioned in Slack in #core by sergey. View the logs.


7 years ago

@boonebgorges
7 years ago

#5 @boonebgorges
7 years ago

  • Keywords needs-testing added
  • Milestone changed from Awaiting Review to 4.9

Hi @elvishp2006 - Thanks for the ticket and for the patch. I've reproduced the bug.

The DISTINCT keyword can have big performance implications, so ideally we would avoid it when possible. In addition, there are cases where duplicate term IDs should be preserved - specifically, when fields=all_with_object_id. See https://core.trac.wordpress.org/browser/tags/4.8.1/src/wp-includes/class-wp-term-query.php?marks=747-765#L747. We use PHP to filter out duplicates based on this criterion.

As such, I'd recommend a more modest approach: Only add the DISTINCT keyword when a lack of distinctness affects 'number' or 'offset', and when fields != all_with_object_id.

Have a look at 41796.diff and see if it makes sense to you.

#6 @elvishp2006
7 years ago

@boonebgorges I applied the patch and it worked perfectly, the bug was solved.
Thank you very much for your attention :)

Last edited 7 years ago by elvishp2006 (previous) (diff)

#7 @boonebgorges
7 years ago

  • Owner set to boonebgorges
  • Resolution set to fixed
  • Status changed from new to closed

In 41377:

Taxonomy: Force a DISTINCT term query when result count matters.

Generally, duplicate terms returned by a term query are eliminated in PHP,
after the database query takes place. This technique doesn't work properly
when the query parameters specify the number of results, since the results
of a SELECT ... LIMIT x... query may be deduplicated to a count less than
x. In these cases, we force the original query to be DISTINCT.

Props elvishp2006.
Fixes #41796.

#8 @Otto42
7 years ago

Just a note that this change appears to have broken the "tag cloud" on the WordPress.org support forums. We had to disable them because the queries generated with this new DISTINCT in place were running too long.

https://meta.trac.wordpress.org/changeset/5927

For reference, the taxonomy involved here is "topic-tag" which is used for the support forum topics in bbPress. The terms table and term_taxonomy table each contain approximately 500000 entries.

The resulting query looks like the following:

SELECT DISTINCT t.*, tt.* FROM wp_terms AS t 
INNER JOIN wp_term_taxonomy AS tt 
ON t.term_id = tt.term_id 
WHERE tt.taxonomy IN ('topic-tag') 
AND tt.count > 0 
ORDER BY tt.count DESC 
LIMIT 22

Remove the DISTINCT keyword and the query runs more or less fine.

So, systems with lots of terms may be negatively impacted, possibly broken, by this change. Might want to rethink it, and test on large installations.

#9 @boonebgorges
7 years ago

  • Keywords needs-patch added; has-patch needs-testing removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Thanks for the note, @Otto42. I knew that it would have performance implications, which is why I argued for only using DISTINCT when absolutely necessary. I hadn't thought of the tag cloud as a place that would be affected, but I knew that there would be places affected.

I see three options:

  1. Leave [41377] in. Other sites will face the same fate as wordpress.org. Should probably be avoided.
  2. Revert [41377], and also add a note to the inline docs that says that number is simply incompatible with object_ids. This is weird and inconsistent (we *do* add the DISTINCT clause for meta queries) but would return us to the status quo.
  3. Try to solve this in PHP.

The last option seems like the best, but it is not straightforward. The naive solution: When number accompanies object_ids, don't add the LIMIT clause; Fetch all results, and trim down to number in PHP. This is going to kill sites like wordpress.org in a different way (instead of DISTINCT, it'll be the 500,000 tags in memory).

A more complicated PHP solution is to use the LIMIT logic, and then to detect in PHP when the returned and deduped results are less than number. If so, recursively requery for $number - count( $results ). Eg: number=10; deduping from object_ids gives you 8; WP_Term_Query requeries number=2&offset=8 and appends the results; etc until we have 10. This is likely to be very complicated.

If anyone has good ideas, please share them. Otherwise I think we might go with option 2 (the status quo).

Last edited 7 years ago by boonebgorges (previous) (diff)

#10 @Otto42
7 years ago

Since we only saw this with the tag cloud function, maybe making that function specify the fields it needs is an option.

#11 @elvishp2006
7 years ago

I do not know if I'm talking bullshit, but can caching the results be a solution?

#12 @Otto42
7 years ago

can caching the results be a solution?

Not if there are no results to cache. The query with DISTINCT simply never returns, because it's killed before it can provide any results. Caching doesn't fix slow queries, it's more useful for fixing queries that run too often.

#13 @elvishp2006
7 years ago

Does it make any sense to just force the distinct in this situation?

<?php
if ( ! empty( $limits ) && 'all_with_object_id' !== $args['fields'] && $args['object_ids'] ) {
        $do_distinct = true;
}

This ticket was mentioned in Slack in #core by melchoyce. View the logs.


7 years ago

#15 @boonebgorges
7 years ago

Since we only saw this with the tag cloud function, maybe making that function specify the fields it needs is an option.

Maybe, but it's likely that others will run into similar issues in other use cases.

Does it make any sense to just force the distinct in this situation?

@elvishp2006 This is essentially what is already being done. Checking to see whether $object_ids is empty won't prevent issues like the one that @Otto42 has described.

Given the late date, I'm going to revert and make a documentation note. If @elvishp2006 or others have ideas about how to fix the issue properly, we can reopen the ticket. If not, this'll have to be a documented quirk of the API.

#16 @boonebgorges
7 years ago

In 41880:

Don't force distinct term queries when specifying number and object_ids.

This reverts [41377], which caused performance problems on sites with a large
number of terms.

See #41796.

#17 @boonebgorges
7 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 41881:

Taxonomy: Add note about $number inconsistency to WP_Term_Query docs.

Fixes #41796.

#18 @raba97
4 years ago

This is still a problem in 5.4.2, so I want to leave a work around for anyone who needs it.

Im getting inconsistent results while using number + object_ids on get_terms or WP_Query.

When I put number = 10, I get 1 result.
With number = 100, I get 34 results.

This is because, as said in previous responses, the SQL query returns duplicated rows.

The only way I found to fixed this bug, without modifying the core, is to force DISTINCT through a filter

<?php
//Add a filter to the `terms_clauses` that forces DISTINCT on the SQL
function wp_terms_query_force_distinct($clauses, $taxonomies = null, $args = null){
    $clauses['distinct'] = 'DISTINCT';
    return $clauses;
}
add_filter('terms_clauses', 'wp_terms_query_force_distinct');

//Your query that uses both objects_ids and number
$terms_query = new WP_Term_Query( array(
    'object_ids'    => $ids,
    'number'        => 10,
) );

//Remove the filter right after using the query, as to not affect other queries
remove_filter('terms_clauses', 'wp_terms_query_force_distinct');

It feels kinda icky doing it like this, but I need to get this out of the way.

#19 @crstauf
3 years ago

This does not appear to have been resolved, yet the ticket is set as closed and fixed. Is there a new ticket for this issue?

#20 @crstauf
3 years ago

Last edited 3 years ago by crstauf (previous) (diff)

#21 @boonebgorges
3 years ago

Hi @crstauf - The ticket was set to 'fixed' but perhaps it would have better been 'wontfix'. See https://core.trac.wordpress.org/ticket/41796#comment:15. We tried a proper fix but it ended up introducing serious performance issues, so for now we have chalked it up as a quirk of the way that the taxonomy API works.

#22 @crstauf
3 years ago

@boonebgorges Yes I read all the comments, and that's why I mentioned that the labels are wrong. Can those be adjusted to more accurately reflect the conclusion of this ticket? I arrived here from the Docs, saw that the ticket was marked as fixed, and so figured the Docs were outdated, only to later realize that the issue was not fixed, and seemingly can't be.

#23 @boonebgorges
3 years ago

@crstauf Could you give specific details on how (and where) you think the documentation can be fixed? For reference, the inline docs are as follows: https://core.trac.wordpress.org/browser/tags/5.7.2/src/wp-includes/class-wp-term-query.php?marks=124-127#L121

#24 @crstauf
3 years ago

@boonebgorges I don't think the Docs are outdated anymore. Here, possible this timeline will be helpful.

  1. I read the description of the number parameter for `WP_Term_Query::__construct()` ("See #41796 for details.")
  2. I arrived here and saw in the ticket title that it was seemingly closed and fixed; figured that the note on the Docs page was outdated.
  3. I resumed my development work.
  4. I later encountered the issue described here in my dev work.
  5. I came back to this ticket and saw that the issue was not actually fixed.
  6. I posted the comment requesting the labels to be updated to accurately reflect the status.
Note: See TracTickets for help on using tickets.