#17582 closed defect (bug) (fixed)
Problems with duplicated users
Reported by: | scribu | Owned by: | ryan |
---|---|---|---|
Milestone: | 3.2 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Query | Keywords: | close |
Focuses: | Cc: |
Description
Since WP 3.2, you can do this:
$users = get_users( array( 'meta_query' => array( 'relation' => 'OR', array( 'key' => 'foo', ), array( 'key' => 'bar', ), ) ) );
which is great, except you can end up with a lot of duplicate users.
There's no straightforward way to tell WP_User_Query to add a distinct clause.
Sure, you can do this:
add_action( 'pre_user_query', function( $user_query ) { $user_query->query_fields = 'DISTINCT ' . $user_query->query_fields; } );
but the query that populates $user_query->total_users
will ignore that.
Attachments (5)
Change History (41)
#2
@
13 years ago
With that patch, you could do something like this:
add_action( 'pre_user_query', function( $user_query ) { $user_query->query_fields = 'SQL_CALC_FOUND_ROWS DISTINCT ' . $user_query->query_fields; } ); add_filter( 'found_users_query', function( $sql ) { return 'SELECT FOUND_ROWS()'; } );
Obviously, a 'distinct' arg for WP_User_Query that takes care of this would be better.
#4
@
13 years ago
You mean GROUP BY, I assume.
It will avoid duplicates, but it will also mess with the total user count, which is what I'm having trouble with.
#5
@
13 years ago
Core should be adding a GROUP BY ID which applies to both queries -- like we do in WP_Query.. I don't see a need to make this work via a plugin, Core should handle it.
#6
@
13 years ago
Yes, Core should handle it, but like I said, GROUP BY ID isn't going to cut it. You won't get the correct total:
SELECT COUNT(*) FROM wp_users ... GROUP BY ID
will always return 1.
#7
@
13 years ago
- Milestone changed from Awaiting Review to 3.2
17582.diff introduces 'distinct' query var and uses FOUND_ROWS() to calculate total.
Moving to 3.2 milestone for consideration.
#8
@
13 years ago
"Since WP 3.2, you can do this:"
Should that be since 3.1? If this is not a regression in 3.2, I'm inclined to punt this to 3.3.
#11
@
13 years ago
17582.3.diff isn't significantly simpler than 17582.2.diff and there's no proof that it has better performance. Remember query_posts(): #10964
#12
@
13 years ago
Also, 17582.2.diff allows you to switch to the other method, if you are absolutely sure that it's faster in a particular case.
#13
follow-up:
↓ 14
@
13 years ago
and there's no proof that it has better performance
Actually, .2.diff is faster than .3.diff
Run EXPLAIN on the queries generated by two patches, you'll find that .2.diff is faster.
#14
in reply to:
↑ 13
@
13 years ago
Replying to greuben:
and there's no proof that it has better performance
Actually, .2.diff is faster than .3.diff
Run EXPLAIN on the queries generated by two patches, you'll find that .2.diff is faster.
In future it would be really useful to include the explain output on the ticket so that anyone reviewing the patch can see it.
#15
@
13 years ago
In future it would be really useful to include the explain output on the ticket so that anyone reviewing the patch can see it.
Sorry, here it is
17582.2.diff
+----+-------------+-------------+------+------------------+---------+---------+-----------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+------------------+---------+---------+-----------------+------+---------------------------------+ | 1 | SIMPLE | wp_users | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using temporary; Using filesort | | 1 | SIMPLE | wp_usermeta | ref | user_id,meta_key | user_id | 8 | dev.wp_users.ID | 7 | Distinct | | 1 | SIMPLE | mt1 | ref | user_id,meta_key | user_id | 8 | dev.wp_users.ID | 7 | Using where; Distinct | +----+-------------+-------------+------+------------------+---------+---------+-----------------+------+---------------------------------+
17582.3.diff
+----+-------------+-------------+------+------------------+---------+---------+-----------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+------------------+---------+---------+-----------------+------+---------------------------------+ | 1 | SIMPLE | wp_users | ALL | PRIMARY | NULL | NULL | NULL | 2 | Using temporary; Using filesort | | 1 | SIMPLE | wp_usermeta | ref | user_id,meta_key | user_id | 8 | dev.wp_users.ID | 7 | | | 1 | SIMPLE | mt1 | ref | user_id,meta_key | user_id | 8 | dev.wp_users.ID | 7 | Using where | +----+-------------+-------------+------+------------------+---------+---------+-----------------+------+---------------------------------+
#16
@
13 years ago
.2.diff requires the caller to know when to apply distinct, yes? Whereas .3.diff does not. Can $meta_query->get_sql() be made smart enough to know when DISTINCT is required and return a clause for it?
#17
@
13 years ago
Can $meta_query->get_sql() be made smart enough to know when DISTINCT is required and return a clause for it?
Yes it is possible because DISTINCT is only required for OR'ed queries.
There is one problem with using distinct. Consider two users with same display name and run this query.
new WP_User_Query( array( 'distinct' => true, 'fields' => array( 'display_name' ), 'meta_query' => array( 'relation' => 'OR', array( 'key' => 'wp_capabilities' ), array( 'key' => 'name' ) ) ) )
#20
@
13 years ago
Arguably, returning the name twice in that case isn't very useful anyway, since you can't distinguish between users.
Anyway, I'm fine with 17582.3.diff as long as it also includes the 'found_users_query' filter, so that one could switch to FOUND_ROWS() if needed.
PS: For consistency's sake, GROUP BY should be uppercase:
$this->query_groupby = 'GROUP BY ID';
#22
@
13 years ago
I'm not sure that would be so straightforward. What if we want to add a 'distinct' query var to WP_User_Query later on?
Also, WP_Meta_Query::get_sql() is used in WP_Query too.
#23
@
13 years ago
It would be better to set a possible_duplicates flag on both WP_Meta_Query and WP_Tax_Query. Based on that, WP_Query and WP_User_Query could take appropriate action.
#24
@
13 years ago
Sorry, I was thinking of get_sql() returning the entire SELECT clause.
Having it return just 'DISTINCT' or an empty string sounds good.
#25
@
13 years ago
Found a simpler solution: 17582.4.diff just checks 'OR' == $meta_query->relation
, since that's the only case when duplicates can occur.
We should be doing the same in WP_Query, but that's another ticket.
#26
@
13 years ago
.4.diff looks good. Tested for regressions in wp_dropdown_user() and users.php paging. All seems well.
#27
@
13 years ago
- Owner set to ryan
- Resolution set to fixed
- Status changed from new to closed
In [18178]:
#28
@
13 years ago
Curious to know why a group by wasn't used, instead of a distinct.
The distinct will cause the data to be sort by all fields in their select order, at which point dupe are eliminated.
If we were to append the order by clause with the ID and then group by and order by the latter, by contrast, we'd eliminate the dups without needing to re-order at all.
#29
follow-up:
↓ 30
@
13 years ago
greuben's test suggested that DISTINCT is faster. Of course, more benchmarks are always welcome.
#30
in reply to:
↑ 29
@
13 years ago
Replying to scribu:
greuben's test suggested that DISTINCT is faster.
That was because he was grouping by id. In other words it's ordered twice.
If you order by display_name, the group by/order by should be:
GROUP BY display_name, ID ORDER BY display_name, ID
That way they'll only need to be ordered once, and the extra ID sub-ordering will strip out the dups (at a negligible cost). Best I'm aware MySQL won't do this automatically when using distinct, though I may be wrong (in which case it would certainly explain why it was faster to use distinct).
#31
@
13 years ago
If you order by display_name, the group by/order by should be:
GROUP BY display_name, ID ORDER BY display_name, IDThat way they'll only need to be ordered once, and the extra ID sub-ordering will strip out the dups (at a negligible cost).
Here's some more profiling info based on the suggested query & query with patch .2.diff
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | 0.01018900 | SELECT wp_users.* FROM wp_users INNER JOIN wp_usermeta ON (wp_users.ID = wp_usermeta.user_id) INNER JOIN wp_usermeta AS mt1 ON (wp_users.ID = mt1.user_id) group by display_name, ID ORDER BY display_name, ID ASC | | 2 | 0.00065200 | SELECT SQL_CALC_FOUND_ROWS DISTINCT wp_users.* FROM wp_users INNER JOIN wp_usermeta ON (wp_users.ID = wp_usermeta.user_id) INNER JOIN wp_usermeta AS mt1 ON (wp_users.ID = mt1.user_id) ORDER BY display_name ASC | | 3 | 0.01074400 | SELECT wp_users.* FROM wp_users INNER JOIN wp_usermeta ON (wp_users.ID = wp_usermeta.user_id) INNER JOIN wp_usermeta AS mt1 ON (wp_users.ID = mt1.user_id) group by display_name, ID ORDER BY display_name, ID ASC | | 4 | 0.00065100 | SELECT SQL_CALC_FOUND_ROWS DISTINCT wp_users.* FROM wp_users INNER JOIN wp_usermeta ON (wp_users.ID = wp_usermeta.user_id) INNER JOIN wp_usermeta AS mt1 ON (wp_users.ID = mt1.user_id) ORDER BY display_name ASC | +----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I am not sure about the theory but DISTINCT is faster in this case.
#32
@
13 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for discussion. I'm definitely inclined to agree with Denis
#33
@
13 years ago
- Milestone changed from 3.2 to Future Release
No discussion was had. And no patch was provided. Revisit for 3.3?
found_users_query.17582.diff introduces the 'found_users_query' filter, similar to 'found_posts_query'.
It's the minimum required to get this working correctly from a plugin.