Opened 16 years ago
Closed 16 years ago
#9167 closed defect (bug) (fixed)
query_posts('meta_key=foo') returns duplicate posts
Reported by: | scribu | Owned by: | |
---|---|---|---|
Milestone: | 2.7.2 | Priority: | normal |
Severity: | normal | Version: | |
Component: | General | Keywords: | has-patch |
Focuses: | Cc: |
Description
If you do a query_posts('meta_key=foo') and one post has these multiple custom fields with the key foo:
foo => bar1 foo => bar2
then that post is displayed 2 times in The Loop.
Attachments (1)
Change History (22)
#2
follow-up:
↓ 5
@
16 years ago
- Cc scribu@… added
Can't this be solved by simply using "SELECT DISTINCT(wp_posts.ID)..." on all queries? I can't think of a situation where you would want duplicate posts.
PS: The subquery method is exatly what I use on a plugin to filter posts based on multiple meta key=>value pairs.
#3
@
16 years ago
Surely a post shouldn't be allowed to have two identical keys. Each key should be unique and this wouldn't cause any problems.
Are there any situations when you would want to have two identical keys?
#4
@
16 years ago
A post CAN'T have two identical keys (not even revisions of the same post can), due to the SQL table schema. Identical post keys can appear in a query due to different combinations of data related to the same key.
#5
in reply to:
↑ 2
;
follow-up:
↓ 6
@
16 years ago
Replying to scribu:
Can't this be solved by simply using "SELECT DISTINCT(wp_posts.ID)..." on all queries?
No, that won't help for queries like this, because each row is already distinct. Besides, DISTINCT clauses usually have a performance penalty, and this is the core query of WP.
#6
in reply to:
↑ 5
;
follow-up:
↓ 7
@
16 years ago
Replying to filosofo:
Replying to scribu:
Can't this be solved by simply using "SELECT DISTINCT(wp_posts.ID)..." on all queries?
No, that won't help for queries like this, because each row is already distinct. Besides, DISTINCT clauses usually have a performance penalty, and this is the core query of WP.
If you write SELECT DISTINCT(*), no, it won't; but if you write SELECT DISTINCT(wp_posts.ID), wp_posts.post_author, etc. it will work. The only problem is that you couldn't SELECT wp_posts.* anymore.
I think a subquery also has a performance penalty. Nothing you can do about that.
By the way, I see that subqueries are already used in WP 2.7.1 (in query.php):
$whichcat .= " AND $wpdb->posts.ID NOT IN ( SELECT tr.object_id FROM $wpdb->term_relationships AS tr INNER JOIN $wpdb->term_taxonomy AS tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy = 'post_tag' AND tt.term_id IN ($tag_string) )";
#7
in reply to:
↑ 6
;
follow-up:
↓ 8
@
16 years ago
Replying to scribu:
If you write SELECT DISTINCT(*), no, it won't; but if you write SELECT DISTINCT(wp_posts.ID), wp_posts.post_author, etc. it will work. The only problem is that you couldn't SELECT wp_posts.* anymore.
That's not exactly correct. It doesn't matter what wp_posts fields you select, because when the post ID is the same they're always going to be the same. What matters---what causes the duplication---is that WP currently selects the wp_postmeta.meta_value field, which presumably is different for each meta_key that is the same. So the current query results look like this (omitting some columns):
+-----+-------------------------+-------------------+ | ID | post_title | meta_value | +-----+-------------------------+-------------------+ | 1 | A Post | meta value x | | 1 | A Post | meta value y | | 2 | Another Post | meta value y | +-----+-------------------------+-------------------+
Here each row is distinct, because post 1 has two entries for the same meta_key but different meta_value
s. To make the rows distinct without duplicating posts, you would have to not select the meta_value. Not selecting the meta_value probably isn't very helpful for most of the situations in which you would be querying by meta_key.
I think a subquery also has a performance penalty. Nothing you can do about that.
It's a matter of relative harm. To make DISTINCT work in the case we're talking about, MySQL has to determine whether an entire row is distinct, which with all the columns selected in a Loop query is going to be more work than just a simple sub-query. In my admittedly unscientific tests it seems to be significantly faster.
By the way, I see that subqueries are already used in WP 2.7.1 (in query.php):
Conditionally, if someone is using MySQL 4.1 or newer. And how many aren't? MySQL 4.1 was released in 2004. Time to require it.
#8
in reply to:
↑ 7
;
follow-up:
↓ 9
@
16 years ago
Replying to filosofo:
Not selecting the meta_value probably isn't very helpful for most of the situations in which you would be querying by meta_key.
Thinking about it a little more, I take that back. If someone is querying just for meta_key and doesn't query the meta_value, then this would work. I'll come up with a patch in a bit.
#9
in reply to:
↑ 8
@
16 years ago
Replying to filosofo:
If someone is querying just for meta_key and doesn't query the meta_value, then this would work.
Yeah, that was the topic: no meta_value. :)
How about GROUP BY (wp_posts.ID) ?
That would work in both cases.
#10
@
16 years ago
Good idea. GROUP BY is pretty much synonymous with DISTINCT, and it seems to be WP's preferred way of making results distinct (the DISTINCT clause is never used in core Loop). Attached patch groups by wp_posts.ID when meta_key is queried.
#12
follow-up:
↓ 14
@
16 years ago
You need to be careful using group by. MySQL is pretty sloppy when it comes to applying group_by to queries in comparison with other RDBMSs and essentially what you are doing is randomly throwing away information.
Now, it may be that that is your intended result, but it probably isn't.
#13
@
16 years ago
What I'm really saying is that if you're doing a query and using a group_by to filter results because some of the cells are causing duplication, then you have to question why the query is retrieving that information in the first place.
#14
in reply to:
↑ 12
;
follow-up:
↓ 15
@
16 years ago
Replying to mrmist:
Now, it may be that that is your intended result, but it probably isn't.
It is intentional, because the thrown-away information in this situation is just the different meta_values.
If you're querying just by meta_key, then you don't care if there are multiple meta_values. If you're querying by both meta_key and meta_value, then there won't be multiple meta_values. So either way it doesn't matter.
#15
in reply to:
↑ 14
;
follow-up:
↓ 16
@
16 years ago
Replying to filosofo:
If you're querying just by meta_key, then you don't care if there are multiple meta_values. If you're querying by both meta_key and meta_value, then there won't be multiple meta_values. So either way it doesn't matter.
Ideally, you should exclude the meta_value column from the SELECT statement, but I'm not sure if it can be done.
#16
in reply to:
↑ 15
@
16 years ago
Replying to scribu:
Ideally, you should exclude the meta_value column from the SELECT statement, but I'm not sure if it can be done.
That was sort of my point. If meta_value is being fetched for no reason and discarded the query cost is going up. Best never to fetch it in the first place and not need the group_by.
Of course if the query can't be broken up that way then I guess it would have to do.
#17
follow-up:
↓ 18
@
16 years ago
To keep things simple, I think removing the select for meta_value should be kept as a separate issue; I opened a ticket: #9177
#18
in reply to:
↑ 17
;
follow-up:
↓ 19
@
16 years ago
Replying to filosofo:
To keep things simple, I think removing the select for meta_value should be kept as a separate issue; I opened a ticket: #9177
Come to think of it, you don't need to SELECT any field from wp_postmeta when querying either by meta_key, or meta_value or both, right? Only then a GROUP BY wouldn't be necessary.
#19
in reply to:
↑ 18
;
follow-up:
↓ 20
@
16 years ago
Replying to scribu:
Come to think of it, you don't need to SELECT any field from wp_postmeta when querying either by meta_key, or meta_value or both, right?
Right. That's why my patch at #9177 removes the select for the only wp_postmeta field currently selected: meta_value.
Only then a GROUP BY wouldn't be necessary.
The GROUP BY (or a DISTINCT) is still necessary in the multiple meta_key scenario because wp_postmeta is joined to wp_posts, no matter what is selected.
I think this is a good example of why we need to bump the minimum required MySQL from 4.0 to at least 4.1. MySQL 4.1 introduced sub-queries, which would handle well situations like this.
Instead of the current behavior, which LEFT JOINs wp_postmeta to wp_posts and makes duplicate posts (something like this):
SELECT wp_posts.*, wp_postmeta.meta_value FROM wp_posts LEFT JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE wp_postmeta.meta_key = 'some_key'
instead, we could use a sub-query and eliminate duplicate posts.
SELECT wp_posts.* FROM wp_posts WHERE wp_posts.ID IN (SELECT post_id FROM wp_postmeta WHERE meta_key = 'some_key')