Make WordPress Core

Opened 16 years ago

Closed 16 years ago

#9167 closed defect (bug) (fixed)

query_posts('meta_key=foo') returns duplicate posts

Reported by: scribu's profile scribu Owned by:
Milestone: 2.7.2 Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch
Focuses: Cc:

Description

If you do a query_posts('meta_key=foo') and one post has these multiple custom fields with the key foo:

foo => bar1
foo => bar2

then that post is displayed 2 times in The Loop.

Attachments (1)

distinct.meta_key.query.9167.diff (965 bytes) - added by filosofo 16 years ago.

Download all attachments as: .zip

Change History (22)

#1 @filosofo
16 years ago

I think this is a good example of why we need to bump the minimum required MySQL from 4.0 to at least 4.1. MySQL 4.1 introduced sub-queries, which would handle well situations like this.

Instead of the current behavior, which LEFT JOINs wp_postmeta to wp_posts and makes duplicate posts (something like this):

SELECT wp_posts.*, wp_postmeta.meta_value FROM wp_posts LEFT JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE wp_postmeta.meta_key = 'some_key'

instead, we could use a sub-query and eliminate duplicate posts.

SELECT wp_posts.* FROM wp_posts WHERE wp_posts.ID IN (SELECT post_id FROM wp_postmeta WHERE meta_key = 'some_key')

#2 follow-up: @scribu
16 years ago

  • Cc scribu@… added

Can't this be solved by simply using "SELECT DISTINCT(wp_posts.ID)..." on all queries? I can't think of a situation where you would want duplicate posts.

PS: The subquery method is exatly what I use on a plugin to filter posts based on multiple meta key=>value pairs.

#3 @tomontoast
16 years ago

Surely a post shouldn't be allowed to have two identical keys. Each key should be unique and this wouldn't cause any problems.
Are there any situations when you would want to have two identical keys?

#4 @scribu
16 years ago

A post CAN'T have two identical keys (not even revisions of the same post can), due to the SQL table schema. Identical post keys can appear in a query due to different combinations of data related to the same key.

#5 in reply to: ↑ 2 ; follow-up: @filosofo
16 years ago

Replying to scribu:

Can't this be solved by simply using "SELECT DISTINCT(wp_posts.ID)..." on all queries?

No, that won't help for queries like this, because each row is already distinct. Besides, DISTINCT clauses usually have a performance penalty, and this is the core query of WP.

#6 in reply to: ↑ 5 ; follow-up: @scribu
16 years ago

Replying to filosofo:

Replying to scribu:

Can't this be solved by simply using "SELECT DISTINCT(wp_posts.ID)..." on all queries?

No, that won't help for queries like this, because each row is already distinct. Besides, DISTINCT clauses usually have a performance penalty, and this is the core query of WP.

If you write SELECT DISTINCT(*), no, it won't; but if you write SELECT DISTINCT(wp_posts.ID), wp_posts.post_author, etc. it will work. The only problem is that you couldn't SELECT wp_posts.* anymore.

I think a subquery also has a performance penalty. Nothing you can do about that.

By the way, I see that subqueries are already used in WP 2.7.1 (in query.php):

$whichcat .= " AND $wpdb->posts.ID NOT IN ( SELECT tr.object_id FROM $wpdb->term_relationships AS tr INNER JOIN $wpdb->term_taxonomy AS tt ON tr.term_taxonomy_id = tt.term_taxonomy_id WHERE tt.taxonomy = 'post_tag' AND tt.term_id IN ($tag_string) )";

#7 in reply to: ↑ 6 ; follow-up: @filosofo
16 years ago

Replying to scribu:

If you write SELECT DISTINCT(*), no, it won't; but if you write SELECT DISTINCT(wp_posts.ID), wp_posts.post_author, etc. it will work. The only problem is that you couldn't SELECT wp_posts.* anymore.

That's not exactly correct. It doesn't matter what wp_posts fields you select, because when the post ID is the same they're always going to be the same. What matters---what causes the duplication---is that WP currently selects the wp_postmeta.meta_value field, which presumably is different for each meta_key that is the same. So the current query results look like this (omitting some columns):

+-----+-------------------------+-------------------+
| ID  | post_title              | meta_value        |
+-----+-------------------------+-------------------+
| 1   | A Post                  | meta value x      | 
| 1   | A Post                  | meta value y      | 
| 2   | Another Post            | meta value y      | 
+-----+-------------------------+-------------------+

Here each row is distinct, because post 1 has two entries for the same meta_key but different meta_values. To make the rows distinct without duplicating posts, you would have to not select the meta_value. Not selecting the meta_value probably isn't very helpful for most of the situations in which you would be querying by meta_key.

I think a subquery also has a performance penalty. Nothing you can do about that.

It's a matter of relative harm. To make DISTINCT work in the case we're talking about, MySQL has to determine whether an entire row is distinct, which with all the columns selected in a Loop query is going to be more work than just a simple sub-query. In my admittedly unscientific tests it seems to be significantly faster.

By the way, I see that subqueries are already used in WP 2.7.1 (in query.php):

Conditionally, if someone is using MySQL 4.1 or newer. And how many aren't? MySQL 4.1 was released in 2004. Time to require it.

#8 in reply to: ↑ 7 ; follow-up: @filosofo
16 years ago

Replying to filosofo:

Not selecting the meta_value probably isn't very helpful for most of the situations in which you would be querying by meta_key.

Thinking about it a little more, I take that back. If someone is querying just for meta_key and doesn't query the meta_value, then this would work. I'll come up with a patch in a bit.

#9 in reply to: ↑ 8 @scribu
16 years ago

Replying to filosofo:

If someone is querying just for meta_key and doesn't query the meta_value, then this would work.

Yeah, that was the topic: no meta_value. :)

How about GROUP BY (wp_posts.ID) ?

That would work in both cases.

#10 @filosofo
16 years ago

Good idea. GROUP BY is pretty much synonymous with DISTINCT, and it seems to be WP's preferred way of making results distinct (the DISTINCT clause is never used in core Loop). Attached patch groups by wp_posts.ID when meta_key is queried.

#11 @scribu
16 years ago

  • Keywords has-patch added

Neat.

#12 follow-up: @mrmist
16 years ago

You need to be careful using group by. MySQL is pretty sloppy when it comes to applying group_by to queries in comparison with other RDBMSs and essentially what you are doing is randomly throwing away information.

Now, it may be that that is your intended result, but it probably isn't.

#13 @mrmist
16 years ago

What I'm really saying is that if you're doing a query and using a group_by to filter results because some of the cells are causing duplication, then you have to question why the query is retrieving that information in the first place.

#14 in reply to: ↑ 12 ; follow-up: @filosofo
16 years ago

Replying to mrmist:

Now, it may be that that is your intended result, but it probably isn't.

It is intentional, because the thrown-away information in this situation is just the different meta_values.

If you're querying just by meta_key, then you don't care if there are multiple meta_values. If you're querying by both meta_key and meta_value, then there won't be multiple meta_values. So either way it doesn't matter.

#15 in reply to: ↑ 14 ; follow-up: @scribu
16 years ago

Replying to filosofo:

If you're querying just by meta_key, then you don't care if there are multiple meta_values. If you're querying by both meta_key and meta_value, then there won't be multiple meta_values. So either way it doesn't matter.

Ideally, you should exclude the meta_value column from the SELECT statement, but I'm not sure if it can be done.

#16 in reply to: ↑ 15 @mrmist
16 years ago

Replying to scribu:

Ideally, you should exclude the meta_value column from the SELECT statement, but I'm not sure if it can be done.

That was sort of my point. If meta_value is being fetched for no reason and discarded the query cost is going up. Best never to fetch it in the first place and not need the group_by.

Of course if the query can't be broken up that way then I guess it would have to do.

#17 follow-up: @filosofo
16 years ago

To keep things simple, I think removing the select for meta_value should be kept as a separate issue; I opened a ticket: #9177

#18 in reply to: ↑ 17 ; follow-up: @scribu
16 years ago

Replying to filosofo:

To keep things simple, I think removing the select for meta_value should be kept as a separate issue; I opened a ticket: #9177

Come to think of it, you don't need to SELECT any field from wp_postmeta when querying either by meta_key, or meta_value or both, right? Only then a GROUP BY wouldn't be necessary.

#19 in reply to: ↑ 18 ; follow-up: @filosofo
16 years ago

Replying to scribu:

Come to think of it, you don't need to SELECT any field from wp_postmeta when querying either by meta_key, or meta_value or both, right?

Right. That's why my patch at #9177 removes the select for the only wp_postmeta field currently selected: meta_value.

Only then a GROUP BY wouldn't be necessary.

The GROUP BY (or a DISTINCT) is still necessary in the multiple meta_key scenario because wp_postmeta is joined to wp_posts, no matter what is selected.

#20 in reply to: ↑ 19 @scribu
16 years ago

Replying to filosofo:

The GROUP BY (or a DISTINCT) is still necessary in the multiple meta_key scenario because wp_postmeta is joined to wp_posts, no matter what is selected.

Yes, you're right. (I've tested it on my localhost)

#21 @ryan
16 years ago

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.