Make WordPress Core

Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#8022 closed defect (bug) (invalid)

non-DISTINCT query in get_archives for postbypost

Reported by: kevinB Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.7
Component: Template Keywords: archives, postbypost, filter
Focuses: Cc:


The get_archives db query for the postbypost type is not cleanly filterable because it lacks a DISTINCT clause.

Attachments (2)

postbypost_distinct.patch (646 bytes) - added by kevinB 10 years ago.
add DISTINCT clause to query for get_archives postbypost
postbypost_filters.patch (781 bytes) - added by kevinB 9 years ago.
getarchives_distinct, getarchives_fields filters for postbypost archive type

Download all attachments as: .zip

Change History (13)

10 years ago

add DISTINCT clause to query for get_archives postbypost

#1 @mrmist
10 years ago

You'll have to excuse my lack of knowing what you mean by "filterable".

However, in SQL terms adding a distinct to this query is redundant. The query is doing a SELECT * on a table that has an auto increment column, so all your results are guaranteed distinctness unless your database is seriously broken, by virtue of the ID column. The DISTINCT will just impose overhead.

#2 @jacobsantos
10 years ago

  • Milestone 2.7 deleted
  • Resolution set to invalid
  • Status changed from new to closed

I agree with mrmist, Distinct isn't guaranteed to do anything except add overhead. If you want to add a filter to the sql, then that would be okay. Should be in a new (better worded) ticket.

#3 @kevinB
9 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Version changed from 2.7 to 2.8

By "filterable" I mean able to use the two filters which are already applied in the wp_get_archives() function.

The getarchives_join filter is unusable with a postbypost comments listing because joining another table to a SELECT * query can cause redundant result rows. In practice, to implement my desired join I would have to run a followup 'query' filter to change the queried fields from * to DISTINCT $wpdb->comments.* Instead I have just instructed any affected users to hack their general-template.php as in the attached patch.

The problem of redundant results from joined data (which serves a filtering purpose but may not be intended for display) is why the main WP_Query::get_posts query includes a posts_distinct (and posts_fields) filter. I thought two new filters just for postbypost comments would be perceived as design overkill, but if you prefer to go that way that's fine.

I will let you re-close this ticket and/or start another now that I've explained myself. When I get time I'll check back and open another ticket if the problem is not yet resolved.

#4 @kevinB
9 years ago

Correction in my last comment: desired field clause is $wpdb->posts.*

Please acknowledge my identification of a defect in wp_get_archives() even if you don't like the current patch.

#5 @mrmist
9 years ago

You still can't affect the results of a SELECT * on this table by adding a DISTINCT to it. The results of SELECT * FROM posts are always distinct, because there's a unique column in the result set. If you want to DISTINCT the results for some other reason (e.g. DISTINCTing the result set of a SELECT which is joining to this result set) then you need to alter that query, not this one, and that query would have to be specific about what columns it returns.

I'll not close this again, I leave that to someone else, but the existing patch is invalid.

#6 @thee17
9 years ago

  • Milestone set to 2.8

9 years ago

getarchives_distinct, getarchives_fields filters for postbypost archive type

#7 @kevinB
9 years ago

  • Keywords archives postbypost filter added
  • Priority changed from low to normal
  • Version changed from 2.8 to 2.7

mist, I'm just now checking back in and see that we haven't made any progress in understanding each other. What you said is true for SELECT DISTINCT *, but the last patch I posted was a SELECT DISTINCT $wpdb->posts.*

Sometimes there are reasons to use LEFT JOINs, and those can lead to duplicate post names/IDs in the result set.

From your previous suggestion, I think we can agree on the validity of this new patch's API even if we don't agree on the validity of SQL statements it might facilitate.

The separate filters getarchives_distinct and getarchives_fields follow the precedent of posts_distinct and posts_fields. The default-merged args array is passed as a 2nd argument. I'm not applying these filters to the monthly/weekly/etc. archive queries, but future archive types could apply them as needed.

#8 @mrmist
9 years ago

Hi. I see what you are getting at and since it would be dynamic whether to use the distinct or not I suppose it does not harm in the end.

Perhaps you can given an example of a query the DISTINCT helps with though as I am having trouble seeing what you are wanting to get out of the query.

For example this -

SELECT w.* FROM wp_posts w left join wp_comments c on w.ID = c.comment_post_ID;

Returns duplicates, as you would expect.

This -

SELECT DISTINCT w.* FROM wp_posts w left join wp_comments c on w.ID = c.comment_post_ID;

Returns one row for every row in wp_posts, as you would expect. However, it's completely useless for listing comments. For that you need (at least)

SELECT DISTINCT w.*, comment_content FROM wp_posts w left join wp_comments c on w.ID = c.comment_post_ID;

And (assuming that you're not wanting to skip duplicate comment content) that's just the same as

SELECT w.*, comment_content FROM wp_posts w left join wp_comments c on w.ID = c.comment_post_ID;

Which will give you the same number of rows as

SELECT * FROM wp_posts w left join wp_comments c on w.ID = c.comment_post_ID;

I'm not deliberately trying to be awkward, I'm just trying to ascertain what you are using the DISTINCT for, and whether there is some other way of doing what you are trying to do.

#9 @kevinB
9 years ago

I'm sorry I dragged "comments" into this conversation by accidentally using that word in a previous entry. My concern here is strictly with the listing of posts by the function wp_get_archives, and the possibility of filtering that list by post category or other one-to-many relationship.

I labeled this as a defect rather than feature request since the getarchives_join filter is already applied to the postbypost archive type, but joining on categories (a likely usage) causes duplicate entries in the end user's archives listing if a post is in more than one joined category.

I maintain a plugin (http://agapetry.net/news/introducing-role-scoper/) which imposes content-specific permissions. I do this via JOIN and IN clauses on the term_relationships table (for category-specific roles) and also on my own user2role2object table (for post/page-specific roles).

For the most part, WordPress provides all the hooks I need to pull this off. For wp_get_archives, I have the getarchives_join and getarchives_where filters. For the same reasons you cited above, my JOIN clauses would generate duplicate archive listings if the post is in more than one user-readable category. Existing DISTINCT clauses in the monthly, yearly, weekly and daily archive types prevent duplicates. However, the query for postbypost archive uses a "SELECT *" query. I want to filter that to "SELECT DISTINCT $wpdb->posts.*"

#10 @Denis-de-Bernardy
9 years ago

  • Milestone 2.8 deleted
  • Resolution set to invalid
  • Status changed from reopened to closed

this is fixed in trunk, where group by clauses are used

#11 @Denis-de-Bernardy
9 years ago

please re-open a separate, feature-request ticket for the filters, else it'll get missed in the thread (I only noticed that part when I was about to close the window).

Note: See TracTickets for help on using tickets.