Make WordPress Core

Opened 12 years ago

Closed 8 years ago

Last modified 8 years ago

#22744 closed enhancement (fixed)

Media search doesn't include file name

Reported by: johnbillion's profile johnbillion Owned by: joemcgill's profile joemcgill
Milestone: 4.7 Priority: normal
Severity: normal Version: 3.0
Component: Media Keywords: has-patch has-unit-tests commit
Focuses: administration Cc:

Description

Upload a file called foo.png and then do a search for foo.png (either on the Media screen or in the new media modal) and it won't find the file.

Probably existed since forever. Marking as 3.0.

Attachments (14)

22744.diff (1.6 KB) - added by wonderboymusic 10 years ago.
22744.2.diff (1.7 KB) - added by DrewAPicture 10 years ago.
docs
22744.3.diff (1.8 KB) - added by DrewAPicture 10 years ago.
22744.4.diff (2.1 KB) - added by joemcgill 8 years ago.
22744.5.diff (3.2 KB) - added by swissspidy 8 years ago.
22744.6.diff (4.2 KB) - added by joemcgill 8 years ago.
Extend unit tests
22744.7.diff (3.9 KB) - added by joemcgill 8 years ago.
Use DISTINCT query
22744.8.diff (4.6 KB) - added by joemcgill 8 years ago.
22744.9.diff (4.5 KB) - added by joemcgill 8 years ago.
22744.10.diff (5.5 KB) - added by joemcgill 8 years ago.
22744.11.diff (5.5 KB) - added by joemcgill 8 years ago.
22744.12.diff (3.3 KB) - added by joemcgill 8 years ago.
22744.13.diff (4.2 KB) - added by joemcgill 8 years ago.
Adds unit test for tax_query
22744.14.diff (4.6 KB) - added by flixos90 8 years ago.

Download all attachments as: .zip

Change History (83)

#1 @azaozz
12 years ago

  • Milestone changed from Awaiting Review to Future Release

Yes, we've never searched on file name only on attachment post title/post content. Would be a nice enhancement for the media search.

As the file name is used for post title, many users get confused why they cannot find previous media files if they changed the title.

#2 @valeriyan
12 years ago

Consider this scenario for support of this change:

  1. User uploads a bunch of files to Media Library. One such file is white.png
  2. User changes the title of file white.png to Rainbows
  3. Months later, user wants to see whether a file on his computer (the same white.png) has already been uploaded to the Media Library
  4. Since user has hundreds of files in the Media Library by now, he decides to search for the file by name using "white"
  5. User does not find white.png and cannot remember the title that he gave to the image (Rainbows), and so he needs to re-upload the file instead of being able to use the same one again

#3 @ericlewis
10 years ago

  • Focuses javascript added
  • Keywords needs-patch added

#4 @helen
10 years ago

  • Milestone changed from Future Release to 4.0

Please. :)

#5 @nofearinc
10 years ago

Currently the title of an attachment doesn't include the file type/extension. I can see several different directions:

  1. Changing the attachment title to the full title, including the extension (would probably affect the user experience)
  2. Adding the full name as a post_content or a postmeta (could alter some other post_content going and will add extra information in the meta table probably)
  3. Comparing against the post_mime_type and a list of available extensions there

This ticket was mentioned in IRC in #wordpress-dev by jphase. View the logs.


10 years ago

#7 @wonderboymusic
10 years ago

  • Focuses docs administration added; javascript removed
  • Keywords has-patch needs-docs added; needs-patch removed

Hold your nose while perusing 22744.diff, but I got this to work.

#8 @DrewAPicture
10 years ago

The descriptions all need periods, and we should mark these @access private. Also, _wp_filter_attachment_meta_sql() is definitely pretty cryptic, so maybe add some inline comments to explain what's going on.

@DrewAPicture
10 years ago

docs

#9 @DrewAPicture
10 years ago

  • Owner set to wonderboymusic
  • Status changed from new to reviewing

22744.2.diff addresses the docs feedback in comment:8. Still need an explanatory comment or comments deciphering the logic in _wp_filter_attachment_meta_sql().

#10 @DrewAPicture
10 years ago

  • Focuses docs removed

#11 @helen
10 years ago

  • Keywords dev-feedback added
  • Type changed from enhancement to task (blessed)

#12 @michalzuber
10 years ago

This "visual" output change might help write docs

  /**
   * $sql = array(
   *     'join' =>  INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
   *     'where' => AND ( (CAST(wp_postmeta.meta_value AS CHAR) LIKE '%code%') )
   *                AND ( wp_postmeta.meta_key = '_wp_attached_file' )
   * )
   **/
  if ( 0 === strpos( $sql['where'], ' AND (' ) ) {
      list( $_, $clauses ) = explode( ' AND ', $sql['where'], 2 );
      $sql['where'] = " OR ({$clauses})";
  }
  /**
   * $sql = array(
   *     'join' =>  INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
   *     'where' => OR (
   *                  ( (CAST(wp_postmeta.meta_value AS CHAR) LIKE '%code%') )
   *                  AND ( wp_postmeta.meta_key = '_wp_attached_file' )
   *                )
   * )
   **/
Last edited 10 years ago by michalzuber (previous) (diff)

This ticket was mentioned in IRC in #wordpress-dev by ocean90. View the logs.


10 years ago

#14 @DrewAPicture
10 years ago

  • Keywords needs-docs removed

Docs added for the meta sql filter in 22744.3.diff. Forgot to post a comment the other day.

#15 @michalzuber
10 years ago

  • Keywords needs-refresh added

Nice DrewAPicture. In Media List view patch fixed the search, but in Media Grid it still needs fixing https://youtu.be/40hsY2DuXus

Last edited 10 years ago by michalzuber (previous) (diff)

This ticket was mentioned in IRC in #wordpress-dev by DrewAPicture. View the logs.


10 years ago

#17 @ocean90
10 years ago

  • Milestone changed from 4.0 to Future Release
  • Type changed from task (blessed) to enhancement

Let's try this again in 4.1. Maybe with some unit tests too.

#18 @wonderboymusic
10 years ago

  • Keywords needs-patch added; has-patch dev-feedback needs-refresh removed

We can probably use a nested meta query now

#19 @SergeyBiryukov
9 years ago

#32149 was marked as a duplicate.

This ticket was mentioned in Slack in #core by helen. View the logs.


8 years ago

#21 @joemcgill
8 years ago

  • Milestone changed from Future Release to 4.7

Let's take a look at this for 4.7. @wonderboymusic feel free to assign to reassign to me if you don't want it.

This ticket was mentioned in Slack in #core by helen. View the logs.


8 years ago

#23 @Kent Brockman
8 years ago

This feature is actually available when you install the plugin Media Search Enhanced. It works out of the box by simply activating the plugin: https://es.wordpress.org/plugins/media-search-enhanced/

It's really useful for power users dealing with huge sites containing thousands of media files that need to be reused in several/future posts. Also for giant WooCommerce sites where you may need to re-use a picture or banner or whatever file you are sure it was uploaded some years ago but can't recall its description but you have a hint of its filename.

You could check its code and merge it into core.

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

@joemcgill
8 years ago

#27 follow-up: @joemcgill
8 years ago

  • Keywords has-patch needs-testing needs-unit-tests added; needs-patch removed

22744.4.diff Accomplishes this task without introducing any new filters. To do so, any time we execute a search query on attachments we modify the JOIN clause to include postmeta, the WHERE clause to include searches for postmeta where _wp_attached_file is the meta_key and the GROUP BY clause to avoid returning duplicate posts.

One question is whether we should search attachment file names any time attachment post types are included in a search query, e.g. $q['post_type'] => array('attachment', 'post'), or only when exclusively doing searches on attachments? Currently, I'm only doing the latter.

Also, thanks @Kent Brockman for the suggestion to check out the Media Search Enhanced plugin. Doing so helped me get my head around what needed to happen here.

@swissspidy
8 years ago

#28 in reply to: ↑ 27 @swissspidy
8 years ago

  • Keywords has-unit-tests added; needs-unit-tests removed

Replying to joemcgill:

22744.4.diff Accomplishes this task without introducing any new filters. To do so, any time we execute a search query on attachments we modify the JOIN clause to include postmeta, the WHERE clause to include searches for postmeta where _wp_attached_file is the meta_key and the GROUP BY clause to avoid returning duplicate posts.

Thanks for the refreshed patch! In 22744.5.diff I made it work with post_type being an array too. Added a test to demonstrate it.

One question is whether we should search attachment file names any time attachment post types are included in a search query, e.g. $q['post_type'] => array('attachment', 'post'), or only when exclusively doing searches on attachments? Currently, I'm only doing the latter.

Since this ticket is mainly about the media screen, I think fixing it only there for now is fine. It can be considered in #27266, even though that ticket is only slightly connected.

@joemcgill
8 years ago

Extend unit tests

#29 @joemcgill
8 years ago

Thanks @swissspidy. In 22744.6.diff I modified your unit test in 22744.5.diff to also test cases where the post type is passed as a string instead of an array. I also added a test to demonstrate the current behavior of not including searches for attachment filenames in general searches.

I tend to agree that fixing this exclusively for attachment searches is probably fine for this ticket. I'm curious if including the meta value search here will have any significant performance implications for sites containing a large post meta table—something that should also be considered if we decide to change behavior in #27266.

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

@joemcgill
8 years ago

Use DISTINCT query

#31 follow-up: @joemcgill
8 years ago

Some preliminary testing on a site with ~14k attachments and ~18k posts, this change increased the main query time from 60.7ms to 187.6ms, which may be acceptable since this is only applied in an admin context, but testing on installs with a larger number of posts/attachments would be appreciated.

For the sake of testing, 22744.7.diff uses a DISTINCT query instead of GROUP BY, which (as I expected) didn't have any measurable performance improvement in my tests but may be worth considering.

#32 @swissspidy
8 years ago

Thanks for the updated patch & tests! I'd probably split test_s_should_include_file_names_in_attachment_searces into 2 tests to have 1 assertion per test method.

The increase in main query time is a good argument against adding this to the front-end as well. Not to mention that visitors likely don't know or care about file names anyway.

#33 @Kent Brockman
8 years ago

@swissspidy:
May the search for filenames become a setting to turn on/off in the Media screen (defaulting to Off to avoid a support rush)? FYI: I have the Media Search Enhanced plugin installed in 30/80 websites. Not bad rate, and the most of them are ecommerce sites. Maybe WooCommerce should incorporate this as a feature instead? :)
In the other hand, no overhead or slow down was noticed by using the feature plugin on sites with nearly 6000 images uploaded. I would expect an even better performance if it is part of the core; is this assumption correct?

@joemcgill
8 years ago

#34 @joemcgill
8 years ago

22744.8.diff returns to the GROUP BY clause approach from 22744.6.diff and splits the assertions in the first test into two separate unit test.

@swissspidy I agree that the performance cost, however small, is a knock against adding this to the front end. I even wonder if we should avoid adding this to the back end as well and let plugins handle what I imagine is a fairly limited use case, if we're going to apply the Design for the Majority principle here.

@Kent Brockman if we do modify Core's behavior here, we would most likely not add a new setting in the admin to modify this behavior. It would have to be overridden programmatically with a filter, as current plugins are doing to add this functionality now.

Version 0, edited 8 years ago by joemcgill (next)

#35 @swissspidy
8 years ago

Based on the previous comments on this ticket and the users' desire to improve media management, we should focus on adding this to the back-end only.
With that in mind, 22744.8.diff will likely cause unexpected behaviour for people running attachment queries elsewhere. Adding a meta query in wp_ajax_query_attachments() would isolate this more and be way cleaner than the manual JOIN in WP_Query.

Unfortunately, meta queries cannot easily be made optional yet, hence the need for _wp_filter_attachment_meta_sql in 22744.3.diff.

@joemcgill
8 years ago

#36 @joemcgill
8 years ago

Good points @swissspidy, we can apply this more narrowly.

22744.9.diff only applies the changes from 22744.6.diff to cases where we're calling wp_ajax_query_attachments() via an AJAX request or to times where we're populating a WP_Media_List_Table object by applying a new private filter, _filter_query_attachment_filenames() to the posts_clauses hook in WP_Query::get_posts().

Even if we were able to make the meta_query optional here, we wouldn't be generating the exact SQL that we need, because WP_QUERY:parse_search() splits the $q['s'] terms into separate WHERE clauses.

If this approach seems viable, the tests will need to be updated.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


8 years ago

@joemcgill
8 years ago

#38 @joemcgill
8 years ago

  • Owner changed from wonderboymusic to joemcgill
  • Status changed from reviewing to accepted

After our discussion in core dev chat this week, the consensus seems to be that we should only apply this in the media library (both in the modal and in the media library in list view via WP_Media_List_Table.

22744.10.diff updates the tests after the changes suggested in 22744.9.diff. Since this behavior is applied via placement of a filter, they're really more like a set of reduced case integration tests that ensure WP_Query is including filenames when applied, and not when the filter is not applied.

This ticket was mentioned in Slack in #core-images by mike. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-images by paaljoachim. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


8 years ago

#43 @swissspidy
8 years ago

I couldn't find a really large site to test this on, but it works well on regular / medium-sized sites.

Regarding the patch, some notes:

  • @since 4.7.0 instead of @since 4.7
  • Doc block for test_include_file_names_in_attachment_search_as_string() needs to be fixed.
  • Every added filter in the tests should be removed again after the query, just to be on the safe side.

After fixing that, let's ship it!

@joemcgill
8 years ago

#44 @joemcgill
8 years ago

  • Keywords commit added; needs-testing removed

Thanks @swissspidy – patch updated to address those two issues in 22744.11.diff

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

#46 @joemcgill
8 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 38625:

Media: Make media library searchable by filename.

This applies a new private function, _filter_query_attachment_filenames(),
to the post_clauses filter hook during wp_ajax_query_attachments() and
wp_edit_attachments_query_vars() to include _wp_attached_file post meta
in search queries performed from the media library or in a WP_Media_List_Table.

Props wonderboymusic, DrewAPicture, joemcgill, swissspidy.
Fixes #22744.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

#49 @flixos90
8 years ago

  • Keywords commit removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

I noticed a bug introduced by the above changeset: The JOIN clause of the query is completely overridden by _filter_query_attachment_filenames(), regardless of whether other joins were already included. This can break several queries. We need to make sure that original joins are preserved.

I stumbled across this while querying attachment taxonomies. :)

#50 @kylegilman
8 years ago

@flixos90 I second that. I was having a problem with a meta_query in the media library and I tracked it back to _filter_query_attachment_filenames(). Any meta_query in the media library using NOT EXISTS for compare leads to an empty result. With _filter_query_attachment_filenames() disabled everything works fine.

#51 @joemcgill
8 years ago

@flixos90 good catch. I can see where JOINs are getting blown away. We'll need to get some unit tests on this.

@kylegilman can you give an example of how to reproduce the issue you're seeing?

#52 @kylegilman
8 years ago

I have a plugin that generates a number of attachments that usually don't need to show up in the Media Library list. I filter the Media Library query by only displaying items that don't have a particular custom meta key. It works in 4.6 and in 4.7 as long as _filter_query_attachment_filenames() is disabled.

This is a more general action than I use, but simplified to reproduce the issue:

add_action('pre_get_posts', 'kg_hide_based_on_meta_key');

function kg_hide_based_on_meta_key( $query ) {          

        $query->set(
                'meta_query',
                array(
                        array(
                                'key' => '_anymetakey',
                                'compare' => 'NOT EXISTS'
                        )
                )
        );

}

shouldn't change the results at all assuming _anymetakey is not in use, but it will result in an empty list in the Media Library. The list of Posts or any other query work fine as far as I can tell. If you change the key to a value we know an attachment will have and remove compare then the full list is shown.

add_action('pre_get_posts', 'kg_hide_based_on_meta_key');

function kg_hide_based_on_meta_key( $query ) {          

        $query->set(
                'meta_query',
                array(
                        array(
                                'key' => '_wp_attachment_metadata'
                        )
                )
        );

}

EDIT: It's worth pointing out that this last one breaks all non-attachment queries and is just an example.

Last edited 8 years ago by kylegilman (previous) (diff)

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


8 years ago

@joemcgill
8 years ago

#54 @joemcgill
8 years ago

  • Keywords needs-testing added

22744.12.diff Maintains existing JOIN clauses by adding an additional LEFT JOIN of the postmeta table rather than replacing the whole JOIN clause. Additionally, during debugging I noticed that I was previously applying the _filter_query_attachment_filenames() filter on all queries from the media library, and not just search queries, so I've fixed that as well.

This also adds a unit test which includes a meta query to make sure we're not trampling that JOIN.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


8 years ago

#57 @swissspidy
8 years ago

Regarding the latest patch: every added filter in the tests should be removed again after the query, just to be on the safe side. Otherwise it could theoretically affect other tests.

#58 @flixos90
8 years ago

@swissspidy I'm not sure if we should do that. We can to be a 100 percent safe, but the filter removes itself automatically when it's run. We might actually add a unit test to verify that the filter removes itself automatically.

Otherwise 22744.12.diff looks good to me. The one thing that might cause issues is that we completely override the groupby clause as this might affect some custom logic which specifies it. We might wanna add another unit test with a tax_query, just to be on the safe side (I tested it manually and it works though).

@joemcgill On Slack you asked about possible improvements for testing the full integration path. By that do you mean we should test for example whether wp_edit_attachments_query_vars() and wp_ajax_query_attachments() add the filter properly and return proper results? We could do that, although I'm not sure whether this would be very helpful. Other than testing these two functions, I don't currently see a way to test this in a "larger scope".

Last edited 8 years ago by flixos90 (previous) (diff)

#59 @boonebgorges
8 years ago

The approach in 22744.12.diff looks good to me.

#60 @swissspidy
8 years ago

@flixos90 You're right, I missed that.

@joemcgill
8 years ago

Adds unit test for tax_query

#61 @joemcgill
8 years ago

  • Keywords commit added; needs-testing removed

Other than testing these two functions, I don't currently see a way to test this in a "larger scope".

@flixos90: you're correct, I was curious about testing to make sure the filters are being applied at the right times, but I can't really think of a clean way to do that. In the mean time, I've added another unit test in 22744.13.diff that ensures things are working when a tax_query in included, as you suggested. Let me know if you notice anything else. Otherwise, I plan to commit this early this week.

@flixos90
8 years ago

#62 @flixos90
8 years ago

@joemcgill Looks good to me as well. In 22744.14.diff I added another unit test to ensure that the filter function unhooks itself after being called, as I mentioned above.

#63 @joemcgill
8 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 38733:

Media: Better handling of JOINs when searching filenames.

Following [38625], any media searches that already included JOINs,
e.g., tax_queries, would get trampled when we joined the post meta
table to search for filenames. This preserves existing JOINs and
also only applies the _filter_query_attachment_filenames() filter
when a search query is being performed.

Props flixos90, joemcgill.
Fixes #22744.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


8 years ago

This ticket was mentioned in Slack in #core-media by joemcgill. View the logs.


8 years ago

#66 in reply to: ↑ 31 ; follow-up: @sashman13
8 years ago

Replying to joemcgill:

Some preliminary testing on a site with ~14k attachments and ~18k posts, this change increased the main query time from 60.7ms to 187.6ms, which may be acceptable since this is only applied in an admin context, but testing on installs with a larger number of posts/attachments would be appreciated.

This change is proving very problematic on one of our larger client sites. They have 886k posts and close to 13 million postmeta entries with well over 2 million attachments. The query generated by this increased their media search time from 10 seconds to 25 seconds and worse yet, is locking the postmeta table in the MYISAM engine for the entirety of the query due to the outer join and tmp table. The previous search lock was insignificant.

#67 in reply to: ↑ 66 ; follow-up: @joemcgill
8 years ago

Replying to sashman13:

Thanks for the feedback, @sashman13. See #39358 for discussion about how to disable this feature on larger sites.

#68 in reply to: ↑ 67 @sashman13
8 years ago

Appreciate your reply and all your hard work.

#69 @presjpolk
8 years ago

Hi,

I work on the site @sashman13 was talking about. We were able to apply the code from #39358 as a plugin to bring down those search times to something that's not killing us anymore. That was great, so again thank you to @joemcgill for the help on that.

thanks,

Note: See TracTickets for help on using tickets.