WordPress.org

Make WordPress Core

Opened 2 years ago

Last modified 21 months ago

#20487 new enhancement

Comment search isn't customizable

Reported by: brokentone Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 3.2.1
Component: Comments Keywords: has-patch
Focuses: Cc:

Description

*cross-posted to the wp-hackers mailing list last night*

I work on a large news oriented WP site and first off, it actually scales remarkably well. We keep users, comments, and actual posts all in WordPress. However, due to our size, administering comments is causing us issues at the moment.

Our last outage was caused by one of our comment moderators doing a simple comment search in the admin. The query took 16 seconds to execute, the next query locked, and, with our traffic, the DB couldn't handle the resulting queued traffic.

Staying away from discussions of "you should use Disqus" or, "you need more X for your mysql server," I think there is significant opportunity to improve the way WP handles comment searching. Every comment search takes the following form:

SELECT * FROM wp_comments WHERE ( comment_approved = '0' OR comment_approved = '1' ) AND (comment_author LIKE '%TERM%' OR comment_author_email LIKE '%TERM%' OR comment_author_url LIKE '%TERM%' OR comment_author_IP LIKE '%TERM%' OR comment_content LIKE '%TERM%') ORDER BY comment_date_gmt DESC LIMIT #;

As we can all see, this is a beast of a query. Even when the term is clearly an specific term (say an email or IP), or when the intent of the admin is known (e.g. clicking the IP link on a specific comment). Furthermore, there are no hooks activated in this process for plugins to use to say create an advanced comment search plugin. One might expect hooks like those activated in post search to be activated here, like "parse_request" and "get_search_query."

Waxing philosophical, as posts, comments, and users are the three basic types of data to be stored and displayed in WP, one would expect them to have similar interfaces and functionality. They each have basic functionality of being able to retrieve a single item, a list (in full or in part), search for an element, or edit a single entry. For this reason I don't understand why each of their functionality isn't derived from something like an abstract class or an interface. The architecture here is also difficult in that you are not able to replace or extend a single class to change the functionality.

Back on target. I propose that we add functionality by which we'll be able to search by exact match in addition to the existing full wildcard (left, right, or full wildcard is probably excessive) as well as specifying the field to search. This would allow my earlier use case of searching by IP to look for an exact match in the "comment_author_IP" field only, not searching the fulltext of every comment.

This functionality can be created simply by:

  1. Adding "search_type" and "search_field" to the WP_Comment_Query::query_vars data structure in wp_includes/comment.php
  2. Replacing WP_Comment_Query::get_search_sql with something more robust, able to understand these new properties and construct the query on their basis
  3. Upgrading WP_Comments_List_Table::prepare_items to accept the new queries and add them to the data structure it creates
  4. Either modifying WP_List_Table::search_box to have some options of advanced search, or dropping a hook so that a plugin can easily modify it
  5. Modifying the WP_Comments_List_Table::column_author to supply the correct query string to indicate an IP search

If I can figure out a way to make this more similar to the signature of posts and users, or at least add some hooks at the right places, I can do that as well.

This will add efficiency for everyone--particularly those who have lots of comments. I modified our core for this functionality today and we will fully QA it tomorrow. I made it on 3.2.1 as that is what we're running right now, and there are minimal changes in the affected files and functions between 3.2.1 and 3.3.1 and even the nightly.

I can submit a patch as long as it passes our QA--how should I do that, off the nightly?

Thanks,
-Kenton Jacobsen

Attachments (4)

comment_search_20487_v1.diff (6.6 KB) - added by brokentone 2 years ago.
comment_search_20487_v2.diff (3.0 KB) - added by brokentone 2 years ago.
comment_search_20487_v3.diff (3.8 KB) - added by brokentone 2 years ago.
comment_search_20487_v4.diff (4.0 KB) - added by brokentone 2 years ago.

Download all attachments as: .zip

Change History (26)

comment:1 brokentone2 years ago

  • Version set to 3.2.1

comment:2 scribu2 years ago

  • Keywords needs-patch added

Waxing philosophical, as posts, comments, and users are the three basic types of data to be stored and displayed in WP, one would expect them to have similar interfaces and functionality. [...] For this reason I don't understand why each of their functionality isn't derived from something like an abstract class or an interface.

There was such an attempt: #15032

At this point, I think we'll have to settle with an abstraction layer on top of these existing classes.

there are no hooks activated in this process for plugins to use to say create an advanced comment search plugin. One might expect hooks like those activated in post search to be activated here, like "parse_request" and "get_search_query."

Actually, there are a few: 'pre_get_comments', 'comment_clauses' and 'get_comments'.

And I think adding one specifically for the search SQL should offer enough flexibility for most use-cases.

Last edited 2 years ago by scribu (previous) (diff)

comment:3 scribu2 years ago

  • Summary changed from Comment administration doesn't scale - Proposing significant change change to Comment search isn't customizable

To be clear, we are only interested in adding the filter to Core. The rest of the modifications you suggest are plugin territory.

I can submit a patch as long as it passes our QA--how should I do that, off the nightly?

See http://core.trac.wordpress.org/wiki#HowtoSubmitPatches

comment:4 brokentone2 years ago

  • Keywords has-patch added

@scribu Thanks for the quick response. Your work on #15032 is impressive. I'm sad that didn't make it into core. If you ever want to revisit it, I would be interested in helping.

I was a little unclear, there are no actions/filters between the creation and execution of the mysql query. I'm looking into how it is setup in story searching and I'll create similar behavior here.

In the meantime I diffed my changes for review. I understand your argument against them, and if I'm only able to get the filter to core, I'll indeed be able to do what I need as a plugin, but I still propose at least the underlying logic still be included in core for the following reasons:

  1. This is a scalability problem. It *works*, albeit inefficiently, on small sites, but by the time you have millions of comments (yes, multiple millions) it does not. 16 second queries are impossible to make work. I understand WP is attempting to be a serious player in large-scale websites, to not address this issue in core would be to ignore large-scale players' needs.
  2. The increased functionality adds little complexity.
  3. Current functionality is broken. When an admin clicks on the IP address of a comment, the logical outcome would be to do an exact search on only the comment_author_IP field. The expectation would not be to look for the IP as a string within author emails, comment text, etc. Having the functionality in the objects can allow this logical outcome even without an advanced search form.
  4. It will be difficult to design a plugin to have a similar UX without greater core support. I would have to create a separate page to host the advanced search form, then I believe I would have to direct the action of that form to the existing comment display page, where the current search form will be appear as there is no hook there to remove or modify it.

I appreciate your consideration,
-Kenton Jacobsen
theblaze.com

comment:5 scribu2 years ago

It will be difficult to design a plugin to have a similar UX without greater core support. I would have to create a separate page to host the advanced search form

You can modify it via JavaScript.

When an admin clicks on the IP address of a comment, the logical outcome would be to do an exact search on only the comment_author_IP field. The expectation would not be to look for the IP as a string within author emails, comment text, etc. Having the functionality in the objects can allow this logical outcome even without an advanced search form.

WP_User_Query restricts what fields are searched based on the search string:

$search_columns = array();
if ( $qv['search_columns'] )
	$search_columns = array_intersect( $qv['search_columns'], array( 'ID', 'user_login', 'user_email', 'user_url', 'user_nicename' ) );
if ( ! $search_columns ) {
	if ( false !== strpos( $search, '@') )
		$search_columns = array('user_email');
	elseif ( is_numeric($search) )
		$search_columns = array('user_login', 'ID');
	elseif ( preg_match('|^https?://|', $search) && ! wp_is_large_network( 'users' ) )
		$search_columns = array('user_url');
	else
		$search_columns = array('user_login', 'user_nicename');
}

Maybe we can do something similar here.

comment:6 scribu2 years ago

So, maybe adding 'search_columns' and 'search_fields' to WP_Comment_Query makes sense.

As for the UI, consider that there is no advanced search UI even for posts. Having it for comments is exceedingly unlikely.

comment:7 brokentone2 years ago

I don't think that logic is a reasonable solution. Users don't have freeform text areas to worry about, whereas comments do. Any logic to automagically choose the correct field will limit the ability to search the content, in a non-intuitive way. You want the ability to search for links, IP addresses or emails within the comment text field.

Without a sizable portion of my structural and functional modifications there will be no elegant way to make this work, even as a plugin.

Let's take the UI off the table, I accept your logic, and I appreciate the JS suggestion.

comment:8 scribu2 years ago

Well, technically, users have the 'bio' field, but I see your point.

comment:9 brokentone2 years ago

What are the concerns with what I'm proposing? Are there a significant functional, performance, or security implications? I would be happy to evaluate and refactor this code based on specific concerns.

comment:10 scribu2 years ago

Well, for one thing, we should have a whitelist of searchable columns in get_search_sql() and you should pass $this->query_vars['search_field'] into get_search_sql() as the second parameter.

comment:11 scribu2 years ago

Also, use tabs for indentation, not spaces: http://codex.wordpress.org/WordPress_Coding_Standards

comment:12 brokentone2 years ago

There already is a list of allowed columns supplied as the second argument to get_search_sql, which I am checking against. If it fails, it executes a normal search. They are appended together for the normal search behavior. Passed both search_field and search_type as parameters to the method now rather than using the class reference, fixed my spaces vs. tabs, and removed the UI.

Please take another look.

comment:13 scribu2 years ago

The thing is that get_search_sql() was generic at some point. It's not anymore, so it doesn't make sense to pass the whitelist as a parameter.

'search_field' should be called 'search_columns', to match WP_User_Query.

Last edited 2 years ago by scribu (previous) (diff)

comment:14 brokentone2 years ago

I can change the name, that's fine. I understand it may have worked differently in the past, but it works this way now. I can hardcode another array that is equal to the array currently passed to the method, but what would that gain us? This would reduce the flexibility of the code and I see no upside.

comment:15 scribu2 years ago

The gain is not having to pass the whitelist each time you call get_search_sql().

If you don't want to hardcode the whitelist into get_search_sql(), you might as well do:

get_search_sql( array_intersect( $whitelist, $this->query_vars['search_columns'] ) );

comment:16 brokentone2 years ago

I see where you're going with this... revising the way I'm doing this, making is considerably simpler. I'll submit the code tonight/tomorrow morning.

comment:17 brokentone2 years ago

Applied the wildcard logic and language from WP_User_Query as well as the columns language and logic. This simplified it significantly. I think this is looking pretty good.

comment:18 scribu2 years ago

  • Keywords needs-patch removed

Yep, looks pretty good.

Only some minor spacing issues. For example:

if($search_columns) $args['search_columns'] = (array)$search_columns;

should be

if ( $search_columns ) $args['search_columns'] = (array) $search_columns;

etc.

comment:19 brokentone2 years ago

Got it, I was more worried about the concept than the style. I'm also testing the hook right now. I'll get another version up tonight with the hook and I'll take another look at the coding standards.

comment:20 brokentone2 years ago

Added the hook. I chose pre_comment_query as that just seemed to make sense. Although due to the difference in construction whereas the equivalent user action passes an WP_Query object, this passes the query as a string. Not sure how big of a deal that is, with the inconsistency here and the lack of a common abstract class between these different features, I'm really don't expect it to be a concern.

I cleaned up a number of formatting issues within my code and immediately outside my code. What's our next step?

comment:21 scribu2 years ago

  • Milestone changed from Awaiting Review to Future Release

The next step is to hope that it gets commited in the WP 3.5 cycle.

comment:22 ryan21 months ago

For user and site searches in the network admin, we require explicitly placing globs in the search string to invoke trailing and leading wildcard searches and disallow using both trailing and leading when there are many sites/users to search. Maybe something to consider here.

Note: See TracTickets for help on using tickets.