Make WordPress Core

Opened 4 years ago

Last modified 2 years ago

#51769 new enhancement

Full-text search support

Reported by: zieladam's profile zieladam Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Database Keywords: dev-feedback 2nd-opinion
Focuses: Cc:

Description (last modified by zieladam)

Let's talk about first-class fulltext search support in WP core. Is there any reason not to support FULLTEXT search in WordPress core for newer MySQL versions with the current search mechanism serving as a fallback for older MySQL versions?

MySQL supports FULLTEXT indexes for MyISAM tables since version 3.23.23 and for InnoDB tables since 5.6.

As @TimothyBlynJacobs noted, WordPress supports versions back to MySQL 5.0:

https://github.com/WordPress/wordpress-develop/blob/master/src/wp-includes/version.php
src/wp-includes/version.php

As @desrosj noted:

  • The recommended versions are MySQL >= 5.6/MariaDB >= 10.1 as per https://wordpress.org/download/.
  • According to the WordPress stats page, a considerable amount of sites use MySQL 5.5 (35%), and some still use 5.1/5,0. Raising support to >= 5.6 is not really an option.

The last discussion about that feature I found was from 10 years ago so maybe it's good time to re-evaluate:

https://core.trac.wordpress.org/ticket/14722

Change History (16)

#1 @zieladam
4 years ago

  • Description modified (diff)

This ticket was mentioned in Slack in #core by zieladam. View the logs.


4 years ago

#3 @johnbillion
4 years ago

  • Component changed from General to Database
  • Keywords dev-feedback added

#4 @zieladam
4 years ago

  • Description modified (diff)

#5 @zieladam
4 years ago

  • Description modified (diff)

This ticket was mentioned in PR #770 on WordPress/wordpress-develop by adamziel.


4 years ago
#6

  • Keywords has-patch added

This PR proposes core support for fulltext search with fallback to LIKE when fulltext search isn't available.

Sites that would like to keep using LIKE could disable the fulltext_search_enabled option:

<img width="768" alt="Zrzut ekranu 2020-11-26 o 17 12 46" src="https://user-images.githubusercontent.com/205419/100373190-ae7a7800-300a-11eb-9429-bd4545ee7b82.png">

Trac ticket: https://core.trac.wordpress.org/ticket/51769

#7 @zieladam
4 years ago

I came up with a very simple PR that attempts to create FULLTEXT indexes and use them for search later on. If ALTER TABLE ADD FULLTEXT INDEX fails, it simply keeps using the existing LIKE-based code. I would love to get some feedback and discussion here :-) CC @noisysocks @azaozz @TimothyBlynJacobs @andraganescu @desrosj

This ticket was mentioned in Slack in #core by zieladam. View the logs.


4 years ago

#9 @zieladam
4 years ago

@azaozz raised a great concern about the performance of these queries:

<?php
$wpdb->query( "ALTER TABLE $wpdb->posts ADD FULLTEXT INDEX `wp_posts_fulltext_title` (`post_title` ASC);" );
$wpdb->query( "ALTER TABLE $wpdb->posts ADD FULLTEXT INDEX `wp_posts_fulltext_excerpt` (`post_excerpt` ASC);" );
$wpdb->query( "ALTER TABLE $wpdb->posts ADD FULLTEXT INDEX `wp_posts_fulltext_content` (`post_content` ASC);" );

Indeed that could easily time out when there are many posts in the database. Still, there are many sites where that would not be a problem.

We brainstormed a solution that would offer users the best sensible defaults:

  • WP upgrade flow would attempt to create these indexes one time only. Perhaps we could reduce the number of people experiencing these timeouts by short-circuiting if SELECT COUNT(*) from wp_posts is a large number (how large?). Perhaps also short-circuit on multisites. Maybe we could also rely on some filters or otherwise provide a way out for those sites that really don't want to be upgraded.
  • If the indexes are successfully created, that's great and full-text search becomes available on the site. It could be enabled/disabled via site settings.
  • If the operation times out or otherwise fails, it will not be attempted again. Refreshed upgrade screen would include a short note about how Full-text search is available in WP 5.7 but could not be automatically enabled, and then link to a documentation page with further details. The same note would be displayed instead of the related site setting. There could also be a wp-cli command to attempt the upgrade.

In this scenario, the upgrade would be automatic and the failure would handled gracefully.

Last edited 4 years ago by zieladam (previous) (diff)

#10 @peterwilsoncc
4 years ago

Should this go ahead, what would the effect be for someone moving hosts and doing an SQL export from a version that supports full text indexes and attempting to import in to a version that does not?

I can't find a reference but I have seen such a scenario considered in the past however I can't remember the outcome of the discussion.

I'm unsure about the 'large site' check, the logic is sound but in my experience such sites usually use the cli (wp core update-db) for DB upgrades rather than doing them in the admin.

#11 @zieladam
3 years ago

hat would the effect be for someone moving hosts and doing an SQL export from a version that supports full text indexes and attempting to import in to a version that does not?

@peterwilsoncc My guess is that the import will fail as FULLTEXT KEY will be part of CREATE TABLE and not a separate ALTER TABLE statement. One thing we could do is provide good documentation on how to handle this case (e.g. drop the index before exporting).

It is also worth noting that it the last major MySQL version that wouldn't support it, 5.5, reached end of life in December 2018. Even MySQL 5.6 is not maintained anymore as of February 2021. The official WP.org requirements page says that WordPress requires MySQL 5.6 or newer: https://wordpress.org/about/requirements/

So maybe this is not an issue at all?

I'm unsure about the 'large site' check, the logic is sound but in my experience such sites usually use the cli (wp core update-db) for DB upgrades rather than doing them in the admin.

We could probably follow the same logic in CLI and render a note or prompt the user to answer a yes / no question.

#12 @azaozz
3 years ago

  • Keywords 2nd-opinion added; has-patch removed

Looking here again, there may be couple more considerations:

  1. Speed consideration for when saving to the DB. If there's a fulltext index on post_content, it will have to be updated every time a new post is saved or a post is updated. Normally that should be quite fast, but might become problematic for very big sites with millions of rows in wp_posts where the index will be several MB (saving a post is quite slow afaik). Perhaps worth a look/test.
  2. Disk space: how much bigger the DB will be with a fulltext index on post_content?

If the above two are of no concern, perhaps the fulltext indexes can be added only on new installs (when supported)? Then the search queries can use fulltext if it exists, something like: SHOW INDEX FROM my_table WHERE Key_name = 'index_to_check'.

the import will fail as FULLTEXT KEY will be part of CREATE TABLE and not a separate ALTER TABLE statement

Yeah, thinking good docs would be sufficient for this case. Something like "When importing to older versions of MySQL, and the import fails with ...... error message, drop the fulltext indexes before exporting and try again".

#13 @zieladam
2 years ago

@barry once mentioned that using the FULLTEXT feature can make importing data 100x slower and that sites that enable it via plugins tend to experience problems. I am hazy on the specific kinds of problems, but the 100x slowdown of writes sounds pretty bad.

#14 follow-up: @zieladam
2 years ago

Good notes @azaozz! Here's some more thoughts:

  1. Speed consideration for when saving to the DB. If there's a fulltext index on post_content, it will have to be updated every time a new post is saved or a post is updated. Normally that should be quite fast, but might become problematic for very big sites with millions of rows in wp_posts where the index will be several MB (saving a post is quite slow afaik). Perhaps worth a look/test.

The trade-off here is between the speed of writes and reads.

MATCH is much faster than LIKE so the searching would be faster.

I haven't found any INSERT benchmarks for the fulltext feature though, I found a few complaints about slow writes when using fulltext index and some discussion how it's not suitable for larger datasets because of slow INSERTs. This aligns with the concerns expressed by @barry. Unfortunately, there doesn't seem to be many charts and tables out there.

  1. Disk space: how much bigger the DB will be with a fulltext index on post_content?

As for the disk space: I did a few quick searches and found that the index could take somewhere between 10% and 80% of the size of the text. That's all anecdotal as I haven't done any measurements myself. While that's a pretty large range, it already tells us that the size of the index is an important factor here.

Last edited 2 years ago by zieladam (previous) (diff)

#15 in reply to: ↑ 14 @azaozz
2 years ago

Replying to zieladam:

I haven't found any INSERT benchmarks for the fulltext feature though...
This aligns with the concerns expressed by @barry. Unfortunately, there doesn't seem to be many charts and tables out there.

Yea, heard about Barry's concerns too. Another concern may be when adding the index. Normally updates to the DB happen after installing a new WordPress version (the familiar "Database update required" screen), however creating that FULLTEXT index may need more time/different approach for sides with a lot of posts.

Perhaps another good test idea would be to see if a "normal" request would be sufficient to add the index for a very large table. Then come up with a solution if this causes PHP to time out of MySQL runs out of resources, or any other possible problems.

As for the disk space: I did a few quick searches and found that the index could take somewhere between 10% and 80% of the size of the text.

Considering that wp_posts is the largest table, and that most of that size is for post_content, such size increase would be significant. Still, not sure if that's a big drawback considering the overall performance improvements.

Last edited 2 years ago by azaozz (previous) (diff)

adamziel commented on PR #770:


2 years ago
#16

I'm not pursuing this work anymore so I'm closing this PR. Feel free to take over!

Note: See TracTickets for help on using tickets.