Make WordPress Core

Opened 3 years ago

Last modified 4 months ago

#54836 new defect (bug)

Huge error logs filled with "WordPress database error Illegal mix of collations" errors caused by spammers

Reported by: jh20001's profile jh20001 Owned by:
Milestone: Awaiting Review Priority: normal
Severity: critical Version:
Component: Database Keywords:
Focuses: Cc:

Description

This has been going on for quite a long time. By that, it has now been years. Every few weeks to a month we have to delete our error log as it builds to become an incredibly huge file on the server. So far, there has been no resolution to this anywhere. We have tried the forums and so many other places which have recommended starting a ticket as clearly, we aren't alone in experiencing this. There has to be a way of preventing this.

This causes huge stability issues and consumes a LOT of space on the host. Therefore, I feel this has earned the privilege of being considered "critical". I filed under "database" although this may be better under something else as it seems to be some kind of injection attack using search engines and fake links that try to load content into the blog via error or search results, etc.

Here is (believe it or not) a very small sample of what the error logs hold every time...

(see attachment, as this would be far too much to paste into the post)

Attachments (1)

error_log (126.6 KB) - added by jh20001 3 years ago.
This is a tiny (micro) sample of what ends up in the error log. File generally reaches 100MB within a few weeks and continues to grow each time causing issues.

Download all attachments as: .zip

Change History (9)

@jh20001
3 years ago

This is a tiny (micro) sample of what ends up in the error log. File generally reaches 100MB within a few weeks and continues to grow each time causing issues.

#1 follow-ups: @audrasjb
3 years ago

  • Version 5.8.3 deleted

Hello @jh20001 welcome to WordPress Core Trac and thank you for opening this ticket,

Based on the error logs you shared, it looks like you have different types of collations in your database. I think it is not something you should have on a default and recent WordPress install.

However, I’ll let WP DB component maintainers give a detailed answer on the issue and the related fix, and for now I'll just remove the 5.8.3 version as this wasn't introduced in the last version of WordPress.

#2 in reply to: ↑ 1 @jh20001
3 years ago

Replying to audrasjb:

Hello @jh20001 welcome to WordPress Core Trac and thank you for opening this ticket,

Based on the error logs you shared, it looks like you have different types of collations in your database. I think it is not something you should have on default and recent WordPress install.

However, I’ll let WP DB component maintainers give a detailed answer on the issue and the related fix, and for now I'll just remove the 5.8.3 version as this wasn't introduced in the last version of WordPress.

These errors shouldn't even be displayed as none of that content exists on the site. It's all spammed content coming from somewhere external displayed in order to gain fake search results with their spam on other people's domain. Notice how it is all medicine, adult, and other forms of random spam being trying to trick the site into displaying via error screens or the search system.

#3 @varinupadhyay
20 months ago

#58133 was marked as a duplicate.

This ticket was mentioned in Slack in #core by presskopp. View the logs.


20 months ago

#5 in reply to: ↑ 1 @pbiron
20 months ago

Replyinhttps://core.trac.wordpress.org/ticket/54836?replyto=1#g to audrasjb:

Based on the error logs you shared, it looks like you have different types of collations in your database. I think it is not something you should have on a default and recent WordPress install.

To me it the messages in the log don't suggest a mix of collations in the DB, but rather a different in the collation in the DB and the encoding of the text in the query.

I think a way to test that hypothesis is to use PHP's mb_convert_encoding() to "force" the encoding of a simple WP_Query string to use a different collation than the DB does. I suspect the error message in that case will be the same as what's in the sample log file.

If that is the case, I don't know that core can/should do anything about this.

#6 @craigfrancis
19 months ago

The error_log from @jh20001 shows the database tables/fields using latin1_general_ci, and trying to compare against a value it believes is utf8mb4_unicode_520_ci - this might be from another table, or from the user value (e.g. it guesses this type due to characters such as "📞").

Likewise, the error_log from @varinupadhyay (ref #58133) uses utf8_general_ci (aka utf8mb3_general_ci), which does not support all unicode characters (maximum of three bytes per code point), so it has the same problem (in this case, characters like "🥝").


You can test this with:

CREATE TABLE `test` (
  `value` TINYTEXT NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO test VALUES ("a"), ("ü"), ("楼");
INSERT INTO test VALUES ("🥝");

SELECT * FROM test WHERE value LIKE "%🥝%";

WordPress tries to setup new tables/fields with utf8mb4_unicode_520_ci or utf8mb4_unicode_ci, which supports characters like 📞 and 🥝, but older databases would not have used them.

You can test your servers support by going to "WP-Admin > Tools > Site Health", and checking the result for "UTF8MB4 is supported" (which might be under "Passed Tests").

I'd recommend changing the character-set in both of these databases (with the usual notes of making a backup, checking for encoding issues, and being aware that it can take time to alter big tables).


As to solutions...

It would be too difficult/dangerous/slow to check and make changes to every single SELECT/INSERT/UPDATE query - i.e. parsing the SQL, identifying every field and value, checking their character-set, and doing something (e.g. using CONVERT) to make them compatible.

Personally I agree with @pbiron, I don't think core can/should do anything about this... unless someone can come up with a way to get the upgrade process to check/alter every single WP database/table/field to use utf8mb4 without any issues.


As an aside, update.php does have a maybe_convert_table_to_utf8mb4() function, but it only works when all fields are utf8 or utf8mb4, it's only used when upgrading between certain versions of the database, and it can still be risky:

CREATE TABLE `test` (
  `a` TINYTEXT NOT NULL,
  `b` TINYTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE `test` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

SHOW CREATE TABLE test;

Note how the Bin field becomes Case-Insensitive, and the fields change from TINYTEXT to TEXT.

https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set
https://codex.wordpress.org/Converting_Database_Character_Sets
https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/

#7 @domainsupport
7 months ago

We've just found this ticket when we found similar errors in our logs.

You can replicate the issue by performing a site search for an emoji and text such as "🥳 test" or as a query string /?s=🥳+test when your site's database's wp_posts table are using a utf8mb3_% collation.

This will create the following error ...

[02-May-2024 08:21:37 UTC] WordPress database error Illegal mix of collations (utf8mb3_general_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,COERCIBLE) for operation 'like' for query SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID
					 FROM wp_posts 
					 WHERE 1=1  AND (((wp_posts.post_title LIKE '%🥳%') OR (wp_posts.post_excerpt LIKE '%🥳%') OR (wp_posts.post_content LIKE '%🥳%')) AND ((wp_posts.post_title LIKE '%test%') OR (wp_posts.post_excerpt LIKE '%test%') OR (wp_posts.post_content LIKE '%test%')))  AND ((wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' 
OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'page' AND (wp_posts.post_status = 'publish' 
OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'attachment' AND (wp_posts.post_status = 'publish' 
OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'product' AND (wp_posts.post_status = 'publish' 
OR wp_posts.post_status = 'private')))
					 
					 ORDER BY (CASE WHEN wp_posts.post_title LIKE '%🥳 test%' THEN 1 WHEN wp_posts.post_title LIKE '%🥳%' AND wp_posts.post_title LIKE '%test%' THEN 2 WHEN wp_posts.post_title LIKE '%🥳%' OR wp_posts.post_title LIKE '%test%' THEN 3 WHEN wp_posts.post_excerpt LIKE '%🥳 test%' THEN 4 WHEN wp_posts.post_content LIKE '%🥳 test%' THEN 5 ELSE 6 END), wp_posts.post_date DESC
					 LIMIT 0, 5 made by require('wp-blog-header.php'), wp, WP->main, WP->query_posts, WP_Query->query, WP_Query->get_posts

Should emoji's even be allowed / necessary in site searches? We have only ever seen this from bot / spam activity.

At the very least can we add a filter to the site search to strip out emoji's etc from search queries if database tables are using a utf8mb3_% collation?

Has anyone confirmed that having a wp_posts table with utf8mb4_ collation fixes this issue?

Oliver

#8 @drzraf
4 months ago

If it ends up cluttering the logs, then it's a problem and in this case, not a problem a webmaster/sysadmin is expected to resolve on its own since its due to a mismatch between unfiltered query-string input provided as-is to MySQL by WP core.

The least WP could do is to provide a filter with a preg_replace / mb_convert_encoding logic so that characters passed down to $wpdb are compatible with the current MySQL charset/schema.

Note: See TracTickets for help on using tickets.