Make WordPress Core

Opened 2 years ago

Last modified 11 months ago

#54836 new defect (bug)

Huge error logs filled with "WordPress database error Illegal mix of collations" errors caused by spammers

Reported by: jh20001's profile jh20001 Owned by:
Milestone: Awaiting Review Priority: normal
Severity: critical Version:
Component: Database Keywords:
Focuses: Cc:

Description

This has been going on for quite a long time. By that, it has now been years. Every few weeks to a month we have to delete our error log as it builds to become an incredibly huge file on the server. So far, there has been no resolution to this anywhere. We have tried the forums and so many other places which have recommended starting a ticket as clearly, we aren't alone in experiencing this. There has to be a way of preventing this.

This causes huge stability issues and consumes a LOT of space on the host. Therefore, I feel this has earned the privilege of being considered "critical". I filed under "database" although this may be better under something else as it seems to be some kind of injection attack using search engines and fake links that try to load content into the blog via error or search results, etc.

Here is (believe it or not) a very small sample of what the error logs hold every time...

(see attachment, as this would be far too much to paste into the post)

Attachments (1)

error_log (126.6 KB) - added by jh20001 2 years ago.
This is a tiny (micro) sample of what ends up in the error log. File generally reaches 100MB within a few weeks and continues to grow each time causing issues.

Download all attachments as: .zip

Change History (7)

@jh20001
2 years ago

This is a tiny (micro) sample of what ends up in the error log. File generally reaches 100MB within a few weeks and continues to grow each time causing issues.

#1 follow-ups: @audrasjb
2 years ago

  • Version 5.8.3 deleted

Hello @jh20001 welcome to WordPress Core Trac and thank you for opening this ticket,

Based on the error logs you shared, it looks like you have different types of collations in your database. I think it is not something you should have on a default and recent WordPress install.

However, I’ll let WP DB component maintainers give a detailed answer on the issue and the related fix, and for now I'll just remove the 5.8.3 version as this wasn't introduced in the last version of WordPress.

#2 in reply to: ↑ 1 @jh20001
2 years ago

Replying to audrasjb:

Hello @jh20001 welcome to WordPress Core Trac and thank you for opening this ticket,

Based on the error logs you shared, it looks like you have different types of collations in your database. I think it is not something you should have on default and recent WordPress install.

However, I’ll let WP DB component maintainers give a detailed answer on the issue and the related fix, and for now I'll just remove the 5.8.3 version as this wasn't introduced in the last version of WordPress.

These errors shouldn't even be displayed as none of that content exists on the site. It's all spammed content coming from somewhere external displayed in order to gain fake search results with their spam on other people's domain. Notice how it is all medicine, adult, and other forms of random spam being trying to trick the site into displaying via error screens or the search system.

#3 @varinupadhyay
12 months ago

#58133 was marked as a duplicate.

This ticket was mentioned in Slack in #core by presskopp. View the logs.


12 months ago

#5 in reply to: ↑ 1 @pbiron
12 months ago

Replyinhttps://core.trac.wordpress.org/ticket/54836?replyto=1#g to audrasjb:

Based on the error logs you shared, it looks like you have different types of collations in your database. I think it is not something you should have on a default and recent WordPress install.

To me it the messages in the log don't suggest a mix of collations in the DB, but rather a different in the collation in the DB and the encoding of the text in the query.

I think a way to test that hypothesis is to use PHP's mb_convert_encoding() to "force" the encoding of a simple WP_Query string to use a different collation than the DB does. I suspect the error message in that case will be the same as what's in the sample log file.

If that is the case, I don't know that core can/should do anything about this.

#6 @craigfrancis
11 months ago

The error_log from @jh20001 shows the database tables/fields using latin1_general_ci, and trying to compare against a value it believes is utf8mb4_unicode_520_ci - this might be from another table, or from the user value (e.g. it guesses this type due to characters such as "📞").

Likewise, the error_log from @varinupadhyay (ref #58133) uses utf8_general_ci (aka utf8mb3_general_ci), which does not support all unicode characters (maximum of three bytes per code point), so it has the same problem (in this case, characters like "🥝").


You can test this with:

CREATE TABLE `test` (
  `value` TINYTEXT NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO test VALUES ("a"), ("ü"), ("楼");
INSERT INTO test VALUES ("🥝");

SELECT * FROM test WHERE value LIKE "%🥝%";

WordPress tries to setup new tables/fields with utf8mb4_unicode_520_ci or utf8mb4_unicode_ci, which supports characters like 📞 and 🥝, but older databases would not have used them.

You can test your servers support by going to "WP-Admin > Tools > Site Health", and checking the result for "UTF8MB4 is supported" (which might be under "Passed Tests").

I'd recommend changing the character-set in both of these databases (with the usual notes of making a backup, checking for encoding issues, and being aware that it can take time to alter big tables).


As to solutions...

It would be too difficult/dangerous/slow to check and make changes to every single SELECT/INSERT/UPDATE query - i.e. parsing the SQL, identifying every field and value, checking their character-set, and doing something (e.g. using CONVERT) to make them compatible.

Personally I agree with @pbiron, I don't think core can/should do anything about this... unless someone can come up with a way to get the upgrade process to check/alter every single WP database/table/field to use utf8mb4 without any issues.


As an aside, update.php does have a maybe_convert_table_to_utf8mb4() function, but it only works when all fields are utf8 or utf8mb4, it's only used when upgrading between certain versions of the database, and it can still be risky:

CREATE TABLE `test` (
  `a` TINYTEXT NOT NULL,
  `b` TINYTEXT CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) CHARACTER SET utf8 COLLATE utf8_general_ci;

ALTER TABLE `test` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

SHOW CREATE TABLE test;

Note how the Bin field becomes Case-Insensitive, and the fields change from TINYTEXT to TEXT.

https://dev.mysql.com/doc/refman/8.0/en/alter-table.html#alter-table-character-set
https://codex.wordpress.org/Converting_Database_Character_Sets
https://make.wordpress.org/core/2015/04/02/the-utf8mb4-upgrade/

Note: See TracTickets for help on using tickets.