Make WordPress Core

Opened 3 years ago

Last modified 3 years ago

#37956 reopened defect (bug)

DB_COLLATE doesn't override $collate when defining utf8mb4_unicode_ci in wp-config

Reported by: MikeGillihan Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.6
Component: Database Keywords:
Focuses: Cc:
PR Number:

Description

As of 4.6, when I spin up a site locally, it's created using utf8mb4_unicode_520_ci as the default collation. However, this causes issues whenever I push to a live server that does not yet support the newer collation.

If a constant is specifically defined in wp-config.php, should it override the default behavior? Currently, if utf8mb4_unicode_ci is defined it is still upgraded to utf8mb4_unicode_520_ci regardless of the constant.
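For illustration, the setup described above looks roughly like the following wp-config.php fragment. The DB_COLLATE value is the one named in this ticket; the DB_CHARSET value is an assumption, since the report does not state it.

```php
<?php
// wp-config.php (fragment)
// Assumption: the charset is utf8mb4; the ticket does not state it.
define( 'DB_CHARSET', 'utf8mb4' );

// The collation explicitly requested by the site owner.
define( 'DB_COLLATE', 'utf8mb4_unicode_ci' );

// Reported behavior: on a server that supports utf8mb4_unicode_520_ci,
// tables are still created with the 520 collation despite the value above.
```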

Attachments (1)

37956.patch (674 bytes) - added by MikeGillihan 3 years ago.

Change History (6)

@MikeGillihan
3 years ago

#1 follow-ups: @pento
3 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

This behaviour is on purpose - if DB_COLLATE is defined as utf8mb4_unicode_ci, but the server supports utf8mb4_unicode_520_ci, it's better to use the latter, in much the same way as setting DB_CHARSET to utf8 will be automatically upgraded to utf8mb4 when possible.
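A simplified model of the upgrade described here may make the behaviour easier to follow. This is a sketch written for this ticket, not the core code: the function name is hypothetical, and the two boolean parameters stand in for whatever server capability checks core performs.

```php
<?php
/**
 * Simplified model of the charset/collation upgrade (hypothetical helper).
 *
 * @param string $charset     Value of DB_CHARSET.
 * @param string $collate     Value of DB_COLLATE ('' if not set).
 * @param bool   $has_utf8mb4 Assumed flag: server supports utf8mb4.
 * @param bool   $has_520     Assumed flag: server supports the 520 collation.
 * @return array Charset and collation that would end up being used.
 */
function sketch_determine_charset( string $charset, string $collate, bool $has_utf8mb4, bool $has_520 ): array {
	// utf8 is upgraded to utf8mb4 when the server supports it.
	if ( 'utf8' === $charset && $has_utf8mb4 ) {
		$charset = 'utf8mb4';
	}

	// Give utf8mb4 a matching collation.
	if ( 'utf8mb4' === $charset ) {
		if ( '' === $collate || 'utf8_general_ci' === $collate ) {
			$collate = 'utf8mb4_unicode_ci';
		} else {
			$collate = str_replace( 'utf8_', 'utf8mb4_', $collate );
		}
	}

	// The step this ticket is about: utf8mb4_unicode_ci becomes the 520
	// variant whenever it is available, even if it came from DB_COLLATE.
	if ( $has_520 && 'utf8mb4_unicode_ci' === $collate ) {
		$collate = 'utf8mb4_unicode_520_ci';
	}

	return array(
		'charset' => $charset,
		'collate' => $collate,
	);
}
```

In this model, passing 'utf8mb4' and 'utf8mb4_unicode_ci' with both flags set to true yields utf8mb4_unicode_520_ci, which is the outcome the reporter describes.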

The workaround for this is to either have your development and production environments match, or to include a step in your data migration process to change the collation.
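As one possible shape for such a migration step, the sketch below rewrites the collation name in the text of an SQL dump before it is imported on the older server. The helper name and the sample dump line are hypothetical, and a plain text replacement like this would also alter the string if it happened to appear inside row data.

```php
<?php
/**
 * Replace the 520 collation in SQL dump text so the dump can be imported
 * on a server that lacks it (hypothetical helper for illustration).
 */
function sketch_downgrade_collation( string $sql ): string {
	return str_replace( 'utf8mb4_unicode_520_ci', 'utf8mb4_unicode_ci', $sql );
}

// Made-up sample line standing in for a real dump file.
$dump = 'CREATE TABLE wp_posts (post_title text) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;';

echo sketch_downgrade_collation( $dump ), "\n";
```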

#2 in reply to: ↑ 1 @discern
3 years ago

Replying to pento:

The workaround for this is to either have your development and production environments match, or to include a step in your data migration process to change the collation.

This is frustrating. May I suggest a constant that would override the override? Something like:

wp-config.php

define( 'DB_COLLATE_OVERRIDE', true ); // proposed constant: true (default) or false

#3 in reply to: ↑ 1 @MikeGillihan
3 years ago

Replying to pento:

This behaviour is on purpose - if DB_COLLATE is defined as utf8mb4_unicode_ci, but the server supports utf8mb4_unicode_520_ci, it's better to use the latter, in much the same way as setting DB_CHARSET to utf8 will be automatically upgraded to utf8mb4 when possible.

Sorry, I missed your response @pento. Thanks for being so speedy!

I understand the behavior is intended and I agree that 520 is better. My point was more about the fact that the core file overrides the global constant defined in wp-config.php.

It's a bit abstract, but if I incorrectly define DB_NAME, the constant is respected and it breaks the install. Why, then, do we not respect the DB_COLLATE constant?

@discern While it could achieve the desired result, it feels a bit heavy. The patch I provided just adds a conditional wrapper that fully respects the constant.
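The attached patch is not reproduced in this thread, so the following is only a sketch of what a conditional wrapper of that kind might look like. The function name and parameters are hypothetical; the third parameter stands for "DB_COLLATE was explicitly defined in wp-config.php".

```php
<?php
/**
 * Upgrade the collation only when the site owner has not chosen one
 * (hypothetical helper; not the contents of 37956.patch).
 *
 * @param string $collate               Current collation.
 * @param bool   $has_520               Assumed flag: server supports the 520 collation.
 * @param bool   $collate_set_in_config Assumed flag: DB_COLLATE was explicitly defined.
 */
function sketch_maybe_upgrade_collation( string $collate, bool $has_520, bool $collate_set_in_config ): string {
	// Respect an explicit DB_COLLATE: skip the upgrade entirely.
	if ( $collate_set_in_config ) {
		return $collate;
	}

	// Otherwise keep the existing behavior and prefer the 520 collation.
	if ( $has_520 && 'utf8mb4_unicode_ci' === $collate ) {
		return 'utf8mb4_unicode_520_ci';
	}

	return $collate;
}
```

A caller could derive the third argument with something like `defined( 'DB_COLLATE' ) && '' !== DB_COLLATE`, so that an empty DB_COLLATE still receives the automatic upgrade.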

#4 @cjke7777
3 years ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened

I want to echo the sentiment here that WP should respect DB_COLLATE/DB_CHARSET if they are explicitly set.

There is an open bug in MySQL where utf8mb4_unicode_520_ci doesn't correctly distinguish certain Japanese characters (https://bugs.mysql.com/bug.php?id=79977).

So when I use get_page_by_title in WordPress with a dakuten, it will incorrectly return a result if the characters share a base (searching for ぺ will return a post with へ). The simple solution is to not use the 520 collation.

It would be OK to say it's a bug in MySQL (which is true), but I should be able to select the best charset/collation for my particular use case.

#5 @SergeyBiryukov
3 years ago

  • Milestone set to Awaiting Review