Make WordPress Core

Opened 2 years ago

Last modified 4 months ago

#55603 accepted task (blessed)

PHP 8.2: address deprecation of the utf8_encode() and utf8_decode() functions

Reported by: jrf's profile jrf Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: 6.7 Priority: normal
Severity: normal Version: 6.0
Component: General Keywords: 2nd-opinion php82 dev-feedback has-patch has-unit-tests
Focuses: coding-standards Cc:

Description (last modified by afercia)

Context

The PHP RFC to remove the `utf8_encode()` and `utf8_decode()` functions from PHP in PHP 9.0 has recently been accepted.

This means in effect that as of PHP 8.2, those functions will be deprecated and a deprecation notice will be thrown whenever they are called.

The reasoning behind the deprecation and removal is that these functions are confusing and rarely used correctly.

See the Usage section of the RFC for an analysis of the various (mostly incorrect) uses of the functions.

The Problem

The typical replacements for these functions are using the MBString extension and/or the Iconv extension.

As these extensions are both optional extensions in PHP, they cannot be relied on to be available in an open source context.

WordPress uses the utf8_encode() function a few times in the codebase:

  • 1 x utf8_encode() in src/wp-admin/includes/export.php
  • 2 x utf8_encode() in src/wp-admin/includes/image.php
  • 1 x utf8_encode() in tests/phpunit/tests/kses/php

Aside from that the external dependency GetID3 also uses both these functions a number of times.

A search of the plugin and theme directory shows more worrying results with a plenitude of matches:

Options

So, what are the options we have ?

In my opinion, especially seeing how these functions are used so often in plugins, there are only two realistic options:

1. We could polyfill these functions.

While some functions which may not be available are polyfilled by WP, this is generally only done to have access to new PHP functionality or to allow for using functions which require certain optional extensions to be enabled.

As far as I know, no PHP native function has ever been polyfilled due to it being removed from PHP.

Pro:
Relatively simple solution and everything keeps working (deprecation notices will still show when running on PHP 8.x, though these could silenced).

Con:
As most uses of these functions are likely to be incorrect usage (especially in plugins), these "bugs" will remain and not be reviewed or addressed, undercutting the improvement PHP is trying to make.

2. We could make the MbString (or the Iconv) extension a requirement

At this moment, both the MbString as well as the Iconv extension are recommended, but not required by WP.

A couple of MbString functions are also polyfilled in WP, so now might be a good time to make the MbString extension a requirement for WP.

Pro:
MbString being available will allow for fixing the deprecations in a forward-compatible manner. It will also allow for other code improvements to be made to improve WPs support for languages using non-latin based scripts.

Con:
A new requirement would be added to WP which should not be taken lightly. At the same time, it should be noted that MbString is generally enabled already anyway, so this will impact only a small percentage of users.

Why MbString instead of Iconv ?

While both are included (though not enabled) by default with PHP, Iconv requires the `libiconv` library, which may not be available, while MbString has no external dependencies.

MbString is not enabled by default in PHP, but generally is enabled in practice.
Iconv is enabled by default in PHP, but can be disabled.

Having said that, MbString offers much more functionality than the limited functionality offered by Iconv and - as indicated by a number of functions being polyfilled - is already in use in WP.

Still, it would be helpful if someone with access to the underlying statistics data collected by WP could add figures to this issue showing how often either extension is enabled on systems running WP.

Recommendation

I'd strongly recommend option 2, but would like to hear the opinions of additional Core devs.

Action lists

General

  • [x] Report the issue to GetID3

Action list for option 1

  • [ ] Polyfill the functions.
  • [ ] Review the uses of the functions in WP Core anyhow to see if those could/should be removed/the code using them should be refactored.
  • [ ] Add a note about the polyfills in a dev-note with a recommendation for plugin/theme authors to review their use of these functions anyhow.

Action list for option 2

  • [ ] Make the MbString a requirement for installing WP/in the WP bootstrapping.
  • [ ] Change the MbString extension from optional to required in the Site Health component.
  • [ ] Remove the current MbString related polyfills from the compat.php file.
  • [ ] Review the uses of the functions in WP Core and replace with more appropriate alternatives.
  • [ ] Add a note about the deprecation in the PHP 8.2 dev-note with a recommendation for plugin/theme authors to review their use of these functions and noting that the MbString extension can be relied upon to be available (as of WP 6.1).

Change History (86)

#1 follow-up: @SergeyBiryukov
2 years ago

Thanks for the ticket!

For option 2, I think we'd need to also check the required PHP modules during installation, to avoid fatal errors. As far as I can tell, marking a module as required in Site Health displays a warning on existing sites if the module is unavailable, but does not block the installation for new sites.

#2 in reply to: ↑ 1 @jrf
2 years ago

  • Description modified (diff)

Replying to SergeyBiryukov:

For option 2, I think we'd need to also check the required PHP modules during installation, to avoid fatal errors. As far as I can tell, marking a module as required in Site Health displays a warning on existing sites if the module is unavailable, but does not block the installation for new sites.

Very good point! I always forget that those are two different lists of checks.
I've added it to the action list.

Might also be a good time to improve the code re-usability in that regards (have one master list of requirements and recommendations, which both the requirements checker on installation/load + the Site Health component draw from).

#3 @ayeshrajans
2 years ago

Not a core developer, but I'd strongly prefer the option 2, too.

I apologize if I missed any part of the conversation, but I was wondering if we considered using symfony/symfony/polyfill-mbstring (contains 36 mb_ functions) to replace our two functions in compat.php. It would work as a stop-gap, and we can add a note if we are polyfill-ing mbstring asking the user to enable mbstring extension.

In case polyfilled, symfony/symfony/polyfill-mbstring adds about 1,000 LOCs to the code base, plus some unicode data to the WordPress code base, but it's a maintained library and we might eventually use it as a composer dependency rather than manually bundling it.

#4 @jrf
2 years ago

Hi @ayeshrajans Thanks for your input.

I did consider the Symfony polyfills, but as the Symfony polyfills have a minimum requirement of PHP 7.1 and the MbString extensions has been a recommended extension via WP Site Health for quite a while, I think requiring the extension should be the preferred option.

To me, the PHP 7.1 requirement is a blocker as that would mean that WP would have to use an old version of the polyfills which will not get any updates, which is asking for trouble.
As for raising the minimum PHP version of WP, that's a whole other discussion and I have little hope of that happening in time for the release of PHP 8.2.

Other than that: Symfony polyfills is licensed under the MIT license and I'm not sure how well that plays together with the WordPress GPL license. That would need some advise from legal.

This ticket was mentioned in Slack in #core by costdev. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

#7 @Chouby
2 years ago

@jrf In the eventuality that you would like to use the Symfony polyfills, the license should not be an obstacle. The license is listed as compatible with the GPL by GNU itself. See https://www.gnu.org/licenses/license-list.en.html#Expat.
As there is an ambiguity for the name "MIT license", you can check that the Symfony license (https://symfony.com/doc/current/contributing/code/license.html) uses exactly the same terms as the Expact (or MIT) license: https://directory.fsf.org/wiki/License:Expat.

#8 follow-ups: @TimothyBlynJacobs
2 years ago

What does making the extension require look like for servers that don't currently have mbstring installed? Do we have the technology to ensure they stay on 6.0.x? Is it that they'll see the 6.1 update as available, but if they try to update to it, they'll get an error message?

#9 in reply to: ↑ 8 @SergeyBiryukov
2 years ago

Replying to TimothyBlynJacobs:

What does making the extension require look like for servers that don't currently have mbstring installed? Do we have the technology to ensure they stay on 6.0.x? Is it that they'll see the 6.1 update as available, but if they try to update to it, they'll get an error message?

Not at this time, see comment:1. We should check for the required PHP extensions both during installation and updates, similar to how we check for the required PHP and MySQL versions. I have created #56017 for that.

#10 @ayeshrajans
2 years ago

Thanks a lot for the reply @jrf, you are absolutely right about the minimum version requirement in symfony/polyfill, that really deal breaker.

One more imperfect approach I could think of is adding WP's own equivalents (say, iso8859_1_to_utf8) based on symfony/polyfill-php72 (which provides user-land polyfills) to utf8_encode and utf8_decode if we cannot afford to require ext-mbstring yet. But this again comes with the MIT license legality, and is inherently a bad idea.

#11 @ayeshrajans
2 years ago

As for the two utf8_encode calls in wp_read_image_metadata and wxr_cdata functions, they are wrapped with !seems_utf8() clauses, which make believe core's utf8_encode calls are wrong too because there is no guarantee the source strings are always ISO-8859-1.

This ticket was mentioned in Slack in #hosting-community by javier. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by jadonn. View the logs.


2 years ago

This ticket was mentioned in Slack in #core-php by jrf. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by javier. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by chaion07. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by jadonn. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by javier. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by amykamala. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by chaion07. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

#23 @audrasjb
2 years ago

Hello there, as per today's bug scrub, just wanted to mention that we'll give this ticket a few days before moving in to Future Release, as it is marked as early and as we're just one month away from 6.1 beta 1 :)

#24 @jrf
2 years ago

@audrasjb Thanks for the heads-up. The problem is not a patch, the problem is that we need more people speaking up to support adding MbString (or iconv or Intl) as a required extension.

I personally would find it pretty problematic if this gets moved to a future release, as it will make it significantly harder for plugins/themes to get ready for PHP 8.2.

This ticket was mentioned in Slack in #core by chaion07. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


2 years ago

#27 @audrasjb
2 years ago

  • Keywords early removed

As per today's bugscrub, let's remove the early keyword and raise this ticket in the next devchat.

This ticket was mentioned in Slack in #core by abhanonstopnews. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


2 years ago

This ticket was mentioned in Slack in #hosting-community by costdev. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by chaion07. View the logs.


2 years ago

#32 @chaion07
2 years ago

  • Keywords dev-feedback added

Thanks @jrf for reporting this. We reviewed the ticket during a recent bug-scrub session. Based on the feedback received we are making the following changes:

  1. Adding the dev-feedback for additional attention
  2. We also recommend this ticket to be discussed during devchat if possible.

Cheers!

Props to @mukesh27 & @robinwpdeveloper

#33 @costdev
2 years ago

@jrf @SergeyBiryukov is #56017 a blocker for Option 2, or a nice-to-have enhancement?

This ticket was mentioned in Slack in #core by abhanonstopnews. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 years ago

#36 in reply to: ↑ 8 ; follow-up: @desrosj
2 years ago

Replying to TimothyBlynJacobs:

What does making the extension require look like for servers that don't currently have mbstring installed? Do we have the technology to ensure they stay on 6.0.x? Is it that they'll see the 6.1 update as available, but if they try to update to it, they'll get an error message?

I wanted to get clarification on the answer to this question.

There is precedent for blocking a site from upgrading when a PHP extension is missing. In WordPress 5.2, the native JSON extension was marked as required (see [46455]). When an upgrade was attempted and a site did not have this extension loaded, an error was shown and the update was cancelled.

I know it's not ideal to add a bunch of on-off extension checks in the updater, but these changes are rare and they could be consolidated into a better solution through #56017 once solved.

I'm not yet advocating for or against this being in 6.1, but I want to make sure I'm not missing something when to comes to Core having the ability to "block" an update if a site lacks a certain PHP extension.

One other thing to note is that an error code specific to the scenario where the JSON extension missing is returned, which allows the number of sites to being blocked from an update to be tracked in .org's mission control to gauge whether this deprecation ends up being problematic or not (see [46560]).

#37 @audrasjb
2 years ago

  • Milestone changed from 6.1 to 6.2

With WP 6.1 RC 1 scheduled today (Oct 11, 2022), there is not much time left to address this ticket. Let's move it to the next milestone.

This ticket was mentioned in Slack in #hosting-community by jrf. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by mukeshpanchal27. View the logs.


20 months ago

#40 @hellofromTonya
20 months ago

  • Milestone changed from 6.2 to 6.3
  • Owner set to hellofromTonya
  • Status changed from new to assigned

No additional work or discussion has happened in the 6.2 cycle. With Beta 4 next week and then RC1 the week after, moving this ticket to 6.3. As I'm planning to focus efforts on PHP 8+ compatibility during this next cycle, self-assigning to help shepherd it forward.

This ticket was mentioned in Slack in #core by chaion07. View the logs.


16 months ago

#42 @costdev
16 months ago

This ticket was reviewed during the recent bug scrub. @hellofromTonya Has there been any further work on this during the cycle that hasn't been added to the ticket yet, or will there be?

Props to @chaion07 for discussion.

This ticket was mentioned in Slack in #core by mukeshpanchal27. View the logs.


15 months ago

#44 @chaion07
15 months ago

  • Milestone changed from 6.3 to 6.4

We reviewed this ticket once again during a recent bug-scrub session and based on the feedback we are updating the milestone to 6.4. Cheers!

Props to @mukesh27 & @hareesh-pillai

#45 @hellofromTonya
12 months ago

  • Milestone changed from 6.4 to 6.5

6.4 RC1 is next week. I don't think consensus and resolution can happen in the week remaining in the cycle. Moving it to the next cycle. If however, it can be resolved in time for 6.4, please move it back into the milestone.

#46 @jrf
12 months ago

The problem with this ticket is not getting a patch ready or knowing where the issues are. The problems with this ticket are indecision/lack of empowerment to take decisions which affect whether users can upgrade (adding an extension requirement).

This ticket will just keep getting punted until a decision is taken and validated by the powers that be.

This ticket was mentioned in Slack in #core by jrf. View the logs.


11 months ago

This ticket was mentioned in Slack in #core by rajinsharwar. View the logs.


8 months ago

#49 @rajinsharwar
8 months ago

  • Milestone changed from 6.5 to Future Release

Punting this ticket again for Future Release, until a decision has been made for the workaround. Hopefully, we can get some opinions on this soon.

#50 in reply to: ↑ 36 ; follow-up: @hellofromTonya
8 months ago

I too like and agree with Option 2, requiring the Mbstring extension. IMO it's a necessary and more clean approach to resolve this incompatibility.

As desrosj noted, there is precedence and artwork for blocking installation and upgrade when an extension is missing:

There is precedent for blocking a site from upgrading when a PHP extension is missing. In WordPress 5.2, the native JSON extension was marked as required (see [46455]). When an upgrade was attempted and a site did not have this extension loaded, an error was shown and the update was cancelled.

and

One other thing to note is that an error code specific to the scenario where the JSON extension missing is returned, which allows the number of sites to being blocked from an update to be tracked in .org's mission control to gauge whether this deprecation ends up being problematic or not (see [46560]).

IMO the decision to require the extension and to move forward with implementing it (per the previous changesets approach) should not be dependent upon or blocked #56017. Why? #56017 is an enhancement for handling all required extensions, whereas this ticket is focused specifically on one requirement to unblock a specific incompatibility Core has with PHP 8.2.

I also wonder: How many hosts will be impacted? How in the past has this data been collected?

Well, #47699 to require JSON extension showed a way forward when VaultPress helped to collect the information:

The good folks at VaultPress have run some checks across the sites they monitor, and have found approximately 0.01% of PHP 5.6+ sites don't have the JSON extension available.

I wonder:

Are they available to collect it again?

What other avenues are available to collect the data?

#51 @hellofromTonya
8 months ago

  • Milestone changed from Future Release to 6.5

As this is a blessed task, I'm moving it back into 6.5 cycle to give it visibility. Specifically, I'm thinking of the following actions that can be taken to move it forward:

  1. Make a decision on option 1 or 2.

IMO there's enough support for Option 2. I'm not seeing any objections to it. Thus, it seems option 2 can be the approach.

Any objections?

  1. Set a plan for rolling this out.

Is data needed to validate that a small percentage of hosts will be impacted?

Could a phases approach be taken such as:

  • In 6.5 -> change Mbstring extension from "recommended" to "required" in the docs and hosting information.
  • In 6.6 -> add the code checks and errors, using both [46455] and [46560] as a guide.

#52 @jrf
8 months ago

@hellofromTonya Sounds like a plan to me. I imagine a Make post be needed to announce the new requirement ? Doing that now~ish might help to see if there are valid objections before making the code changes in 6.6.

#53 in reply to: ↑ 50 @ironprogrammer
8 months ago

Replying to hellofromTonya:

Well, #47699 to require JSON extension showed a way forward when VaultPress helped to collect the information:

The good folks at VaultPress have run some checks across the sites they monitor, and have found approximately 0.01% of PHP 5.6+ sites don't have the JSON extension available.

I wonder:

Are they available to collect it again?

Unfortunately, after reaching out to the Jetpack VaultPress Backup (formerly VaultPress) team, the plugin has changed such that they are no longer able to provide this information.

One other thing to note is that an error code specific to the scenario where the JSON extension missing is returned, which allows the number of sites to being blocked from an update to be tracked in .org's mission control to gauge whether this deprecation ends up being problematic or not (see [46560]).

Would it make sense to include an mbstring-specific error code like that introduced with [46560]?

This ticket was mentioned in Slack in #core by sabernhardt. View the logs.


7 months ago

#55 follow-up: @swissspidy
7 months ago

Don‘t we collect that data via the updates API? cc @dd32

#56 follow-up: @dmsnell
7 months ago

Coming in late to this thread, but I will note that in this particular case a polyfill would be trivial.

Given that we know many invocations of these functions are wrong, a polyfill would let us prioritize the efforts to move away from them above efforts to start requiring a new extension that otherwise isn't required.

And it's purely because utf8_encode() and utf8_decode() are so trivial and small and deceptive in their names that I'm sharing this opinion. It would be a different matter if we had to support large databases of character conversion or if these methods were generally useful, but the polyfill amounts mostly to a check if a code point is within a small range and the function is deceitful in its name: it sounds like something one should use when usually it's not.

Polyfilling thus leaves the status quo, which seems fair because the code calling these methods can and maybe should be addressed in their own time and with their own priority, instead of artificially as a result of support in the language being pulled out.

#57 in reply to: ↑ 56 @hellofromTonya
7 months ago

Replying to dmsnell:

Polyfilling thus leaves the status quo, which seems fair because the code calling these methods can and maybe should be addressed in their own time and with their own priority, instead of artificially as a result of support in the language being pulled out.

Hey @dmsnell, what you think about the cons @jrf listed in the ticket's description for the option to polyfill:

As far as I know, no PHP native function has ever been polyfilled due to it being removed from PHP.

and

As most uses of these functions are likely to be incorrect usage (especially in plugins), these "bugs" will remain and not be reviewed or addressed, undercutting the improvement PHP is trying to make.

#58 @dmsnell
7 months ago

@hellofromTonya my apologies for not clarifying in my first comment.

As far as I know, no PHP native function has ever been polyfilled due to it being removed from PHP.

I'm not following how this is a con rather than a simple observation. What is the con?

As most uses of these functions are likely to be incorrect usage (especially in plugins), these "bugs" will remain and not be reviewed or addressed, undercutting the improvement PHP is trying to make.

To me this is about priority, and what I meant in my earlier comment was that if we attempt to start requiring the internationalization extensions now that seems like a fairly large project in its own right and will potentially leave all of these existing cases broken anyway, because now while we technically don't support their use on platforms without the extensions, those existing sites will simply break.

The goals of the PHP community are great: there's a never-ending supply of ways we could improve PHP by removing misleading and/or buggy functions that have been around for decades. Does that translate into a demand on WordPress to update existing code right at the moment PHP decides to?

So the only "con" I interpreted was that this juncture, which forces us to do something, does not also force us to clean up existing code that wasn't on our radar to address. If we agree that all of this existing code needs refactoring, then we can take this moment to do that, though we'll still be racing against an external clock. Polyfilling doesn't make it any harder to find these instances, because they only require searching for \butf8_(?:en|de)code\(.

In other words, I don't see this con to carry much weight because it leaves things as they are without making them worse and it leaves control in our hands for how to prioritize our time. "Fixing" the existing code is inevitably going to introduce new bugs and raise questions we didn't realize we have to answer, and if we want to do that, we will need to form a plan to do so, and the only difference between polyfilling and requiring extensions is the artificial deadline and a large community thrust to add new requirements.

#59 follow-up: @swissspidy
7 months ago

I'm not following how this is a con rather than a simple observation. What is the con?

The con is that the functions are not actually removed yet in PHP, they're just deprecated. We can't polyfill functions that still exist. Many WP site will be on PHP 8.2 and below for many years to come, continuing to experience these deprecation warnings. We need to address this now, not in X years when we raise the minimum PHP version to one where the functions were actually removed.

---

Speaking of polyfilling though, have we considered using https://packagist.org/packages/symfony/polyfill-mbstring instead of making the mbstring extension mandatory? I successfully use it in some of my plugins because mbstring is not so widely adopted as one would hope.

#60 follow-up: @dmsnell
7 months ago

We can't polyfill functions that still exist

Why not @swissspidy? since it's part of PHP, am I missing some reason we can't check early on for its presence and conditionally include? I would expect this to never fail, since it's either there in PHP at bootup or it's never going to be otherwise dynamically added.

<?php
if ( function_exists( 'utf8_encode' ) ) {
        require_once __DIR__ . '/wp-includes/utf8_encode_polyfill.php';
}

or something like that?

#61 in reply to: ↑ 59 ; follow-up: @ironprogrammer
7 months ago

Replying to swissspidy:

...have we considered using https://packagist.org/packages/symfony/polyfill-mbstring instead of making the mbstring extension mandatory?

This was suggested early on (comment:3), but even considering the relatively recent change of minimum PHP from 5.6 to 7.0, as noted in comment:4, the package requires PHP 7.1+.

#62 in reply to: ↑ 55 @dd32
7 months ago

Replying to swissspidy:

Don‘t we collect that data via the updates API? cc dd32

We do, as of WordPress 6.1. I know for a fact that mbstring was less common with PHP5.x based on historical php extension investigations.

The current numbers for mbstring are 99.44% and 99.70% for iconv. An intersection of 99.98% has both.

Keep in mind that the site counts for a 0.5% is not insignificant.

If adding a polyfill is minimal (for either these utf8 functions, or mbstring), that's the route I'd personally vote for, just as it avoids adding an extra requirement for a non-technical user to understand.

Once PHP 8 or 9 is the WP minimum, the extension usage may have changed significantly, leading to any polyfil being able to be removed; so it wouldn't be adding a forever unused code branch.

#63 @swissspidy
7 months ago

  • Milestone changed from 6.5 to 6.6

#64 in reply to: ↑ 60 @SergeyBiryukov
7 months ago

Replying to dmsnell:

We can't polyfill functions that still exist

Why not @swissspidy? since it's part of PHP, am I missing some reason we can't check early on for its presence and conditionally include? I would expect this to never fail, since it's either there in PHP at bootup or it's never going to be otherwise dynamically added.

<?php
if ( function_exists( 'utf8_encode' ) ) {
        require_once __DIR__ . '/wp-includes/utf8_encode_polyfill.php';
}

I think Pascal's point is that this would only work on PHP 9.x when the functions are actually removed, whereas:

Many WP site will be on PHP 8.2 and below for many years to come, continuing to experience these deprecation warnings. We need to address this now, not in X years when we raise the minimum PHP version to one where the functions were actually removed.

#65 @dmsnell
7 months ago

Many WP site will be on PHP 8.2 and below for many years to come, continuing to experience these deprecation warnings

I can understand where this would be a hassle, given that PHP will spit out those messages on the console for sites that are essentially working fine.

In that case, maybe we missed a silly third option, which is replace Core's use of these functions with a differently-named polyfill. The upsides of replacing calls to utf8_encode are clear, but the cost of making those changes isn't. If we would still have interest in what is essentially a zero-effort change, it seems like adding deprecated_utf8_encode() and deprecated_utf8_decode() could defer the maintenance work (valuable since the existing code is working more or less) while eliminating the warnings and also making these uses standout for more gradual and safe refactoring.

For transparency sake here I'm only sharing ideas that I think have possibly-valuable tradeoffs in order to meet an urgent need without forcing more costly and risky rewrites. I don't hold a strong opinion on this or what everyone decides.

#66 @jrf
7 months ago

Okay, I have a feeling this ticket is going off the rails, so let's see if we can get it back on track.

This ticket has a dual purpose:

  1. Fix the 4 instances of utf8[en|de]code() in WordPress Core.
  2. Put the tools in place to make it more straight-forward for plugin/theme developers to fix their uses of these functions by making a PHP extension a requirement for WordPress.

[1] can be addressed at any time prior to PHP 9.0, which is still nearly two years away, possibly longer.
Remember: deprecation notices are not errors, nothing is broken and nobody should run with E_DEPRECATED enabled in production.
So there is no urgency, other than an urgency to prevent the wrong solution from being committed.

[2] is - for WordPress Core - not essential to fix [1], but would:

  • inform whether an optimal/simple fix can be used for [1] or whether a slightly more complicated fix will need to be created for those four usages in WP Core.
  • Make it straight-forward for plugins/themes to fix their uses of utf8[en|de]code() as they can then rely on all solution directions, as outlined in the RFC, being available.
  • Allow for WordPress Core to fix other long-standing issues related to internationalization of websites.
  • Allow for removing some polyfills from WordPress Core.
  • Allow for preventing potential conflicts between plugins which both ship with MbString polfyfills (where the polyfill implementations could be different between plugins).
  • Allow for plugins, which need MbString, to simplify their code and rely on the extension being available.
  • etc etc

As there is time - basically four years or more between the opening of this ticket and PHP 9.0 -, this deprecation was a good trigger to have a discussion about [2], which would allow for structural improvement.

So far, this discussion has been going nowhere as nobody seems to dare to take a decision about this.

WordPress Core can get by without MbString (only polyfilling select functionality from the extension) and can just continue to ignore those long-standing internationalization issues. It won't block fixing the uses of utf8[en|de]code() in Core and it would maintain the status quo without introducing new functions.
It would just make life so much harder for the wider WP community if that's what ends up happening.

Keep in mind that the site counts for a 0.5% is not insignificant.

I fully recognize that, but keep in mind that those 0.5% will be English-language installs only and declining the proposal to make MbString a requirement for WordPress Core is tantamount to prioritizing that 0.5% of English-language installs over the > 50% of non-English language WordPress installs in the world, which makes a telling and concerning statement about how WordPress leadership views non-English-language installs as second rate citizens.

WordPress has already been recommending enabling the MbString extension for quite a while, polyfills select functions from the extension and both the GetID3 as well as the SimplePie external dependencies also have it as listed as a strongly suggested extension for their current versions.

In other words, what is needed for this ticket is a decision within the next two years. With a preference for taking such a decision sooner rather than later as that will allow more plugins/themes to be ready for PHP 9.0 in time.

What we don't need is avoiding the issue altogether by putting an awkwardly named polyfill in place, which would effectively result in the incorrect uses of these functions never getting fixed, as technical debt never gets priority in Core (unless forced by PHP/PHPUnit).

#67 in reply to: ↑ 61 @chesio
7 months ago

Replying to ironprogrammer:

Replying to swissspidy:

...have we considered using https://packagist.org/packages/symfony/polyfill-mbstring instead of making the mbstring extension mandatory?

This was suggested early on (comment:3), but even considering the relatively recent change of minimum PHP from 5.6 to 7.0, as noted in comment:4, the package requires PHP 7.1+.

It seems that minimum PHP version is going to be bumped to 7.2 in WordPress 6.6 (see #58719), so PHP version should no longer be a blocker for inclusion of symfony/polyfill-mbstring.

Btw. as someone whose given name contains non-ASCII, non-ISO-8859-1 character, I would set mbstring extension as requirement :-) I've got my given name mangled too many times, so I'm all in for having proper UTF-8 implementation everywhere.

This ticket was mentioned in PR #6298 on WordPress/wordpress-develop by @jrf.


7 months ago
#68

  • Keywords has-patch has-unit-tests added

One of the tests for the wp_kses_xml_named_entities() function used utf8_encode( chr( 160 ) ) to set an expectation of a Unicode character for a non breaking space.

It is understandable that this expectation was previously set this way, as it is not possible for a developer to distinguish between a _breaking_ space and a _non-breaking_ space visually, so the chances of the test accidentally breaking on an incorrect safe when the plain Unicode character would be used, was high.

However, the utf8_encode() function is deprecated as of PHP 8.2 and we need to remove its use from the WP codebase.

Now, PHP 7.0 happens to have introduced Unicode escape sequences, which allows to create a text string using unicode characters referenced by their codepoint and luckily WP has a minimum supported PHP version of PHP 7.0 nowadays.

By switching the test case to provide the test expectation using a Unicode escape sequence, we remove the use of the deprecated PHP function and still preserve the safeguard against the test accidentally breaking.

Trac ticket: https://core.trac.wordpress.org/ticket/55603
Trac ticket: https://core.trac.wordpress.org/ticket/60705

#69 @SergeyBiryukov
7 months ago

In 57861:

Tests: Remove unnecessary use of utf8_encode() in KSES tests.

One of the tests for the wp_kses_xml_named_entities() function used utf8_encode( chr( 160 ) ) to set an expectation of a Unicode character for a non-breaking space.

It is understandable that this expectation was previously set this way, as it is not possible for a developer to distinguish between a breaking space and a non-breaking space visually, so the chances of the test accidentally breaking on an incorrect save when the plain Unicode character would be used, was high.

However, the utf8_encode() function is deprecated as of PHP 8.2, and its use needs to be removed from the WP codebase.

PHP 7.0 has introduced Unicode escape sequences, which allows to create a text string using Unicode characters referenced by their codepoint. By switching the test case to provide the test expectation using a Unicode escape sequence, we remove the use of the deprecated PHP function and still preserve the safeguard against the test accidentally breaking.

Follow-up to [52229].

Props jrf, afercia, poena, SergeyBiryukov.
See #55603, #60705.

@SergeyBiryukov commented on PR #6298:


7 months ago
#70

Thanks for the PR! Merged in r57861.

#71 @swissspidy
7 months ago

In 57865:

Editor: Check if mb_strtolower exists before using it in the font library.

Prevents an error when uploading fonts on certain systems, because the mbstring extension can be missing and thus the function may not be available.

Props mujuonly, swissspidy, peterwilsoncc.
Fixes #60823.
See #55603.

#72 @swissspidy
7 months ago

In 57869:

Editor: Check if mb_strtolower exists before using it in the font library.

Prevents an error when uploading fonts on certain systems, because the mbstring extension can be missing and thus the function may not be available.

Reviewed by jorbin.
Merges [57865] to the to the 6.5 branch.

Props mujuonly, swissspidy, peterwilsoncc.
Fixes #60823.
See #55603.

#73 @hellofromTonya
6 months ago

  • Owner hellofromTonya deleted

As of today, I'm no longer a sponsored contributor and am starting a month or more AFK break. To not stand in the way, I'm clearing me as the ticket owner, which opens the ticket for someone to step in to shepherd it forward to resolution.

#74 @afercia
6 months ago

  • Description modified (diff)

Report the issue to GetID3

Looks like this action item is no longer necessary. GetID3 removed these functions starting from version 1.9.23 released on 2023-10-19. See https://github.com/JamesHeinrich/getID3/pull/402

The GetID3 new version is already included in WordPress 6.5, see [56975].

Last edited 6 months ago by SergeyBiryukov (previous) (diff)

#75 @afercia
6 months ago

@SergeyBiryukov, @poena, and I had a look at this ticket in a mob coding session. We do support @jrf proposal to require the mbstring extension as that appears to be the most future-proof way to address this issue.

We think a polyfill wouldn't be ideal because of the important point raised by @jrf:

As most uses of these functions are likely to be incorrect usage (especially in plugins), these "bugs" will remain and not be reviewed or addressed, undercutting the improvement PHP is trying to make.

Reminder:

  • 1 x utf8_encode() in tests/phpunit/tests/kses/php

This instance of utf8_encode() was already removed in [57861].

This ticket was mentioned in PR #6359 on WordPress/wordpress-develop by @SergeyBiryukov.


6 months ago
#76

Adds mbstring to the list of PHP extensions required by WordPress.

Trac ticket: https://core.trac.wordpress.org/ticket/55603

#77 @SergeyBiryukov
6 months ago

  • Owner set to SergeyBiryukov
  • Status changed from assigned to accepted

This ticket was mentioned in Slack in #core-performance by adamsilverstein. View the logs.


6 months ago

This ticket was mentioned in Slack in #core-media by adamsilverstein. View the logs.


6 months ago

#81 @peterwilsoncc
6 months ago

Catching up on this after quite some time.

I think it best to go with option 2 and require the extension.

I am quite persuaded by @jrf's comment on the English/Latin alphabet bias in not requiring the extension. With the wide use of emoji, it would be quite an advantage for countries with a Latin alphabet too 🙂

#82 follow-up: @knutsp
6 months ago

I'm not sure what requires mean, will an installation or upgrade be refused/fail, or will it just be a notice or test in Site Health?

En elegant solution would be to not allow non English installation or language addition in case mbstring is not included. That may require a kind of polyfill as a short circuit and avoid fatal errors, in case non Latin characters are introduced by user input.

I'm all for requiring mbstring in general, as we who are using European languages have extra characters in our alphabets, but I'm a bit worried about this getting accepted by project leadership in time this is needed for PHP versions support. And I admit that a very large portion of WordPress installations use a variant of English only, not only as the site language, but also for all users, and really don't need this.

#83 in reply to: ↑ 82 @peterwilsoncc
6 months ago

Replying to knutsp:

I'm not sure what requires mean, will an installation or upgrade be refused/fail, or will it just be a notice or test in Site Health?

An upgrade will be refused: during the upgrade process support is tested and if the module is not included the upgrade will not proceed.

For sites running 6.1 or later, the available modules are sent to the upgrade API so it can be modified not to present the upgrade to sites without the module installed. Such sites would only be offered minor version updates.

As Dion mentions in comment #55, although it's only 0.5% of sites that don't have the module installed, at WordPress's scale that represents a lot of sites.

En elegant solution would be to not allow non English installation or language addition in case mbstring is not included. That may require a kind of polyfill as a short circuit and avoid fatal errors, in case non Latin characters are introduced by user input.

I unsure about this, but may be convinced. It's possible that this would complicate matters further than the current practice of testing if multi-byte functions exist before using them.

I'm all for requiring mbstring in general, as we who are using European languages have extra characters in our alphabets, but I'm a bit worried about this getting accepted by project leadership in time this is needed for PHP versions support. And I admit that a very large portion of WordPress installations use a variant of English only, not only as the site language, but also for all users, and really don't need this.

It's one of those stats that can be interpreted either way.

According to the WordPress stats about half of sites use an English variant (US, UK, etc) and about half use a non-English language.

#84 @dmsnell
6 months ago

Just a quick note that multi-byte strings aren't exclusively about non-English locales or even absent in English locales.

Most European language groups have historically had their own single-byte encoding in the ISO-8859 family, but they are incompatible with each other. Unicode joins all of the recognized character sets into one large catalog and UTF-8 is the most universal way to represent that text in actual bytes.

Most modern English-only language online is full of multi-byte characters as well, including “smart quotes,” the ellipsis…, em — dash, and a host of Emoji 😀.

I think many in this thread are even too familiar with some of these complexities. For those who haven't come across the headaches of text encoding issues in English, you can be assured that they are there.

#85 @swissspidy
6 months ago

Just in addition to that, an English-only WordPress site can always install an additional language on the settings page.

So mking mbstring a requirement only for non-English sites is a bad idea.

#86 @hellofromTonya
4 months ago

  • Milestone changed from 6.6 to 6.7

The last scheduled beta will happen in a couple of hours. There's consensus, but likely needs more time for discussion and then changes. Moving to 6.7.

Note: See TracTickets for help on using tickets.