#61680 closed defect (bug) (fixed)
is_utf8_charset() undefined when called by code in compat.php
Reported by: | dmsnell | Owned by: | jorbin |
---|---|---|---|
Milestone: | 6.6.1 | Priority: | high |
Severity: | normal | Version: | 6.6 |
Component: | General | Keywords: | has-patch dev-reviewed commit fixed-major |
Focuses: | Cc: |
Description
#61182 introduced is_utf8_charset()
as a way of standardizing checks for charset slugs referring to UTF-8. This is called by _mb_strlen()
inside of compat.php
, but is_utf8_charset()
is defined in functions.php
, which isn't loaded early on.
Code calling mb_strlen()
early on before functions.php
loads in hosts without the multibyte extension therefore may crash. This was reported in wp-super-cache
jetpack report.
Change History (17)
This ticket was mentioned in PR #7052 on WordPress/wordpress-develop by @dmsnell.
6 months ago
#1
- Keywords has-patch added; needs-patch removed
6 months ago
#2
Thanks @joemcgill - I also noted the apparent call to get_option()
but I didn't want to address it in this bug-fix. I could see value in calling function_exists()
first, but also I don't see the need to perform this check at runtime when mostly they will always exist.
My initial reaction was that perhaps we could create a new CI task to check if anything in compat()
is calling code that hasn't yet been defined. This, of course, would flag the mb_
functions 🤷♂️
In any case the situations where these things run is rare enough. I would imagine that we could get away with the runtime checks and have it not matter much. IMO, good work to discuss in an enhancement.
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
6 months ago
@joemcgill commented on PR #7052:
6 months ago
#4
I also noted the apparent call to
get_option()
but I didn't want to address it in this bug-fix.
Totally agree that handling that separately makes sense.
I could see value in calling
function_exists()
first, but also I don't see the need to perform this check at runtime when mostly they will always exist.
This is the tricky part. While I appreciate not calling function_exists()
when it's not needed in a majority of cases, it seems like the alternative is to either accept the potential fatals (expecting third party code that trigger these fatals to fix the issue) or do what you have done here, which is to create additional compat shims for those functions that splits out parts of the implementation here. For maintainability, I have some reservations to creating private stateless versions of any WP functions that are called from this file.
6 months ago
#5
Thanks @joemcgill
For maintainability, I have some reservations to creating private stateless versions of any WP functions that are called from this file.
Would you care to share more about this? From my perspective, fixing this makes the code better by introducing this split, particularly since is_utf8_charset()
is calling the more testable unit.
The only complication I see is that it's in compat.php
, but then again, it's addressing some very low-level stuff required by other compatability code.
I considered duplicating the logic of this function here directly, but given that it was necessary in both of the _mb_substr()
and _mb_strlen()
functions, I thought it was more susceptible to problems than creating a new function would leave them.
---
Do you see this as a blocker to the patch as proposed?
---
Without trying to derail this conversation, I'm only more convinced each week that a good future direction for Core is UTF-8 everywhere, and to that end, some UTF-8 functionality is missing, and there have been many problems lately related to whether the multibyte extension is available.
I've been exploring a native UTF-8 handling system in #6883 and think it has merit. Similar to how the HTML API has replaced in user-space some PHP-provided functionality, incorporating our own UTF-8 encoding and decoding would let us finally get rid of a lot of needless and inconsistent checks and conversions.
I've been exploring some of this code inside Automattic too. When I feel it's ready for actual discussion I'll start posting on Make about it.
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
6 months ago
@joemcgill commented on PR #7052:
6 months ago
#7
Would you care to share more about this?
Sure! Besides the use of an option value, is_utf8_charset()
is a pretty simple function currently. Splitting the implementation into two different functions which exist in two separate files could be a bit confusing for future maintainers.
I don't see that maintainability concern as a blocker.
However, the other problem that I just noticed when looking at this again, is that now any time the _mb_substr()
and _mb_strlen()
polyfills are used, they will not be able to make use of the site's option setting, even though in a majority of cases, the need to avoid the get_option()
call is unnecessary and unexpected. That seems like a bigger problem. That does seem like something we want to avoid.
6 months ago
#8
I think there are two points (in addition to the UTF-8 everwhere one):
1) We need to figure out a way to prevent issues like this in the future.
2) The fatal at hand.
I think we should focus on 2 right now and create a separate core ticket and one where we can at a minimum include some documentation at the top of the file and ideally include some automated tests as well.
6 months ago
#9
Follow up ticket: https://core.trac.wordpress.org/ticket/61694
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
6 months ago
#11
@
6 months ago
- Keywords dev-feedback added
:facepalm: referenced the wrong ticket in the commit. Copied over below. Also keeping open for backport consideration.
---
In [58763]:
General: Provide _is_utf8_charset() in compat.php for early use
#61182 introduced is_utf8_charset() as a way of standardizing checks for charset slugs referring to UTF-8. This is called by _mb_strlen() inside of compat.php, but is_utf8_charset() is defined in functions.php, which isn't loaded early on. Code calling mb_strlen() early on before functions.php loads in hosts without the multibyte extension therefore may crash.
Props dmsnell, jonsurrell, joemcgill, jorbin.
Fixes #61681.
#12
@
6 months ago
- Keywords dev-reviewed commit added; dev-feedback removed
[58763] LGTM for backporting to the 6.6 branch.
#13
@
6 months ago
However, the other problem that I just noticed when looking at this again, is that now any time the _mb_substr() and _mb_strlen() polyfills are used, they will not be able to make use of the site's option setting, even though in a majority of cases, the need to avoid the get_option() call is unnecessary and unexpected. That seems like a bigger problem. That does seem like something we want to avoid.
@joemcgill thanks for sharing more. please note, however, that this patch does not change the behavior of the polyfills for blog_charset
. In compat.php
they make the call directly if no charset is provided. Before and after this change, the behavior is identical: if no charset is provided before functions.php
is loaded, the call will crash; otherwise it will default to calling get_option()
.
Splitting the implementation into two different functions which exist in two separate files could be a bit confusing for future maintainers.
I hear this. It was difficult for me to see a clear dominant solution among the options. Even leaving things as they were was obviously a maintenance issue and the motivation for the function was that Core maintained multiple sets of differing rules for the same intention.
The wrapper now is at least a single line where the only thing it does is default to trying to infer the blog charset. If we can deprecate non-UTF-8 charsets inside of WordPress (ignoring any data stored in the database for now), then we can eliminate far more of these things which are scattered all over.
We need to figure out a way to prevent issues like this in the future…follow-up ticket
Thanks @jorbin - this was my plan as well. I think that a CI job would be most capable here of detecting these things. Even though I still support adding documentation at the top of some file, I've seen a very low effectiveness of that guard, as developers tend to jump into the middle of the file and never look at the top.
Thanks all for taking care of merging this and getting it into the release.
#14
@
6 months ago
- Owner set to jorbin
- Resolution set to fixed
- Status changed from new to closed
In 58764:
#16
@
6 months ago
please note, however, that this patch does not change the behavior of the polyfills for blog_charset. In compat.php they make the call directly if no charset is provided. Before and after this change, the behavior is identical: if no charset is provided before functions.php is loaded, the call will crash; otherwise it will default to calling get_option().
Thanks for pointing out that get_option()
is called earlier, I missed that when I made the observation about a potential change in behavior, which resolves my concerns with [58763].
I'll pick up the reset of the conversation in the follow-up ticket (#61694).
Trac ticket: Core-61680
When
is_utf8_charset()
was introduced, themb_strlen()
andmb_substr()
compat functions were modified to call it, but they are defined incompat.php
beforeis_utf8_charset()
is defined infunctions.php
.Certain code calling these compat functions early in the boot process before
functions.php
is included and on hosts without the multi-byte extension would thus crash.In this patch the
is_utf8_charset()
function is split into pure and stateful components. The pure version is recreated as_is_utf8_charset()
and defined incompat.php
while the existing function (which defaults to callingget_option( 'blog_charset' )
) is left in place infunctions.php
. This ensures that code calling it will be able to call a form of the function even in early sequences.Follow-up to [58169].
Props dmsnell, donncha, hellofromTonya, jeherve, slyall, spacedmonkey.
Fixes #61680.