Make WordPress Core

Opened 11 years ago

Closed 11 years ago

Last modified 8 years ago

#25259 closed defect (bug) (fixed)

Helper functions for mbstring.func_overload

Reported by: dd32's profile dd32 Owned by: dd32's profile dd32
Milestone: 3.7 Priority: normal
Severity: normal Version:
Component: General Keywords:
Focuses: Cc:

Description

Similar to #25061, WP_HTTP_Streams doesn't account for mbstring.func_overload, and since it's also going to be an issue with WP_Filesystem in #25237 I've gone searching for a simpler solution.

Attached is a patch which introduces a helper function to set/reset the mbstring internal encoding, limited testing performed, but it appears to work as expected.
It's possible to call it multiple times (nested) as long as there are the same number of reset calls made.

An alternative solution, would be to set the mbstring internal encoding globally for WordPress, forcing it to our preferred character encoding, something that although I love the idea of at first, it may have significant impacts for users who actually rely upon it.

Commits where we've implemented workarounds: [25051] [17592]
It also appears the ID3 library doesn't like mbstring.func_overload, in addition to pomo having a set of workarounds for it.

Attachments (1)

25259.diff (4.4 KB) - added by dd32 11 years ago.

Download all attachments as: .zip

Change History (17)

@dd32
11 years ago

#1 @dd32
11 years ago

Attachment attachment:25259.diff​ added

This patch also moves the mbstring handling from the cURL callbacks, to the WP_HTTP::request() method, this allows it to apply the fix to all string calls used through out the HTTP class while not having to sprinkle calls around the transports liberally.

#2 @SergeyBiryukov
11 years ago

As noted ticket:25061:12, function_exists( 'ini_get' ) seems unnecessary, as we don't check for that anywhere else in core, other than in _load_image_to_edit_path() ([20806]), which I guess was a copy/paste from WP_Http_Streams::test(), from where it was removed in [25224].

#3 @dd32
11 years ago

function_exists( 'ini_get' ) seems unnecessary

I think we've phased it out over time, It was common as a disable_function on bad PHP4 hosts, and as a result was included often in checks previously.

We can probably remove the other remaining case in _load_image_to_edit_path() which got it's check from the now-defunct PHP4 WP_HTTP_Fopen transport.

#4 @dd32
11 years ago

  • Owner set to dd32
  • Resolution set to fixed
  • Status changed from new to closed

In 25346:

Add a set of helpers to turn the behaviour of mbstring.func_overload off when needed. Fixes #25259

#5 @dd32
11 years ago

In 25347:

Switch unzip_file() over to using the mbstring.func_override helper functions. See #25259

#6 @dd32
11 years ago

In 25348:

Switch WP_HTTP over to using the mbstring.func_overload helper functions. This change moves the check from within the Streaming-handling function to wrap the individual request, this fixes it for both cURL and Streams and any future changes to the transports which use strlen() on binary data. See #25259 See #16057

#7 @dd32
11 years ago

In 25349:

Make use of the mbstring.func_overload helper functions in WP_Filesystem so byte lengths are properly determined. See #25259 Fixes #25237

#8 @nacin
11 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I meant to comment here; it appears I didn't.

Would it make sense to have a function specifically for strlen() that does the switch, the measurement, and the restoration? This seems to be the most common use case in core. mb_substr() is used more, and mb_strtolower() is also used a few times. It just might make things a bit more clear what is going on. I'm fine with the helpers, but it isn't entirely obvious what is going on in between the two points. Some of the gaps are 10 to 25 lines, with other function calls in between — and potentially filters? We should aim to keep this as atomic as possible.

Reopening for review, especially given that this could have a side effect on other functions or plugins without thorough review.

#9 follow-up: @dd32
11 years ago

Would it make sense to have a function specifically for strlen() that does the switch, the measurement, and the restoration?

I considered that, but, certain parts of the code use a bunch of other things, and I'd rather be consistent rather than using 2 different work arounds.

class-ftp.php & class-pclzip.php for example needs ereg_replace/strlen/substr/strstr/stristr and a few others, as they're all operating on binary data, so a full switch needs to take place to remain compatible there.

For things like WP_HTTP, strlen and substr are only needed (at present), and aside from the ftpsocketsWP_Filesystem transport, the rest only really need strlen at present.

For those cases, we could do a compat function that looks something like:

function wp_strlen( $string ) {
  return function_exists( 'mb_strlen' ) ? mb_strlen( $string, 'ascii' ) : strlen( $string );
}
function wp_substr( $string, $start, $length ) {
  return function_exists( 'mb_substr' ) ? mb_substr( $string, $start, $length, 'ascii' ) : substr( $string, $start, $length );
}

With the commit as-is, On most hosts these helpers will do absolutely nothing, it'll just be an extra function call, with a if ( static var is false ) return, on the rest of the hosts who have this stupid setting enabled, it'll use the fix we've been using for a while elsewhere..

#10 in reply to: ↑ 9 ; follow-up: @SergeyBiryukov
11 years ago

Replying to dd32:

For those cases, we could do a compat function that looks something like:

FWIW, we already have a compat version of mb_substr(): tags/3.6.1/wp-includes/compat.php#L16.

#11 in reply to: ↑ 10 @dd32
11 years ago

Replying to SergeyBiryukov:

Replying to dd32:

For those cases, we could do a compat function that looks something like:

FWIW, we already have a compat version of mb_substr(): tags/3.6.1/wp-includes/compat.php#L16.

I guess the function_exists() check would have to either check for a different mbstring function, or, now that i look at it, simply calling it with an ascii encoding would always work (as if mbstring isn't loaded, it'll use substr(), otherwise, it'll use mb_substr with ascii)

#12 follow-up: @pbaylies
11 years ago

  • Cc pbaylies added

On the topic of function_exists() checks, mbstring is a non-default extension; so you'll want to be doing function_exists() checks for mbstring_binary_safe_encoding() and reset_mbstring_encoding() here as well (this is currently broken in trunk for me as of [25348]).

#13 in reply to: ↑ 12 @SergeyBiryukov
11 years ago

Replying to pbaylies:

On the topic of function_exists() checks, mbstring is a non-default extension; so you'll want to be doing function_exists() checks for mbstring_binary_safe_encoding() and reset_mbstring_encoding() here as well (this is currently broken in trunk for me as of [25348]).

There is a function_exists( 'mb_internal_encoding' ) check in mbstring_binary_safe_encoding(): trunk/src/wp-includes/functions.php?rev=25400#L4178.

#14 @pbaylies
11 years ago

SergeyBiryukov,

I'm not sure why those functions weren't defined for me previously then, but it all appears to be fine now; thanks!

#15 @nacin
11 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

This ticket was mentioned in Slack in #forums by sergey. View the logs.


8 years ago

Note: See TracTickets for help on using tickets.