WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 4 months ago

#18007 closed defect (bug) (fixed)

is_serialized trouble for multibytes encoding

Reported by: challet Owned by: nacin
Milestone: 3.9 Priority: normal
Severity: normal Version: 3.2
Component: Charset Keywords: has-patch
Focuses: Cc:

Description

Possible bug in the is_serialized function, in a multibytes environment :

Context

Retrieving an option from the database, the get_option function try to deserialize it.

Bug scenario

  • the strlen call is actually a mb_strlen one
  • the $length var is the number of characters (not bytes)
  • using $data[$length-1] don't retrieve the last char but one before, it's actual position is undefined, depending on the other content of the string
  • this $lastc is not ; or }
  • the string is understood has not serialized

Patch proposal

here is a fix i made to make it work with multibytes string :
Not tested it for single bytes, but there is no reason for it to not work.

--- a/wp-includes/functions.php
+++ b/wp-includes/functions.php
@@ -256,7 +256,7 @@ function is_serialized( $data ) {
                return false;
        if ( ':' !== $data[1] )
                return false;
-       $lastc = $data[$length-1];
+       $lastc = substr($data, -1);
        if ( ';' !== $lastc && '}' !== $lastc )
                return false;
        $token = $data[0];

Attachments (12)

18007.patch (417 bytes) - added by SergeyBiryukov 3 years ago.
18007_incomplete_unit_test.patch (3.1 KB) - added by kurtpayne 22 months ago.
18007.2.patch (1.2 KB) - added by SergeyBiryukov 21 months ago.
18007.2.alt.patch (647 bytes) - added by SergeyBiryukov 21 months ago.
18007.3.patch (1.9 KB) - added by SergeyBiryukov 21 months ago.
18007.unit-test.patch (2.0 KB) - added by SergeyBiryukov 21 months ago.
18007.unit-test.2.patch (2.0 KB) - added by SergeyBiryukov 21 months ago.
More accurate check for mbstring.func_overload
18007.4.patch (1.4 KB) - added by SergeyBiryukov 21 months ago.
18007.5.patch (835 bytes) - added by SergeyBiryukov 21 months ago.
18007.5.alt.patch (921 bytes) - added by SergeyBiryukov 21 months ago.
18007.6.patch (461 bytes) - added by SergeyBiryukov 5 months ago.
18007.2.refreshed.patch (1.9 KB) - added by SergeyBiryukov 4 months ago.

Download all attachments as: .zip

Change History (30)

SergeyBiryukov3 years ago

comment:1 SergeyBiryukov3 years ago

  • Keywords has-patch added

Turned challet's proposal into a patch.

comment:2 puffythepirateboy3 years ago

After applying this fix on a multi-network installation (thousands of sites), the problem we encountered with multibyte characters in text widgets (among others) was resolved. It has been running fine for about a month now.

comment:3 Londeren2 years ago

This bug still reproduced. WordPress 3.3.1

comment:4 SergeyBiryukov2 years ago

  • Keywords needs-unit-tests added

comment:5 follow-up: kurtpayne22 months ago

  • Cc kurtpayne added

I'm trying to come up with a test for this, but I'm hitting some roadblocks.

I can't reproduce the issue. Is it still present in php 5.3?

I am doing the following:

  • ALTER TABLE wp_options MODIFY option_value LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  • INSERT INTO wp_options (option_name, option_value, autoload) VALUES ('test_18007_option', 'a:0:{}', 'yes');
  • get_option( 'test_18007_option' );

This returns the correct value ... an empty array.

Also, mbstring.func_overload is a PHP_INI_PERDIR | PHP_INI_SYSTEM level configuration item for php 4.3 through php 5.2.6. This is a little tricky since this has to be set with php -d or in php.ini (or skip the test).

comment:6 in reply to: ↑ 5 challet22 months ago

Replying to kurtpayne:

I am doing the following:

  • ALTER TABLE wp_options MODIFY option_value LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
  • INSERT INTO wp_options (option_name, option_value, autoload) VALUES ('test_18007_option', 'a:0:{}', 'yes');
  • get_option( 'test_18007_option' );

This returns the correct value ... an empty array.

You should try something like :

`INSERT INTO `wp_options` (`option_name`, `option_value`, `autoload`) VALUES ('test_18007_option', 'a:0:{"text": "aéùp"}', 'yes');`

The shift between the length (in bytes) and the length (in charcters number) happens when there is indeed a character using multiple bytes.

This is a little tricky

It is :)

Version 0, edited 22 months ago by challet (next)

comment:7 follow-up: kurtpayne22 months ago

I still can't recreate this. I've downgraded to php 5.2.17, 5.2.9, and 5.2.6 and tried this, but I can't recreate the original bug.

I'm attaching the unit test I'm working on, but I'm stuck until I get some feedback. You have to apply it then run it as:

php -d mbstring.func_overload=4 /usr/bin/phpunit -c mbstring.xml

SergeyBiryukov21 months ago

SergeyBiryukov21 months ago

comment:8 in reply to: ↑ 7 SergeyBiryukov21 months ago

I can reproduce the original bug.

18007.2.patch is a more complete patch using substr().

18007.2.alt.patch is another take using count( str_split() ) instead of strlen().

Unfortunately, a quick test with 3 000 000 iterations of is_serialized() showed that substr() is about 2 times slower than direct access, and count( str_split() ) is 3 times slower than strlen():

18007.3.patch is an attempt to only use substr() if mbstring.func_overload is enabled, and to keep decent performance otherwise:

We have a similar check in streams.php:
http://core.trac.wordpress.org/browser/tags/3.4.2/wp-includes/pomo/streams.php#L17

Replying to kurtpayne:

I'm attaching the unit test I'm working on, but I'm stuck until I get some feedback. You have to apply it then run it as:

php -d mbstring.func_overload=4 /usr/bin/phpunit -c mbstring.xml

Note that mbstring.func_overload=4 only overloads regular expression functions. For string functions, you need mbstring.func_overload=2.

18007.unit-test.patch is the shortened and revised unit test.

Last edited 21 months ago by SergeyBiryukov (previous) (diff)

comment:9 kurtpayne21 months ago

  • Keywords 3.6-early added; needs-unit-tests removed
  • Milestone changed from Awaiting Review to Future Release

Original bug confirmed using mbstring.func_overload = 2

18007.3.patch looks good and in line with the precedent set in pomo/streams.php. I did not run my own benchmarks; I trust the numbers provided.

18007.unit-test.patch fails without patch, passes with patch.

SergeyBiryukov21 months ago

More accurate check for mbstring.func_overload

SergeyBiryukov21 months ago

SergeyBiryukov21 months ago

comment:10 SergeyBiryukov21 months ago

More takes in 18007.4.patch, 18007.5.patch and 18007.5.alt.patch.

I didn't test all the cases yesterday, here's the full table. Increased the number of iterations to 10,000,000 to make the results more accurate.

mbstring.func_overload offmbstring.func_overload on
62,783s — unpatched36,596s — unpatched
63,268s — 18007.5.patch67,948s — 18007.5.patch
64,873s — 18007.2.patch75,975s — 18007.2.patch
71,176s — 18007.4.patch76,906s — 18007.5.alt.patch
73,531s — 18007.3.patch86,326s — 18007.4.patch
76,352s — 18007.5.alt.patch88,115s — 18007.3.patch
96,569s — 18007.6.patch100,632s — 18007.2.alt.patch
99,088s — 18007.2.alt.patch453,753s — 18007.6.patch

It turned out that 18007.2.patch is actually faster than 18007.3.patch in both cases. 18007.5.patch shows the best performance, but probably looks less obvious.

Not sure why the unpatched results differ by 75%, but the difference is consistent after several takes.

Last edited 5 months ago by SergeyBiryukov (previous) (diff)

comment:13 follow-up: SergeyBiryukov11 months ago

[17592] offers another approach using mb_internal_encoding(), need to include it in the performance tests.

comment:14 SergeyBiryukov5 months ago

  • Milestone changed from Future Release to 3.9

A simple way to reproduce the issue:

  1. Set mbstring.func_overload = 2 in php.ini.
  2. Enter a widget title in Cyrillic and save it.
  3. The widget shows a notice (see tags/3.8.1/src/wp-includes/widgets.php#L262):
    Notice: Undefined offset: 2 in wp-includes/widgets.php on line 262
    
  4. The widget disappears on front-end. After a page refresh, it disappears in the admin too.

18007.2.patch seems like the way to go here. Let's refresh the unit tests if necessary and get this in.

SergeyBiryukov5 months ago

comment:15 in reply to: ↑ 13 SergeyBiryukov5 months ago

Replying to SergeyBiryukov:

[17592] offers another approach using mb_internal_encoding(), need to include it in the performance tests.

Implemented in 18007.6.patch, see the updated table in comment:10.

It turned out to be too slow, so 18007.2.patch is still the way to go.

comment:16 nacin4 months ago

  • Keywords needs-refresh added; 3.6-early removed

Patch needs refreshing.

comment:17 SergeyBiryukov4 months ago

  • Keywords needs-refresh removed

Refreshed.

comment:18 nacin4 months ago

  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from new to closed

In 27565:

In is_serialized(), use substr() rather than array access, for compatibility with multibyte overloading.

props SergeyBiryukov.
fixes #18007.

Note: See TracTickets for help on using tickets.