WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 3 years ago

#9930 reopened defect (bug)

is_serialized() returns false on serialized doubles

Reported by: vladimir_kolesnikov Owned by: westi
Milestone: Future Release Priority: normal
Severity: minor Version:
Component: General Keywords: has-patch
Focuses: Cc:

Description

Test case:

<?php
    require_once('wp-load.php');
    print var_export(is_serialized(serialize(1.2E+150)));
?>

Expected: true
Got: false

serialize(1.2E+150) returns something like 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;', the plus sign after 'E' is not taken into account by the regexp

Attachments (7)

functions.php.diff (900 bytes) - added by vladimir_kolesnikov 5 years ago.
9930.diff (417 bytes) - added by Denis-de-Bernardy 4 years ago.
optimize-is-serialized.diff (473 bytes) - added by Denis-de-Bernardy 4 years ago.
untested
9930.2.diff (665 bytes) - added by Denis-de-Bernardy 4 years ago.
9930.patch (1.9 KB) - added by hakre 4 years ago.
9930.2.patch (2.3 KB) - added by hakre 4 years ago.
Refreshed against trunk
9930.3.patch (469 bytes) - added by hakre 3 years ago.
Refreshed against trunk, added + and INF

Download all attachments as: .zip

Change History (44)

comment:1 vladimir_kolesnikov5 years ago

  • Cc vladimir@… added
  • Keywords has-patch added

comment:2 follow-up: Denis-de-Bernardy5 years ago

  • Milestone set to 2.8

you sure you changed the correct function? it doesn't seem to answer true only only on strings now.

comment:3 in reply to: ↑ 2 vladimir_kolesnikov5 years ago

Replying to Denis-de-Bernardy:

you sure you changed the correct function? it doesn't seem to answer true only only on strings now.

Not sure I understood your question (English is not my native language) but:

  1. is_serialize() returns false if $data is not a string;
  2. the regexp is three in one (N;, [aOs]:[0-9]+:.*[;}], [bid]:[0-9.E+-]+;), the third regexp contains '+', since serialized doubles can contain '+' after 'E'

Test case:

<?php
    require_once('wp-load.php');

    function is_serialized2($data)
    {
        return is_string($data) && preg_match('/^((N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;))$/s', $data);
    }

    print var_export(is_serialized2(serialize("a\nb"))) . "\n";
    print var_export(is_serialized2(serialize(-25))) . "\n";
    print var_export(is_serialized2(serialize(25))) . "\n";
    print var_export(is_serialized2(serialize(false))) . "\n";
    print var_export(is_serialized2(serialize(null))) . "\n";
    print var_export(is_serialized2(serialize(array()))) . "\n";
    print var_export(is_serialized2(serialize(1.1))) . "\n";
    print var_export(is_serialized2(serialize(2.1E+200))) . "\n";

    $x = new stdClass();
    $x->a = 1;
    $x->b = 'test';
    print var_export(is_serialized2(serialize($x))) . "\n";
?>

Expected result:

true
true
true
true
true
true
true
true
true

comment:4 follow-up: Denis-de-Bernardy5 years ago

yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).

based on the phpdoc of the function you changed, this is expected:

print var_export(is_serialized_string(serialize("a\nb"))) . "\n"; // true
print var_export(is_serialized_string(serialize(-25))) . "\n"; // false
print var_export(is_serialized_string(serialize(25))) . "\n"; // false
print var_export(is_serialized_string(serialize(false))) . "\n"; // false
print var_export(is_serialized_string(serialize(null))) . "\n"; // false
print var_export(is_serialized_string(serialize(array()))) . "\n"; // false
print var_export(is_serialized_string(serialize(1.1))) . "\n"; // false
print var_export(is_serialized_string(serialize(2.1E+200))) . "\n"; // false

comment:5 in reply to: ↑ 4 vladimir_kolesnikov5 years ago

Replying to Denis-de-Bernardy:

yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).

Yes, you are right, I need to be more attentive :-(

Of course the patch was meant for is_serialized() function, resubmitting.

comment:6 westi5 years ago

  • Owner set to westi
  • Status changed from new to reviewing

comment:7 Denis-de-Bernardy5 years ago

  • Milestone changed from 2.8 to 2.9

Patch changes the original regex on bad ions (the second set no longer contains \$ and the first set uses $ instead).

Punting per discussion during the WP meet-up...

comment:8 Denis-de-Bernardy5 years ago

  • Keywords needs-patch added; has-patch removed

comment:9 westi4 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from 2.9 to 3.0

Move to 3.0 for now.

comment:10 follow-up: dd324 years ago

change that to:

return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);

ie. Add the s modifier and it passes this test case for me.

	function test_is_serialized() {
		$cases = array(
					   serialize("a\nb"),
					   serialize(-25),
					   serialize(25),
					   serialize(false),
					   serialize(null),
					   serialize(array()),
					   serialize(1.1),
					   serialize(2.1E+200),
					   serialize( (object)array('test' => true, '3', 4) )
					   );

		foreach ( $cases as $case )
			$this->assertTrue( is_serialized($case) );

	}

comment:11 westi4 years ago

Thanks!

I've added this test cases to wordpress-tests.

comment:12 in reply to: ↑ 10 westi4 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

Replying to dd32:

change that to:

return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);

ie. Add the s modifier and it passes this test case for me.

Looks good to me to.

comment:13 westi4 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

(In [12485]) Much improved is_serialized() which catches serialized doubles. Fixes #9930 props vladimir_kolesnikov and dd32.

Denis-de-Bernardy4 years ago

comment:14 Denis-de-Bernardy4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:15 Denis-de-Bernardy4 years ago

re-opening with a new patch. the new regex makes it much slower, and it can run into backtracking problems with large strings.

if we are to keep it, this:

^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$

should be:

^(?:N;|[aOs]:[0-9]+:.*[;}]|[bid]:[0-9.E+-]+;)$

else is_serialized may return true for garbage.

Denis-de-Bernardy4 years ago

untested

comment:16 Denis-de-Bernardy4 years ago

I've added a separate patch to optimize the other regexp in the function.

comment:17 Denis-de-Bernardy4 years ago

  • Severity changed from minor to critical

as of r12485:

var_dump(is_serialized('garbage:a:0:garbage;')); // true

comment:18 westi4 years ago

  • Keywords needs-patch added; has-patch removed
  • Severity changed from critical to minor

I've reverted the slow and broken change in [12486] and added the extra test case to wordpress-tests.

Lets get a better faster combined patch worked up and then we can commit.

Denis-de-Bernardy4 years ago

comment:19 Denis-de-Bernardy4 years ago

k... 9930.2.diff does the following:

  • it extends the second regex so that it catches doubles like 1E+1.
  • it reduces backtracking for the second regex

more specifically for the second:

  • an anchored regexp is very fast, unless it needs to match the entire string (as it currently does)
  • substr() is very fast, and it's trivial to see if a string has one out of two values

so by splitting the test in two, we get the same semantics and we make it faster and less resource hungry.

comment:20 dd324 years ago

The tests need some non-serialized cases too. I was going to write some, but needed a chance to work some out.. Didnt realise that regex had that huge hole.. :)

comment:21 Denis-de-Bernardy4 years ago

the current is_serialized() also catches "some" garbage, btw. specifically, each of the following should ideally get rejected:

b:E;
i:E;
d:E;
a:1::;

however, these seem to be very acceptable edge cases that we can live with: making the regexp more precise in order to validate the serialized data seems overkill given the number of times we call the function -- the application's efficiency would suffer.

comment:22 Denis-de-Bernardy4 years ago

  • Keywords has-patch added; needs-patch removed

comment:23 Denis-de-Bernardy4 years ago

@westi: 9930.diff fixes the original bug. the optimization for strings, arrays and objects is completely optional.

comment:24 hakre4 years ago

Better properly check serialized strings, we will see more exploits on serialized data in the future.

hakre4 years ago

comment:25 hakre4 years ago

  • Keywords tested added

I put this on my list again. Uploaded patch provides a fix for the problem. Tested.

This is how I tested:

Code:

$value      = 1.2E+150; 
$serialized = serialize($value);
var_dump(is_serialized($serialized), $serialized, $value);

Output (patched):

boolean true

string 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;' (length=109)

float 1.2E+150

Additionally some comment and code improvements.

comment:26 hakre4 years ago

Could run my patch (9930.patch) against the official testsuite now:

Before:

2) TestFunctions::test_is_serialized
Serialized data: d:2.099999999999999970432388186544853801030102097813541764707036304487326901911060431043950937593374825E+200;
Failed asserting that <boolean:false> is true.

After:
ZERRO failures in TestFunctions::test_is_serialized.

comment:27 Denis-de-Bernardy4 years ago

9930.patch has an over-optimization:

/([abdiOs]:|N;)/

should remain as:

/([abdiOs]):/

else badions contains garbage.

comment:28 Denis-de-Bernardy4 years ago

  • Keywords needs-patch added; has-patch tested removed

comment:29 Denis-de-Bernardy4 years ago

  • Keywords has-patch added; needs-patch removed

woops, nevermind my last comment. I was getting mixed up the array/string notation using [].

comment:30 Denis-de-Bernardy4 years ago

  • Keywords 2nd-opinion added

works, but do we really want to to introduce this amount of backtracking?

array could just as well use:

preg_match( '/^a:[0-9]+:/', $data ); 

object could just as well use:

preg_match( '/^O:[0-9]+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":[0-9]+:', $data );

and the string case could use something like:

substr($data, -2, 2) == '";' && preg_match( '/^s:[0-9]+:"/', $data );

comment:31 nacin4 years ago

  • Keywords needs-patch early needs-unit-tests added; has-patch 2nd-opinion removed
  • Milestone changed from 3.0 to 3.1

comment:32 hakre4 years ago

Related: #14429

hakre4 years ago

Refreshed against trunk

comment:33 hakre4 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

comment:34 nacin3 years ago

  • Keywords needs-refresh added; has-patch early removed
  • Milestone changed from Awaiting Triage to Future Release

comment:35 hakre3 years ago

If someone is looking for a regex that is validating a whole float/double value, this one is pretty strict so far:

(d:(?:[-]?INF|[+-]?(?:(?:[0-9]+|(?:[0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*))|(?:[0-9]+|(?:([0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*)))[eE][+-]?[0-9]+));)

It takes care not only of the scientific notations but as well for INF and -INF. If you need to get a float to be serialized for a tests with INF, this did work for me without caring about a systems float size:

   $float = INF; // or -INF
   $serialized = serialize($float);

For this ticket's traction I think it now depends on whether or not this should be fixed as it might imply performance issues. But the actual performance should be tested before making assumptions as my impression is, that PHP is pretty fast with regular expression when it calls for the same multiple times per request. This is the case for the is_serialized function.

comment:36 hakre3 years ago

  • Keywords has-patch added; needs-refresh removed

Part of this issue has been taken care of in #14429 which already changed the is_serialized function to some good.

the INF/-INF part looks like still missing. I'll add a patch against trunk.

hakre3 years ago

Refreshed against trunk, added + and INF

comment:37 hakre3 years ago

Refreshed against trunk HEAD, added +, I, N, F and e which are valid within serialized float values in PHP.

Related: #17375

Note: See TracTickets for help on using tickets.