WordPress.org

Make WordPress Core

Opened 9 years ago

Last modified 3 years ago

#9930 reopened defect (bug)

is_serialized() returns false on serialized doubles

Reported by: vladimir_kolesnikov Owned by: westi
Milestone: Future Release Priority: normal
Severity: minor Version:
Component: General Keywords: needs-refresh dev-feedback needs-unit-tests
Focuses: Cc:

Description

Test case:

<?php
    require_once('wp-load.php');
    print var_export(is_serialized(serialize(1.2E+150)));
?>

Expected: true Got: false

serialize(1.2E+150) returns something like 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;', the plus sign after 'E' is not taken into account by the regexp

Attachments (9)

functions.php.diff (900 bytes) - added by vladimir_kolesnikov 9 years ago.
9930.diff (417 bytes) - added by Denis-de-Bernardy 8 years ago.
optimize-is-serialized.diff (473 bytes) - added by Denis-de-Bernardy 8 years ago.
untested
9930.2.diff (665 bytes) - added by Denis-de-Bernardy 8 years ago.
9930.patch (1.9 KB) - added by hakre 8 years ago.
9930.2.patch (2.3 KB) - added by hakre 8 years ago.
Refreshed against trunk
9930.3.patch (469 bytes) - added by hakre 7 years ago.
Refreshed against trunk, added + and INF
9930-unittests.diff (1.1 KB) - added by MikeHansenMe 4 years ago.
see #30284
9930-unitttests.diff (731 bytes) - added by boonebgorges 4 years ago.
See #30284.

Download all attachments as: .zip

Change History (48)

#1 @vladimir_kolesnikov
9 years ago

  • Cc vladimir@… added
  • Keywords has-patch added

#2 follow-up: @Denis-de-Bernardy
9 years ago

  • Milestone set to 2.8

you sure you changed the correct function? it doesn't seem to answer true only only on strings now.

#3 in reply to: ↑ 2 @vladimir_kolesnikov
9 years ago

Replying to Denis-de-Bernardy:

you sure you changed the correct function? it doesn't seem to answer true only only on strings now.

Not sure I understood your question (English is not my native language) but:

  1. is_serialize() returns false if $data is not a string;
  2. the regexp is three in one (N;, [aOs]:[0-9]+:.*[;}], [bid]:[0-9.E+-]+;), the third regexp contains '+', since serialized doubles can contain '+' after 'E'

Test case:

<?php
    require_once('wp-load.php');

    function is_serialized2($data)
    {
        return is_string($data) && preg_match('/^((N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;))$/s', $data);
    }

    print var_export(is_serialized2(serialize("a\nb"))) . "\n";
    print var_export(is_serialized2(serialize(-25))) . "\n";
    print var_export(is_serialized2(serialize(25))) . "\n";
    print var_export(is_serialized2(serialize(false))) . "\n";
    print var_export(is_serialized2(serialize(null))) . "\n";
    print var_export(is_serialized2(serialize(array()))) . "\n";
    print var_export(is_serialized2(serialize(1.1))) . "\n";
    print var_export(is_serialized2(serialize(2.1E+200))) . "\n";

    $x = new stdClass();
    $x->a = 1;
    $x->b = 'test';
    print var_export(is_serialized2(serialize($x))) . "\n";
?>

Expected result:

true
true
true
true
true
true
true
true
true

#4 follow-up: @Denis-de-Bernardy
9 years ago

yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).

based on the phpdoc of the function you changed, this is expected:

print var_export(is_serialized_string(serialize("a\nb"))) . "\n"; // true
print var_export(is_serialized_string(serialize(-25))) . "\n"; // false
print var_export(is_serialized_string(serialize(25))) . "\n"; // false
print var_export(is_serialized_string(serialize(false))) . "\n"; // false
print var_export(is_serialized_string(serialize(null))) . "\n"; // false
print var_export(is_serialized_string(serialize(array()))) . "\n"; // false
print var_export(is_serialized_string(serialize(1.1))) . "\n"; // false
print var_export(is_serialized_string(serialize(2.1E+200))) . "\n"; // false

#5 in reply to: ↑ 4 @vladimir_kolesnikov
9 years ago

Replying to Denis-de-Bernardy:

yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).

Yes, you are right, I need to be more attentive :-(

Of course the patch was meant for is_serialized() function, resubmitting.

#6 @westi
9 years ago

  • Owner set to westi
  • Status changed from new to reviewing

#7 @Denis-de-Bernardy
9 years ago

  • Milestone changed from 2.8 to 2.9

Patch changes the original regex on bad ions (the second set no longer contains \$ and the first set uses $ instead).

Punting per discussion during the WP meet-up...

#8 @Denis-de-Bernardy
9 years ago

  • Keywords needs-patch added; has-patch removed

#9 @westi
8 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from 2.9 to 3.0

Move to 3.0 for now.

#10 follow-up: @dd32
8 years ago

change that to:

return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);

ie. Add the s modifier and it passes this test case for me.

	function test_is_serialized() {
		$cases = array(
					   serialize("a\nb"),
					   serialize(-25),
					   serialize(25),
					   serialize(false),
					   serialize(null),
					   serialize(array()),
					   serialize(1.1),
					   serialize(2.1E+200),
					   serialize( (object)array('test' => true, '3', 4) )
					   );

		foreach ( $cases as $case )
			$this->assertTrue( is_serialized($case) );

	}

#11 @westi
8 years ago

Thanks!

I've added this test cases to wordpress-tests.

#12 in reply to: ↑ 10 @westi
8 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

Replying to dd32:

change that to:

return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);

ie. Add the s modifier and it passes this test case for me.

Looks good to me to.

#13 @westi
8 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

(In [12485]) Much improved is_serialized() which catches serialized doubles. Fixes #9930 props vladimir_kolesnikov and dd32.

#14 @Denis-de-Bernardy
8 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

#15 @Denis-de-Bernardy
8 years ago

re-opening with a new patch. the new regex makes it much slower, and it can run into backtracking problems with large strings.

if we are to keep it, this:

^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$

should be:

^(?:N;|[aOs]:[0-9]+:.*[;}]|[bid]:[0-9.E+-]+;)$

else is_serialized may return true for garbage.

#16 @Denis-de-Bernardy
8 years ago

I've added a separate patch to optimize the other regexp in the function.

#17 @Denis-de-Bernardy
8 years ago

  • Severity changed from minor to critical

as of r12485:

var_dump(is_serialized('garbage:a:0:garbage;')); // true

#18 @westi
8 years ago

  • Keywords needs-patch added; has-patch removed
  • Severity changed from critical to minor

I've reverted the slow and broken change in [12486] and added the extra test case to wordpress-tests.

Lets get a better faster combined patch worked up and then we can commit.

#19 @Denis-de-Bernardy
8 years ago

k... 9930.2.diff does the following:

  • it extends the second regex so that it catches doubles like 1E+1.
  • it reduces backtracking for the second regex

more specifically for the second:

  • an anchored regexp is very fast, unless it needs to match the entire string (as it currently does)
  • substr() is very fast, and it's trivial to see if a string has one out of two values

so by splitting the test in two, we get the same semantics and we make it faster and less resource hungry.

#20 @dd32
8 years ago

The tests need some non-serialized cases too. I was going to write some, but needed a chance to work some out.. Didnt realise that regex had that huge hole.. :)

#21 @Denis-de-Bernardy
8 years ago

the current is_serialized() also catches "some" garbage, btw. specifically, each of the following should ideally get rejected:

b:E;
i:E;
d:E;
a:1::;

however, these seem to be very acceptable edge cases that we can live with: making the regexp more precise in order to validate the serialized data seems overkill given the number of times we call the function -- the application's efficiency would suffer.

#22 @Denis-de-Bernardy
8 years ago

  • Keywords has-patch added; needs-patch removed

#23 @Denis-de-Bernardy
8 years ago

@westi: 9930.diff fixes the original bug. the optimization for strings, arrays and objects is completely optional.

#24 @hakre
8 years ago

Better properly check serialized strings, we will see more exploits on serialized data in the future.

@hakre
8 years ago

#25 @hakre
8 years ago

  • Keywords tested added

I put this on my list again. Uploaded patch provides a fix for the problem. Tested.

This is how I tested:

Code:

$value      = 1.2E+150; 
$serialized = serialize($value);
var_dump(is_serialized($serialized), $serialized, $value);

Output (patched):

boolean true

string 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;' (length=109)

float 1.2E+150

Additionally some comment and code improvements.

#26 @hakre
8 years ago

Could run my patch (9930.patch) against the official testsuite now:

Before:

2) TestFunctions::test_is_serialized
Serialized data: d:2.099999999999999970432388186544853801030102097813541764707036304487326901911060431043950937593374825E+200;
Failed asserting that <boolean:false> is true.

After:
ZERRO failures in TestFunctions::test_is_serialized.

#27 @Denis-de-Bernardy
8 years ago

9930.patch has an over-optimization:

/([abdiOs]:|N;)/

should remain as:

/([abdiOs]):/

else badions contains garbage.

#28 @Denis-de-Bernardy
8 years ago

  • Keywords needs-patch added; has-patch tested removed

#29 @Denis-de-Bernardy
8 years ago

  • Keywords has-patch added; needs-patch removed

woops, nevermind my last comment. I was getting mixed up the array/string notation using [].

#30 @Denis-de-Bernardy
8 years ago

  • Keywords 2nd-opinion added

works, but do we really want to to introduce this amount of backtracking?

array could just as well use:

preg_match( '/^a:[0-9]+:/', $data ); 

object could just as well use:

preg_match( '/^O:[0-9]+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":[0-9]+:', $data );

and the string case could use something like:

substr($data, -2, 2) == '";' && preg_match( '/^s:[0-9]+:"/', $data );

#31 @nacin
8 years ago

  • Keywords needs-patch early needs-unit-tests added; has-patch 2nd-opinion removed
  • Milestone changed from 3.0 to 3.1

#32 @hakre
8 years ago

Related: #14429

@hakre
8 years ago

Refreshed against trunk

#33 @hakre
8 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

#34 @nacin
8 years ago

  • Keywords needs-refresh added; has-patch early removed
  • Milestone changed from Awaiting Triage to Future Release

#35 @hakre
8 years ago

If someone is looking for a regex that is validating a whole float/double value, this one is pretty strict so far:

(d:(?:[-]?INF|[+-]?(?:(?:[0-9]+|(?:[0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*))|(?:[0-9]+|(?:([0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*)))[eE][+-]?[0-9]+));)

It takes care not only of the scientific notations but as well for INF and -INF. If you need to get a float to be serialized for a tests with INF, this did work for me without caring about a systems float size:

   $float = INF; // or -INF
   $serialized = serialize($float);

For this ticket's traction I think it now depends on whether or not this should be fixed as it might imply performance issues. But the actual performance should be tested before making assumptions as my impression is, that PHP is pretty fast with regular expression when it calls for the same multiple times per request. This is the case for the is_serialized function.

#36 @hakre
7 years ago

  • Keywords has-patch added; needs-refresh removed

Part of this issue has been taken care of in #14429 which already changed the is_serialized function to some good.

the INF/-INF part looks like still missing. I'll add a patch against trunk.

@hakre
7 years ago

Refreshed against trunk, added + and INF

#37 @hakre
7 years ago

Refreshed against trunk HEAD, added +, I, N, F and e which are valid within serialized float values in PHP.

Related: #17375

#38 @chriscct7
4 years ago

  • Keywords needs-refresh dev-feedback added; has-patch removed

There's a minor typo in the proposed patch. "wether" should be whether. Slight refresh to fix that, then dev-evaluation for merge potential.

Last edited 4 years ago by chriscct7 (previous) (diff)

#39 @chriscct7
3 years ago

  • Keywords needs-unit-tests added
Note: See TracTickets for help on using tickets.