Make WordPress Core

Opened 15 years ago

Closed 5 years ago

Last modified 5 years ago

#9930 closed defect (bug) (duplicate)

is_serialized() returns false on serialized doubles

Reported by: vladimir_kolesnikov's profile vladimir_kolesnikov Owned by: westi's profile westi
Milestone: Priority: normal
Severity: minor Version:
Component: General Keywords: needs-unit-tests
Focuses: Cc:

Description

Test case:

<?php
    require_once('wp-load.php');
    print var_export(is_serialized(serialize(1.2E+150)));
?>

Expected: true
Got: false

serialize(1.2E+150) returns something like 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;', the plus sign after 'E' is not taken into account by the regexp

Attachments (9)

functions.php.diff (900 bytes) - added by vladimir_kolesnikov 15 years ago.
9930.diff (417 bytes) - added by Denis-de-Bernardy 15 years ago.
optimize-is-serialized.diff (473 bytes) - added by Denis-de-Bernardy 15 years ago.
untested
9930.2.diff (665 bytes) - added by Denis-de-Bernardy 15 years ago.
9930.patch (1.9 KB) - added by hakre 15 years ago.
9930.2.patch (2.3 KB) - added by hakre 14 years ago.
Refreshed against trunk
9930.3.patch (469 bytes) - added by hakre 13 years ago.
Refreshed against trunk, added + and INF
9930-unittests.diff (1.1 KB) - added by MikeHansenMe 10 years ago.
see #30284
9930-unitttests.diff (731 bytes) - added by boonebgorges 10 years ago.
See #30284.

Download all attachments as: .zip

Change History (50)

#1 @vladimir_kolesnikov
15 years ago

  • Cc vladimir@… added
  • Keywords has-patch added

#2 follow-up: @Denis-de-Bernardy
15 years ago

  • Milestone set to 2.8

you sure you changed the correct function? it doesn't seem to answer true only only on strings now.

#3 in reply to: ↑ 2 @vladimir_kolesnikov
15 years ago

Replying to Denis-de-Bernardy:

you sure you changed the correct function? it doesn't seem to answer true only only on strings now.

Not sure I understood your question (English is not my native language) but:

  1. is_serialize() returns false if $data is not a string;
  2. the regexp is three in one (N;, [aOs]:[0-9]+:.*[;}], [bid]:[0-9.E+-]+;), the third regexp contains '+', since serialized doubles can contain '+' after 'E'

Test case:

<?php
    require_once('wp-load.php');

    function is_serialized2($data)
    {
        return is_string($data) && preg_match('/^((N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;))$/s', $data);
    }

    print var_export(is_serialized2(serialize("a\nb"))) . "\n";
    print var_export(is_serialized2(serialize(-25))) . "\n";
    print var_export(is_serialized2(serialize(25))) . "\n";
    print var_export(is_serialized2(serialize(false))) . "\n";
    print var_export(is_serialized2(serialize(null))) . "\n";
    print var_export(is_serialized2(serialize(array()))) . "\n";
    print var_export(is_serialized2(serialize(1.1))) . "\n";
    print var_export(is_serialized2(serialize(2.1E+200))) . "\n";

    $x = new stdClass();
    $x->a = 1;
    $x->b = 'test';
    print var_export(is_serialized2(serialize($x))) . "\n";
?>

Expected result:

true
true
true
true
true
true
true
true
true

#4 follow-up: @Denis-de-Bernardy
15 years ago

yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).

based on the phpdoc of the function you changed, this is expected:

print var_export(is_serialized_string(serialize("a\nb"))) . "\n"; // true
print var_export(is_serialized_string(serialize(-25))) . "\n"; // false
print var_export(is_serialized_string(serialize(25))) . "\n"; // false
print var_export(is_serialized_string(serialize(false))) . "\n"; // false
print var_export(is_serialized_string(serialize(null))) . "\n"; // false
print var_export(is_serialized_string(serialize(array()))) . "\n"; // false
print var_export(is_serialized_string(serialize(1.1))) . "\n"; // false
print var_export(is_serialized_string(serialize(2.1E+200))) . "\n"; // false

#5 in reply to: ↑ 4 @vladimir_kolesnikov
15 years ago

Replying to Denis-de-Bernardy:

yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).

Yes, you are right, I need to be more attentive :-(

Of course the patch was meant for is_serialized() function, resubmitting.

#6 @westi
15 years ago

  • Owner set to westi
  • Status changed from new to reviewing

#7 @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.8 to 2.9

Patch changes the original regex on bad ions (the second set no longer contains \$ and the first set uses $ instead).

Punting per discussion during the WP meet-up...

#8 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch removed

#9 @westi
15 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from 2.9 to 3.0

Move to 3.0 for now.

#10 follow-up: @dd32
15 years ago

change that to:

return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);

ie. Add the s modifier and it passes this test case for me.

	function test_is_serialized() {
		$cases = array(
					   serialize("a\nb"),
					   serialize(-25),
					   serialize(25),
					   serialize(false),
					   serialize(null),
					   serialize(array()),
					   serialize(1.1),
					   serialize(2.1E+200),
					   serialize( (object)array('test' => true, '3', 4) )
					   );

		foreach ( $cases as $case )
			$this->assertTrue( is_serialized($case) );

	}

#11 @westi
15 years ago

Thanks!

I've added this test cases to wordpress-tests.

#12 in reply to: ↑ 10 @westi
15 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

Replying to dd32:

change that to:

return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);

ie. Add the s modifier and it passes this test case for me.

Looks good to me to.

#13 @westi
15 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

(In [12485]) Much improved is_serialized() which catches serialized doubles. Fixes #9930 props vladimir_kolesnikov and dd32.

#14 @Denis-de-Bernardy
15 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

#15 @Denis-de-Bernardy
15 years ago

re-opening with a new patch. the new regex makes it much slower, and it can run into backtracking problems with large strings.

if we are to keep it, this:

^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$

should be:

^(?:N;|[aOs]:[0-9]+:.*[;}]|[bid]:[0-9.E+-]+;)$

else is_serialized may return true for garbage.

#16 @Denis-de-Bernardy
15 years ago

I've added a separate patch to optimize the other regexp in the function.

#17 @Denis-de-Bernardy
15 years ago

  • Severity changed from minor to critical

as of r12485:

var_dump(is_serialized('garbage:a:0:garbage;')); // true

#18 @westi
15 years ago

  • Keywords needs-patch added; has-patch removed
  • Severity changed from critical to minor

I've reverted the slow and broken change in [12486] and added the extra test case to wordpress-tests.

Lets get a better faster combined patch worked up and then we can commit.

#19 @Denis-de-Bernardy
15 years ago

k... 9930.2.diff does the following:

  • it extends the second regex so that it catches doubles like 1E+1.
  • it reduces backtracking for the second regex

more specifically for the second:

  • an anchored regexp is very fast, unless it needs to match the entire string (as it currently does)
  • substr() is very fast, and it's trivial to see if a string has one out of two values

so by splitting the test in two, we get the same semantics and we make it faster and less resource hungry.

#20 @dd32
15 years ago

The tests need some non-serialized cases too. I was going to write some, but needed a chance to work some out.. Didnt realise that regex had that huge hole.. :)

#21 @Denis-de-Bernardy
15 years ago

the current is_serialized() also catches "some" garbage, btw. specifically, each of the following should ideally get rejected:

b:E;
i:E;
d:E;
a:1::;

however, these seem to be very acceptable edge cases that we can live with: making the regexp more precise in order to validate the serialized data seems overkill given the number of times we call the function -- the application's efficiency would suffer.

#22 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch added; needs-patch removed

#23 @Denis-de-Bernardy
15 years ago

@westi: 9930.diff fixes the original bug. the optimization for strings, arrays and objects is completely optional.

#24 @hakre
15 years ago

Better properly check serialized strings, we will see more exploits on serialized data in the future.

@hakre
15 years ago

#25 @hakre
15 years ago

  • Keywords tested added

I put this on my list again. Uploaded patch provides a fix for the problem. Tested.

This is how I tested:

Code:

$value      = 1.2E+150; 
$serialized = serialize($value);
var_dump(is_serialized($serialized), $serialized, $value);

Output (patched):

boolean true

string 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;' (length=109)

float 1.2E+150

Additionally some comment and code improvements.

#26 @hakre
15 years ago

Could run my patch (9930.patch) against the official testsuite now:

Before:

2) TestFunctions::test_is_serialized
Serialized data: d:2.099999999999999970432388186544853801030102097813541764707036304487326901911060431043950937593374825E+200;
Failed asserting that <boolean:false> is true.

After:
ZERRO failures in TestFunctions::test_is_serialized.

#27 @Denis-de-Bernardy
15 years ago

9930.patch has an over-optimization:

/([abdiOs]:|N;)/

should remain as:

/([abdiOs]):/

else badions contains garbage.

#28 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; has-patch tested removed

#29 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch added; needs-patch removed

woops, nevermind my last comment. I was getting mixed up the array/string notation using [].

#30 @Denis-de-Bernardy
15 years ago

  • Keywords 2nd-opinion added

works, but do we really want to to introduce this amount of backtracking?

array could just as well use:

preg_match( '/^a:[0-9]+:/', $data ); 

object could just as well use:

preg_match( '/^O:[0-9]+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":[0-9]+:', $data );

and the string case could use something like:

substr($data, -2, 2) == '";' && preg_match( '/^s:[0-9]+:"/', $data );

#31 @nacin
14 years ago

  • Keywords needs-patch early needs-unit-tests added; has-patch 2nd-opinion removed
  • Milestone changed from 3.0 to 3.1

#32 @hakre
14 years ago

Related: #14429

@hakre
14 years ago

Refreshed against trunk

#33 @hakre
14 years ago

  • Keywords has-patch added; needs-patch needs-unit-tests removed

#34 @nacin
14 years ago

  • Keywords needs-refresh added; has-patch early removed
  • Milestone changed from Awaiting Triage to Future Release

#35 @hakre
14 years ago

If someone is looking for a regex that is validating a whole float/double value, this one is pretty strict so far:

(d:(?:[-]?INF|[+-]?(?:(?:[0-9]+|(?:[0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*))|(?:[0-9]+|(?:([0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*)))[eE][+-]?[0-9]+));)

It takes care not only of the scientific notations but as well for INF and -INF. If you need to get a float to be serialized for a tests with INF, this did work for me without caring about a systems float size:

   $float = INF; // or -INF
   $serialized = serialize($float);

For this ticket's traction I think it now depends on whether or not this should be fixed as it might imply performance issues. But the actual performance should be tested before making assumptions as my impression is, that PHP is pretty fast with regular expression when it calls for the same multiple times per request. This is the case for the is_serialized function.

#36 @hakre
13 years ago

  • Keywords has-patch added; needs-refresh removed

Part of this issue has been taken care of in #14429 which already changed the is_serialized function to some good.

the INF/-INF part looks like still missing. I'll add a patch against trunk.

@hakre
13 years ago

Refreshed against trunk, added + and INF

#37 @hakre
13 years ago

Refreshed against trunk HEAD, added +, I, N, F and e which are valid within serialized float values in PHP.

Related: #17375

#38 @chriscct7
10 years ago

  • Keywords needs-refresh dev-feedback added; has-patch removed

There's a minor typo in the proposed patch. "wether" should be whether. Slight refresh to fix that, then dev-evaluation for merge potential.

Last edited 10 years ago by chriscct7 (previous) (diff)

#39 @chriscct7
9 years ago

  • Keywords needs-unit-tests added

#40 @donmhico
5 years ago

  • Resolution set to duplicate
  • Status changed from reopened to closed

Duplicate of #46570.

This bug is no longer valid. See [45754].

Last edited 5 years ago by SergeyBiryukov (previous) (diff)

#41 @SergeyBiryukov
5 years ago

  • Keywords needs-refresh dev-feedback removed
  • Milestone Future Release deleted
Note: See TracTickets for help on using tickets.