#9930 closed defect (bug) (duplicate)
is_serialized() returns false on serialized doubles
Reported by: | vladimir_kolesnikov | Owned by: | westi |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | minor | Version: | |
Component: | General | Keywords: | needs-unit-tests |
Focuses: | Cc: |
Description
Test case:
<?php require_once('wp-load.php'); print var_export(is_serialized(serialize(1.2E+150))); ?>
Expected: true
Got: false
serialize(1.2E+150) returns something like 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;', the plus sign after 'E' is not taken into account by the regexp
Attachments (9)
Change History (50)
#3
in reply to:
↑ 2
@
15 years ago
Replying to Denis-de-Bernardy:
you sure you changed the correct function? it doesn't seem to answer true only only on strings now.
Not sure I understood your question (English is not my native language) but:
- is_serialize() returns false if $data is not a string;
- the regexp is three in one (N;, [aOs]:[0-9]+:.*[;}], [bid]:[0-9.E+-]+;), the third regexp contains '+', since serialized doubles can contain '+' after 'E'
Test case:
<?php require_once('wp-load.php'); function is_serialized2($data) { return is_string($data) && preg_match('/^((N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;))$/s', $data); } print var_export(is_serialized2(serialize("a\nb"))) . "\n"; print var_export(is_serialized2(serialize(-25))) . "\n"; print var_export(is_serialized2(serialize(25))) . "\n"; print var_export(is_serialized2(serialize(false))) . "\n"; print var_export(is_serialized2(serialize(null))) . "\n"; print var_export(is_serialized2(serialize(array()))) . "\n"; print var_export(is_serialized2(serialize(1.1))) . "\n"; print var_export(is_serialized2(serialize(2.1E+200))) . "\n"; $x = new stdClass(); $x->a = 1; $x->b = 'test'; print var_export(is_serialized2(serialize($x))) . "\n"; ?>
Expected result:
true true true true true true true true true
#4
follow-up:
↓ 5
@
15 years ago
yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).
based on the phpdoc of the function you changed, this is expected:
print var_export(is_serialized_string(serialize("a\nb"))) . "\n"; // true print var_export(is_serialized_string(serialize(-25))) . "\n"; // false print var_export(is_serialized_string(serialize(25))) . "\n"; // false print var_export(is_serialized_string(serialize(false))) . "\n"; // false print var_export(is_serialized_string(serialize(null))) . "\n"; // false print var_export(is_serialized_string(serialize(array()))) . "\n"; // false print var_export(is_serialized_string(serialize(1.1))) . "\n"; // false print var_export(is_serialized_string(serialize(2.1E+200))) . "\n"; // false
#5
in reply to:
↑ 4
@
15 years ago
Replying to Denis-de-Bernardy:
yaya, it's just that, the patch seems to change the is_serialized_string() function, rather than is_serialized() (or maybe_is_serialized(), or whatever it's called).
Yes, you are right, I need to be more attentive :-(
Of course the patch was meant for is_serialized() function, resubmitting.
#7
@
15 years ago
- Milestone changed from 2.8 to 2.9
Patch changes the original regex on bad ions (the second set no longer contains \$ and the first set uses $ instead).
Punting per discussion during the WP meet-up...
#9
@
15 years ago
- Keywords needs-unit-tests added
- Milestone changed from 2.9 to 3.0
Move to 3.0 for now.
#10
follow-up:
↓ 12
@
15 years ago
change that to:
return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);
ie. Add the s modifier and it passes this test case for me.
function test_is_serialized() { $cases = array( serialize("a\nb"), serialize(-25), serialize(25), serialize(false), serialize(null), serialize(array()), serialize(1.1), serialize(2.1E+200), serialize( (object)array('test' => true, '3', 4) ) ); foreach ( $cases as $case ) $this->assertTrue( is_serialized($case) ); }
#12
in reply to:
↑ 10
@
15 years ago
- Keywords has-patch added; needs-patch needs-unit-tests removed
Replying to dd32:
change that to:
return is_string($data) && preg_match('/^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$/s', $data);ie. Add the s modifier and it passes this test case for me.
Looks good to me to.
#15
@
15 years ago
re-opening with a new patch. the new regex makes it much slower, and it can run into backtracking problems with large strings.
if we are to keep it, this:
^(N;)|([aOs]:[0-9]+:.*[;}])|([bid]:[0-9.E+-]+;)$
should be:
^(?:N;|[aOs]:[0-9]+:.*[;}]|[bid]:[0-9.E+-]+;)$
else is_serialized may return true for garbage.
#17
@
15 years ago
- Severity changed from minor to critical
as of r12485:
var_dump(is_serialized('garbage:a:0:garbage;')); // true
#18
@
15 years ago
- Keywords needs-patch added; has-patch removed
- Severity changed from critical to minor
I've reverted the slow and broken change in [12486] and added the extra test case to wordpress-tests.
Lets get a better faster combined patch worked up and then we can commit.
#19
@
15 years ago
k... 9930.2.diff does the following:
- it extends the second regex so that it catches doubles like 1E+1.
- it reduces backtracking for the second regex
more specifically for the second:
- an anchored regexp is very fast, unless it needs to match the entire string (as it currently does)
- substr() is very fast, and it's trivial to see if a string has one out of two values
so by splitting the test in two, we get the same semantics and we make it faster and less resource hungry.
#20
@
15 years ago
The tests need some non-serialized cases too. I was going to write some, but needed a chance to work some out.. Didnt realise that regex had that huge hole.. :)
#21
@
15 years ago
the current is_serialized() also catches "some" garbage, btw. specifically, each of the following should ideally get rejected:
b:E; i:E; d:E; a:1::;
however, these seem to be very acceptable edge cases that we can live with: making the regexp more precise in order to validate the serialized data seems overkill given the number of times we call the function -- the application's efficiency would suffer.
#23
@
15 years ago
@westi: 9930.diff fixes the original bug. the optimization for strings, arrays and objects is completely optional.
#24
@
15 years ago
Better properly check serialized strings, we will see more exploits on serialized data in the future.
#25
@
15 years ago
- Keywords tested added
I put this on my list again. Uploaded patch provides a fix for the problem. Tested.
This is how I tested:
Code:
$value = 1.2E+150; $serialized = serialize($value); var_dump(is_serialized($serialized), $serialized, $value);
Output (patched):
boolean true string 'd:1.200000000000000013344651621705194036153934411236609269391465806550823148718924258603522328009361549E+150;' (length=109) float 1.2E+150
Additionally some comment and code improvements.
#26
@
15 years ago
Could run my patch (9930.patch) against the official testsuite now:
Before:
2) TestFunctions::test_is_serialized Serialized data: d:2.099999999999999970432388186544853801030102097813541764707036304487326901911060431043950937593374825E+200; Failed asserting that <boolean:false> is true.
After:
ZERRO failures in TestFunctions::test_is_serialized
.
#27
@
15 years ago
9930.patch has an over-optimization:
/([abdiOs]:|N;)/
should remain as:
/([abdiOs]):/
else badions contains garbage.
#29
@
15 years ago
- Keywords has-patch added; needs-patch removed
woops, nevermind my last comment. I was getting mixed up the array/string notation using [].
#30
@
15 years ago
- Keywords 2nd-opinion added
works, but do we really want to to introduce this amount of backtracking?
array could just as well use:
preg_match( '/^a:[0-9]+:/', $data );
object could just as well use:
preg_match( '/^O:[0-9]+:"[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*":[0-9]+:', $data );
and the string case could use something like:
substr($data, -2, 2) == '";' && preg_match( '/^s:[0-9]+:"/', $data );
#31
@
14 years ago
- Keywords needs-patch early needs-unit-tests added; has-patch 2nd-opinion removed
- Milestone changed from 3.0 to 3.1
#34
@
14 years ago
- Keywords needs-refresh added; has-patch early removed
- Milestone changed from Awaiting Triage to Future Release
#35
@
14 years ago
If someone is looking for a regex that is validating a whole float/double value, this one is pretty strict so far:
(d:(?:[-]?INF|[+-]?(?:(?:[0-9]+|(?:[0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*))|(?:[0-9]+|(?:([0-9]*[\.][0-9]+)|(?:[0-9]+[\.][0-9]*)))[eE][+-]?[0-9]+));)
It takes care not only of the scientific notations but as well for INF and -INF. If you need to get a float to be serialized for a tests with INF, this did work for me without caring about a systems float size:
$float = INF; // or -INF $serialized = serialize($float);
For this ticket's traction I think it now depends on whether or not this should be fixed as it might imply performance issues. But the actual performance should be tested before making assumptions as my impression is, that PHP is pretty fast with regular expression when it calls for the same multiple times per request. This is the case for the is_serialized function.
#36
@
13 years ago
- Keywords has-patch added; needs-refresh removed
Part of this issue has been taken care of in #14429 which already changed the is_serialized function to some good.
the INF/-INF part looks like still missing. I'll add a patch against trunk.
#37
@
13 years ago
Refreshed against trunk HEAD, added +, I, N, F and e which are valid within serialized float values in PHP.
Related: #17375
#38
@
10 years ago
- Keywords needs-refresh dev-feedback added; has-patch removed
There's a minor typo in the proposed patch. "wether" should be whether. Slight refresh to fix that, then dev-evaluation for merge potential.
you sure you changed the correct function? it doesn't seem to answer true only only on strings now.