WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#9663 closed defect (bug) (wontfix)

is_serialized() should be renamed to might_be_serialized();

Reported by: hakre Owned by:
Milestone: 2.9 Priority: normal
Severity: normal Version: 2.7.1
Component: Charset Keywords: has-patch
Focuses: Cc:

Description

is_serialized() peeks into a string and 'tries' to find out if it is serialied (= encoded) data.

infact it looks for data being a string and then checks if it might be serialized data. for example, the string 'N;' is interpreted as serialized data. well, this can actually be just a string.

next to this the function is not aware of handling utf-8. it does not check wether or not the passed string is utf-8 encoded. then it uses regular expressions that are not utf-8 aware (and which can not, because WP is php 4.3 compatbile with has no such functionality with pcre).

Attachments (1)

9663.patch (1.9 KB) - added by hakre 4 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Denis-de-Bernardy5 years ago

  • Component changed from General to Charset

comment:2 follow-up: vladimir_kolesnikov5 years ago

  • Cc vladimir@… added

Re: UTF-8: Disagreed. Regexps used in is_serialized() are UTF-8 safe:

/[bid]:[0-9.E-]+;$/ does not contain any characters that could be interpreted as multibyte in UTF-8

/[aOs]:[0-9]+:.*[;}]$/s - its head (up to .*) is multibyte-safe; tailing ';' or '}' are guaranteed not to be a part of a multibyte character since only one character symbols are in the range of U+0000...U+007F

Re: N; - actually very interesting - should a plugin store "N;" in wp_options, it would get NULL in get_option(). But I am not sure if it is possible to fix this without breaking backward compatibility.

comment:3 in reply to: ↑ 2 ; follow-up: Denis-de-Bernardy5 years ago

Replying to vladimir_kolesnikov:

Re: UTF-8: Disagreed. Regexps used in is_serialized() are UTF-8 safe:

serialize()/unserialize() isn't, however.

comment:4 in reply to: ↑ 3 ; follow-up: vladimir_kolesnikov5 years ago

Replying to Denis-de-Bernardy:

serialize()/unserialize() isn't, however.

Why?

  • integers/floats/booleans are multibyte safe by definition.
  • strings are prefixed with length: s:3:"hi!", so it does not matter what is inside the string since we know its size in bytes (+2 for surrounding quotes).
  • objects have their names prefixed with length (say, O:8:"stdClass":...); as a consequence, they are also multibyte safe;
  • arrays/object properties consist of integers/floats/booleans/strings (which are mb-safe); thus, arrays and objects are mb-safe as well.

comment:5 in reply to: ↑ 4 Denis-de-Bernardy5 years ago

Replying to vladimir_kolesnikov:

  • strings are prefixed with length: s:3:"hi!", so it does not matter what is inside the string since we know its size in bytes (+2 for surrounding quotes).

With some characters, in some versions of php, the serialized string length that gets returned is erroneous. I've seen this break quite a few customer sites.

comment:6 Denis-de-Bernardy5 years ago

  • Milestone changed from Future Release to 2.9

comment:7 hakre5 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

The main problem is the fuzzyness of the function: It does not do always the same. So you can get unexpected results.

For the UTF-8 related problems please keep in mind that it was a hint to the fact, that these functions aren't actually aware of the encoding. But that's more or less a sidenote.

Anyway, there will be no function renaming I hardly assume and therefore I will close this ticket now as wontfix.

comment:8 Denis-de-Bernardy4 years ago

this might get re-opened one day, as there is room for fun over in ms-edit.php:

$wpdb->query( "INSERT INTO " . $wpdb->usermeta . "( `umeta_id` , `user_id` , `meta_key` , `meta_value` ) VALUES ( NULL, '$userid', '" . $blog_prefix . "capabilities', 'a:1:{s:" . strlen( $role ) . ":\"" . $role . "\";b:1;}')" );

comment:9 Denis-de-Bernardy4 years ago

see #11781 for reference

comment:10 follow-up: sirzooro4 years ago

PCRE functions in PHP 4.3 are utf-8-aware - excerpt from PHP documentation for Pattern Modifiers:

u (PCRE8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

So in 4.3 we do not have checks for utf-8 pattern validity, but functionality is present.

comment:11 in reply to: ↑ 10 hakre4 years ago

  • Keywords has-patch added

Replying to sirzooro:

PCRE functions in PHP 4.3 are utf-8-aware - excerpt from PHP documentation for Pattern Modifiers:

u (PCRE8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

So in 4.3 we do not have checks for utf-8 pattern validity, but functionality is present.

That's right, but actually I did not see a problem with the regex pattern here but with the design of the function / concept in which this function is used in. Technically spoken there is no other way to ensure that a string is a serialized value then de-serializing it and in case it returns false, to re-check the given string if it contains the serialized value of false "b:0;" or not.

Slight improvements to the function can be made to differ between array/object and string.

I took the oportunity to review the function again:

  • The function does a trim() on the input data. If the input data is serialized, there would be no need to trim the data for a propper test. I should be removed.
  • Serialized data can somehow be large. Passing it per reference might save a bit.
  • The identification of the main type being serialized was a bit akward. Streamlined it.
  • Array, Object and String could be better tested.

To improve the overall concept, a function can be created that not only returns true or false but the main serialized vartype as string. Most often serialization is used to store arrays or object into the database. With such a function, it could be differed between skalar types on the one and object/arrays on the other side. So if a value from database is retrieved that could be a serialized string then this would be a hint, that the value acutally is not serialized.

hakre4 years ago

Note: See TracTickets for help on using tickets.