WordPress.org

Make WordPress Core

Opened 6 months ago

Last modified 6 months ago

#53295 new defect (bug)

Serialized data should be handled as an opaque value

Reported by: whitewinterwolf Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch
Focuses: Cc:

Description

The `is_serialized()` function makes strong assumptions on the serialized data format and layout.

  • This seems wrong and unreliable as there is no commitment in PHP documentation over this formatting: serialized data should be handled as an opaque binary string instead.
  • This breaks third-party software customizing serialized data format, for instance Snuffleupagus, a security software which adds an HMAC to serialized data to prevent malicious injections.

The correct way provided by the PHP language to determine whether a string contains a valid serialized data is to simply check the return value of the `unserialize()` function instead of relying on a dozen of various comparisons to implement a self-made heuristic.

Change History (19)

This ticket was mentioned in PR #1312 on WordPress/wordpress-develop by WhiteWinterWolf.


6 months ago

  • Keywords has-patch added

Rely on PHP proper unserliaze() return value to check whether a string contains serialized data instead of implementing a custom heuristic and break third-party software.

Trac ticket: https://core.trac.wordpress.org/ticket/53295

#2 @prbot
6 months ago

WhiteWinterWolf commented on PR #1312:

Currently checking #17375 and understanding when to return true or false for serialized objects, new commit on its way.

#3 @prbot
6 months ago

Ayesh commented on PR #1312:

Hi @WhiteWinterWolf - Thank you for opening this PR.

On a somewhat related ticket, I created Trac-53299 because the current is_serialized function is insufficient for upcoming Enums symbols.

#4 @whitewinterwolf
6 months ago

Hi @nacin,

You added for #17375 a unit test to assert that serializable objects will never pass is_serialized().

I exposed the details for discussion on the forum, but the point is that this fails to prevent objects to pass is_serialized(), but even worse it prevents the use of security products efficiently protecting against unserialize-related vulnerabilities, thus actually weakening WordPress against such attacks instead of hardening it.

So, if by any chance you are nearby, I would be glad to have your opinion whether this unit test can be reverted?

Thank you!

#5 @prbot
6 months ago

WhiteWinterWolf commented on PR #1312:

Hi @Ayesh,

Thanks for the information.

The goal of this PR is precisely to migrate from the current set of heuristics, which cause more harm than good IMHO, toward a cleaner approach relying on standard PHP code. This would avoid the need to update the heuristics based on PHP internal code changes, as in your request, and would provide a more stable behavior as it would be guarantee, no matter the PHP version used, enabled plugins or external tools installed, that the return value of is_serialized() would accurately reflect the fact that a given string can or cannot be unserialized.

FYI, for historical reason dating to Trac-17375 is_serialized() also returns false for some valid serialized objects by design. I've opened a forum thread to discuss if we really want to keep this (undocumented) behavior.

#6 @prbot
6 months ago

Ayesh commented on PR #1312:

Thank you - I agree with you that it hardly makes sense to check serialized data like this.

That said, I doubt calling unserialize ourselves is a safer alternative - We got rid of the `file_exists('phar://')` ugliness exactly with the same reason.

I see that you have an empty allow-list for PHP >=7.3, but unfortunately there is a sizeable WordPress usage for older PHP versions, no matter how much we dislike it.

#7 @prbot
6 months ago

WhiteWinterWolf commented on PR #1312:

I expect that when a caller checks if a string contains serialized data he already intends to unserialize it, so serialized data reaching this point _will_ be unserialized anyway, sooner or later.

For this reason, the allow list is not really here as a security measure, but merely for performance reasons to avoid unnecessary calls and work to wakeup an object we will immediately destruct.

Nevertheless, I can imagine keeping the legacy code for legacy users and relying on unserialize() with an empty list for new ones (maybe those additional calls for legacy users _might_ trigger hidden bugs?). For legacy users, by definition there won't be PHP changes to apply, so it should still solve the maintainability issue, and up-to-date users would still benefit from a more robust and compatible code.

But this leaves us with this odd behavior, enforced by WordPress test, chosen by design and formally "frozen in time", that is_serialize() must return false for objects implementing the Serializable when they are directly serialized, while return true for other kind of serialized objects or when objects implementing the Serializable interface are embedded inside other data structures.

Wouldn't there be a unit test specifically made for this I would have called it a bug, but it is not, and therefore I wonder if we are meant to keep this behavior or if we are allowed to just do as the documentation says: return "False if not serialized and true if it was".

#8 @siliconforks
6 months ago

I see at least two potential security vulnerabilities with PR #1312.

Object injection with PHP 5.6

On PHP 5.6, object injection and code execution could occur inside the new implementation of is_serialized() because it is calling unserialize(). For example, consider a WordPress site running PHP 5.6 that has a plugin containing the Example1 class from https://owasp.org/www-community/vulnerabilities/PHP_Object_Injection ...

<?php
class Example1
{
   public $cache_file;

   function __construct()
   {
      // some PHP code...
   }

   function __destruct()
   {
      $file = "/var/www/cache/tmp/{$this->cache_file}";
      if (file_exists($file)) @unlink($file);
   }
}

An attacker could simply do the following:

  1. Register as an ordinary (subscriber) user on the site.
  2. Log in and go to the "Profile" page.
  3. Enter serialized data in a profile field - e.g., in the first name field enter O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}.
  4. Click "Update Profile" and WordPress will call is_serialized() on the malicious user input. An instance of the Example1 class will be created and its destructor will be called. This will delete the file /var/www/index.php.

Object injection from data stored in the database by older WordPress versions

The PHP 5.6 vulnerability above might not seem like too much of a problem because WordPress will likely be dropping support for PHP 5.6 soon, and this PR could be merged after that happens. However, there is still another, more subtle vulnerability, which is what @nacin described in #17375 - it is not possible to safely change the behavior of is_serialized().

Suppose this PR were to be accepted into a future WordPress version, say, WordPress 5.8. Then a malicious user could do the following:

  1. Find a WordPress site running a version of WordPress earlier than version 5.8.
  2. Register as a user on that site.
  3. Log in and go to the "Profile" page.
  4. Enter a serialized C:... value the first name field.
  5. Click "Update Profile" and this C:... value will be stored in the database.

Note: at this point you may be thinking, "Wait, what does this have to do with the PR in question? If this attack requires an older version of WordPress without the PR, then it must be a vulnerability in that version of WordPress. It has nothing to do with this PR." But there isn't actually any vulnerability at this point - the C:... value will never be unserialized by WordPress. If WordPress runs is_serialized() on this value, the function will return false, so WordPress will never actually consider it to be serialized data at all, and it will never try to unserialize it. WordPress just thinks that the attacker is a user whose first name happens to be C:... and it will simply display that string on the user's profile page.

The vulnerability will arise in the future if the site ever upgrades to WordPress 5.8 (with the new is_serialized() implementation). Then, if WordPress ever attempts to retrieve the attacker's first name from the database, the is_serialized() function will return true for this value, so WordPress will then attempt to unserialize it, potentially resulting in code execution.

I suspect that a fix for #53299 might result in the same vulnerability.

#9 follow-up: @whitewinterwolf
6 months ago

Hi @siliconforks,

The thing is that the current code is already "vulnerable" to the issues you raise :

  • Object injection with PHP 5.6: is_serialized() already returns true so the caller will unserialize it and execute the malicious payload anyway.
  • Object injection from data stored in the database by older WordPress versions: checking the first character of the serialized string doesn't do anything against trivial bypasses such as objects stored in an array, so is completely inefficient to protect against such threat.

Nevertheless, I can understand that:

  • One may prefer to stick to the historical behavior as much as possible, no matter the reasoning behind it.
  • One may prefer to leave it to the caller to trigger the payload, even it doesn't change anything to consequences.

I've therefore pushed a new commit:

  • Restoring the legacy code for legacy PHP.
  • Restoring the original unit test.
  • Updating the function description to more closely match its behavior.
  • Enabling standard code for PHP >= 7.0.0, allowing the use of tools such as Snuffleupagus to efficiently protect against the threats you describe.

#10 in reply to: ↑ 9 @siliconforks
6 months ago

Replying to whitewinterwolf:

Hi @siliconforks,

The thing is that the current code is already "vulnerable" to the issues you raise :

  • Object injection with PHP 5.6: is_serialized() already returns true so the caller will unserialize it and execute the malicious payload anyway.

The current WordPress code is not vulnerable to the attack I described because it handles serialized data specially; specifically, the maybe_serialize function tests for the presence of serialized data and serializes it again (so the data is stored in the database double-serialized). This provides protection against malicious serialized data (specifically, O:... objects) entered by a user: when the data is retrieved from the database, it will be unserialized, but that only works on the layer of serialization added by WordPress; the serialized data that was provided by the user is not unserialized.

So WordPress is careful to never call unserialize on user input.

(Note: if you do think there's a case where WordPress calls unserialize on user input, that's likely a security vulnerability and it should probably be reported privately via HackerOne. But I don't think there is any case where that happens in the current WordPress code.)

#11 follow-up: @whitewinterwolf
6 months ago

Hi @siliconforks,

I'm quite familiar with your vulnerability reporting process, as I'm notably the author of the CVE-2021-29504 published two weeks ago for WP-Cli.

I understand there is a debate regarding the use of the unserialize() function with older PHP versions to check whether some data can be unserialized. That's why, as stated above, I restored the original legacy code for these older PHP versions as this change is not strictly necessary for this ticket.

Newer PHP versions (>= 7.0.0) added the $options parameter allowing to safely call unserialize() without running any object related code. This is the safest and cleanest way to do it, and makes WordPress compatible with third-party tools (current WordPress code breaks third-party security software, thus endangering WordPress installations). Note that I also added a specific check for serialized data beginning with a 'C' to remain compatible with the original behavior.

While I also updated the function description to describe this original behavior, the description update is also not strictly necessary for this ticket so we can choose to keep this undocumented behavior undocumented if you feel its better that way.

#12 in reply to: ↑ 11 ; follow-up: @siliconforks
6 months ago

Replying to whitewinterwolf:

This is the safest and cleanest way to do it, and makes WordPress compatible with third-party tools (current WordPress code breaks third-party security software, thus endangering WordPress installations).

Ultimately, the problem is that supporting such third-party tools requires changing the serialization format recognized by WordPress, and I don't think it is possible to do that without introducing a new vulnerability. This is why "`is_serialized` is frozen in time".

#13 in reply to: ↑ 12 ; follow-up: @whitewinterwolf
6 months ago

Replying to siliconforks:

Ultimately, the problem is that supporting such third-party tools requires changing the serialization format recognized by WordPress, and I don't think it is possible to do that without introducing a new vulnerability. This is why "`is_serialized` is frozen in time".

I'm not sure to exactly understand what you mean:

A) The allowed_classes filter has been added to the unserialize() function by the PHP team with the sole goal to prevent any possibility of malicious code execution triggered by the deserialization process (see secure unserialize). Do you mean that they failed their attempt and that calling unserialized($data, ['allowed_classes' => false]) is still prone to unsecure deserialization vulnerabilities? Do you have any reference or example?

B) Do you mean that the legacy and new code returns different values for the same serialized object data provided as input? If so, do you have any example?

C) Do you mean that this function has been stated as "frozen in time", so there is by principle no point discussing any change to its content anyway?

#14 in reply to: ↑ 13 @siliconforks
6 months ago

Replying to whitewinterwolf:

A) The allowed_classes filter has been added to the unserialize() function by the PHP team with the sole goal to prevent any possibility of malicious code execution triggered by the deserialization process (see secure unserialize). Do you mean that they failed their attempt and that calling unserialized($data, ['allowed_classes' => false]) is still prone to unsecure deserialization vulnerabilities? Do you have any reference or example?

My concern is not just with vulnerabilities inside the implementation of is_serialized() itself, but also in the grander scheme of things - what are the consequences if is_serialized() gives the wrong answer? If it returns the wrong result, can this lead to a case where WordPress calls unserialize() on untrusted user input?

B) Do you mean that the legacy and new code returns different values for the same serialized object data provided as input? If so, do you have any example?

Well, in the current version of the PR, the PHP 7 branch seems buggy - in particular I believe the 'C' test was intended to test for equality instead of inequality - but even if you fix issues like this, I don't think there's really any way to make it work without introducing a vulnerability.

For example, consider the string O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}ab84a81d5a516892c86d3a620ac3e714286b8dcdcdebaf67e8112a54ed2514f0. My understanding is that this is what serialized data looks like when Snuffleupagus is enabled. Currently in WordPress, is_serialized() will return false for that. If you want Snuffleupagus to work with that string, you'll want to change is_serialized() to return true - but now the behavior of is_serialized() has changed. This means that, with the current version of WordPress, an attacker could input that string as, say, his first name, and that would not be considered serialized data. But in the future, if WordPress were upgraded to a version with an implementation of is_serialized() returning true for that string, it would then be considered serialized data and would be unserialized, potentially causing code execution.

C) Do you mean that this function has been stated as "frozen in time", so there is by principle no point discussing any change to its content anyway?

I might be missing something, but I can't think of any way you could change the behavior of is_serialized() without breaking something or potentially introducing a vulnerability. (The implementation of is_serialized() could change, but it would still need to return true for the same strings and false for the same strings as it did before.)

#15 follow-up: @whitewinterwolf
6 months ago

Replying to siliconforks:

in particular I believe the 'C' test was intended to test for equality instead of inequality

Thanks, this is now fixed.

For example, consider the string O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}ab84a81d5a516892c86d3a620ac3e714286b8dcdcdebaf67e8112a54ed2514f0. My understanding is that this is what serialized data looks like when Snuffleupagus is enabled. Currently in WordPress, is_serialized() will return false for that.

Currently, is_serialized() already returns true for such serialized objects :

var_dump(is_serialized('O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}'));
#Output: bool(true)

So there is strictly no point in expecting this function to protect against any kind of untrusted user input.

This means that, with the current version of WordPress, an attacker could input that string as, say, his first name, and that would not be considered serialized data. But in the future, if WordPress were upgraded to a version with an implementation of is_serialized() returning true for that string, it would then be considered serialized data and would be unserialized, potentially causing code execution.

If a module stores serialized data in the database, then it won't be stored as the raw data but as a serialized string: s:60:"O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}";.

When the modules fetches the value and unserializes it, it will retrieve the initial string, not the object.

If, at any point, a remote user has the ability to confuse some module and force it to process a user-controlled raw input string as a serialized data, then this means that it is the code of this module which is vulnerable and there is nothing that WordPress could do to prevent this.

A very quick check return me several Insecure Deserialization vulnerabilities affecting Wordpress modules (CVE-2020-36326, CVE-2020-28032, CVE-2019-12240, CVE-2018-20148, etc.). Would this protection be of any use, WordPress should be immune to this class of vulnerabilities. But as explained above, there is strictly no point in expecting this function to protect against any kind of untrusted user input.

I understand nevertheless that one may prefer to keep legacy code for the legacy PHP versions and keep any historical behavior as much as possible to limit the risk of regression to the minimum, and I already updated my PR that way.

Nevertheless, since is_serialized() already returns true for serialized objects and malicious strings, I do not see any behavior change: it returned true before the PR, it returns true after the PR, and do perceive one "true" to be safer than the other ;) , the only difference being that before the PR it is not possible to enable any actual protection against this threat class.

Last edited 6 months ago by whitewinterwolf (previous) (diff)

#16 in reply to: ↑ 15 ; follow-up: @siliconforks
6 months ago

Replying to whitewinterwolf:

Replying to siliconforks:

For example, consider the string O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}ab84a81d5a516892c86d3a620ac3e714286b8dcdcdebaf67e8112a54ed2514f0. My understanding is that this is what serialized data looks like when Snuffleupagus is enabled. Currently in WordPress, is_serialized() will return false for that.

Currently, is_serialized() already returns true for such serialized objects :

var_dump(is_serialized('O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}'));
#Output: bool(true)

What I mean is that, currently, is_serialized will return false for the string with the HMAC at the end:

var_dump(is_serialized('O:8:"Example1":1:{s:10:"cache_file";s:15:"../../index.php";}ab84a81d5a516892c86d3a620ac3e714286b8dcdcdebaf67e8112a54ed2514f0'));
#Output: bool(false)

Ultimately, in order to support Snuffleupagus, is_serialized would need to return true for that string - but then it would violate the rule that the behavior of is_serialized cannot change.

#17 @prbot
6 months ago

WhiteWinterWolf commented on PR #1312:

Hi @Ayesh,

For you information I've created trac-53390 to request the unfreeze of the is_serialized() function as it misses it security objectives.

#18 in reply to: ↑ 16 @whitewinterwolf
6 months ago

Replying to siliconforks:

Ultimately, in order to support Snuffleupagus, is_serialized would need to return true for that string - but then it would violate the rule that the behavior of is_serialized cannot change.

OK, so your answer to my question above is:

C) this function has been stated as "frozen in time", so there is by principle no point discussing any change to its content anyway.

I've therefore opened a side-issue to address this without polluting this thread.

#19 @prbot
6 months ago

WhiteWinterWolf commented on PR #1312:

If anyone read this, the last failing unit test doesn't make any sense to me:

// Wrong number of characters is close enough for is_serialized_string().
array( 's:12:"foo";', true ),

How and why is it expected for is_serialized() to return true for malformed and invalid data which will fail to deserialize, unserialize('s:12:"foo";') generating the error below:

PHP Notice:  unserialize(): Error at offset 2 of 11 bytes in /tmp/test.php on line 3
Note: See TracTickets for help on using tickets.