Opened 5 months ago
Last modified 8 weeks ago
#62483 reviewing enhancement
maybe_serialize() does support double serialization, but does not inform the developer if doing so
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 6.9 | Priority: | normal |
Severity: | normal | Version: | 3.6.1 |
Component: | General | Keywords: | has-patch |
Focuses: | Cc: |
Description
In #12930 the support of double serialization was added, in order not to break instances where developers used maybe_serialize()
wrong, I recently stumbled upon a case from 2023 where this also happened.
I suggest adding a _doing_it_wrong()
if serialized code was sent to maybe_serialize
in order to make less experienced developers aware of this.
Patch coming
Change History (18)
This ticket was mentioned in PR #7847 on WordPress/wordpress-develop by @apermo.
5 months ago
#1
- Keywords has-patch added
#2
@
3 months ago
- Milestone changed from Awaiting Review to 6.8
- Owner set to audrasjb
- Status changed from new to reviewing
Thanks for the PR, looks good to me at a glance.
I restarted the github actions to ensure phpunit tests are passing.
Moving to 6.8.
@audrasjb commented on PR #7847:
3 months ago
#3
@apermo PHP Unit Tests are failing, this changeset will require some test case update.
3 months ago
#4
@audrasjb I'll try to figure out how to fix them. In case I need help I'll reach out.
This ticket was mentioned in Slack in #core by audrasjb. View the logs.
2 months ago
#6
@
2 months ago
- Summary changed from maybe_serialize() does support double serialization, but does not the developer if doing so to maybe_serialize() does support double serialization, but does not inform the developer if doing so
2 months ago
#7
@audrasjb That recent push should fix all but one test error, and I'm unsure how to handle that one.
That is thrown in tests/phpunit/tests/privacy/wpPrivacyGeneratePersonalDataExportFile.php
test_doing_it_wrong_for_export_data_grouped_invalid_type()
The Data provider will return value that will trigger the double serialization.
https://github.com/WordPress/wordpress-develop/blob/12eaef1dd97ce46d7e665c69764cdae7387b03cc/tests/phpunit/tests/privacy/wpPrivacyGeneratePersonalDataExportFile.php#L237
My best guess would be to remove that single line, that did fix the remaining error locally.
It was added by @SergeyBiryukov in https://core.trac.wordpress.org/changeset/50613
Happy to discuss this, maybe Sergey can provide an idea.
This ticket was mentioned in Slack in #core by apermo. View the logs.
2 months ago
This ticket was mentioned in Slack in #core by audrasjb. View the logs.
8 weeks ago
#11
@
8 weeks ago
@apermo my concern here is that it's pretty difficult to anticipate how many plugins will throw a warning when WP_DEBUG is active. Maybe we should just leave it at its current state as double serialization doesn't seems to hurt anything?
#12
@
8 weeks ago
@audrasjb You've got a point, and I already thought about that as well. And you're kind of right that it doesn't break things. Yes.
We could argue that this should have been done 15 years ago when double serialization was allowed in the first place.
And while it does not break things, it's still done wrong. I'm not sure but plugins like Better Search Replace are expecting single serialization when performing a search and replace. I did not check what BSR would do about it.
I would point out, that this will only cause warnings for plugins and custom code by developers less experienced with WordPress.
Maybe we could get another opinion on that? I still think it is something that we should inform the developers about.
I already considered to ask Juliette if we can do a WPCS Sniff for that, but I'm pretty convinced that we could at best sniff for something like update_option( serialize( $array ) )
and that was not what I stumpled upon.
So the question is, do we want to inform them that they can skip double serialization, or not.
Could we merge it into beta, evaluating the feedback, and if there's negative Feedback, to pull it back before the final release?
Would that be a valid option?
#13
@
8 weeks ago
I suggest adding a _doing_it_wrong() if serialized code was sent to maybe_serialize in order to make less experienced developers aware of this.
Can you expand further? Why do you think this is wrong?
This ticket was mentioned in PR #8434 on WordPress/wordpress-develop by @apermo.
8 weeks ago
#14
This is an alternative approach to take care of 62483. Instead of adding a _doing_it_wrong()
like proposed in https://github.com/WordPress/wordpress-develop/pull/7847 this only adds an action, allowing professional developers and teams with a high requirement of standard and code quality to track double serialization.
Trac ticket: https://core.trac.wordpress.org/ticket/62483
#15
@
8 weeks ago
@TimothyBlynJacobs
Sure I'm happy to.
Well in the first place it was once disallowed 15 years ago, and only added back for backwards compatibility.
Double serialization does:
Increase complexity of code and data structures
Adding manual serialization or worse mixed use of double and single serialization will decrease code quality and can be a source of bugs. A custom sniff will likely not be possible to find this.
Risk of data corruption
Single serialized code is already a pain to read and while you should not manually "fix" serialized data in the database, we know that people will occasionally do so. And having to deal with double serialization will increase the odds of breaking sites.
Decrease performance
While this is out of the scope of WordPress core, and serializing a string will not be as "expensive" as serializing a complex array, it's still overhead. If developers properly use the auto-serialization WordPress does provide, they will save one serialize()
and unserialize()
plus get a very cheap early exit on is_serialized()
via if ( ! is_string( $data ) ) { return false; }
. This might not be much, but it still adds up.
There might be more reasons, but these are those best to grasp in my opinion.
In addition I've suggested a different approach to take care of this issue https://github.com/WordPress/wordpress-develop/pull/8434. Instead of adding a _doing_it_wrong()
we could as well add a add_action()
in the same spot.
This will still allow professionals to easily monitor and improve their code base and projects. As well as taking @audrasjb's concern into account, that bystanders will not be annoyed by warnings and prevent another "loading text domain in time" discussion.
All checks in the PR are green, if you guys think that is the better approach, I'm also happy with it. This will kind of fail the point of making less experienced developers aware, but still give professional teams a tool to increase their code quality.
#16
follow-up:
↓ 17
@
8 weeks ago
Thanks @apermo.
Yeah, so what you are outlining make sense for the cases where a developer is doing something like add_option( serialize() )
. However, it misses that double serialization is not just backward compatibility in the sense that if we remove it, plugins will break. But removing it also opens a security issue.
I agree, we should discourage developers from doing add_option( serialize() )
. This does seem like something that could mostly be accomplished by a sniff, however.
I definitely don't think we should _doing_it_wrong
, as it may end up with developers shooting themselves in the foot trying to remove double serialization, and introducing a security vulnerability into their code.
Firing an action is a bit better, but still seems heavy for code quality signal. I'm still worried about developers taking it the wrong way and trying to remove all instances of double serialization "to increase their code quality".
#17
in reply to:
↑ 16
@
8 weeks ago
Replying to TimothyBlynJacobs:
I agree, we should discourage developers from doing
add_option( serialize() )
. This does seem like something that could mostly be accomplished by a sniff, however.
The problem is, the case where I stumbled upon it, wasn't a simple nested function call.
It was rather similar to this:
<?php $serialized_data = serialize( $data ); //... update_post_meta( $post_id, 'some_key', $serialized_data );
And what I've learned from discussions with several members of the WPCS team, this is something that is not trivial to do.
Especially in a scenario like this:
<?php function my_store_alias( $data ) { update_post_meta( get_the_ID(), 'key', $data ); } my_store_alias( serialize( $data ) );
One example is the $wpdb->prepare()
sniff. It will only work if you nested it, if you move it out of your query function to improve readability, you'll end up with // phpcs:ignore WordPress.DB.PreparedSQL.NotPrepared -- $query already prepared.
to silence the sniff. I've double checked this exact example with Juliette a few weeks ago.
Replying to TimothyBlynJacobs:
Firing an action is a bit better, but still seems heavy for code quality signal.
I've had a thought about the performance. To better grasp my point made above I've added maybe_serialize()
(with my suggested action) and is_serialized()
here.
Case 1: We enter maybe_serialize()
with previously serialized data.
- It's a string, so we'll go by the first check.
- We go into
is_serialized()
- Depending on the exact string, we will go past 4-5 PHP core function calls, most likely including
preg_match()
once and 6+if
s and aswitch
statement. - Then we'll do my added
do_action()
Case 2: We enter maybe_serialize()
with unserialized data.
- If it's an array or object, we're already done and leave in the first condition.
- For strings and other trivial types, we'll enter
is_serialized()
- For anything but strings we'll leave here.
- No matter how the string is formed, at some point we'll leave the function, and my
do_action()
is never fired.
So if the developer uses WordPress data storage in the way it was meant to be, the action is never fired.
Replying to TimothyBlynJacobs:
I'm still worried about developers taking it the wrong way and trying to remove all instances of double serialization "to increase their code quality".
I think those developers who will find and use that action, will be capable enough not to break things. For those developers who might not even be aware of this issue, nothing will change.
(Further points below)
<?php /** * Serializes data, if needed. * * @since 2.0.5 * * @param string|array|object $data Data that might be serialized. * @return mixed A scalar data. */ function maybe_serialize( $data ) { if ( is_array( $data ) || is_object( $data ) ) { return serialize( $data ); } /* * Double serialization is required for backward compatibility. * See https://core.trac.wordpress.org/ticket/12930 * Also the world will end. See WP 3.6.1. */ if ( is_serialized( $data, false ) ) { /** * Allows the developer to track double serialization in their projects. * * While not broken, double serialization increases the complexity of data structures, * pose a performance overhead, and increases the cognitive load when reading code and data. * * @since 6.8.0 * * @param string $data Already serialized data. */ do_action( 'maybe_serialize_data_is_serialized', $data ); return serialize( $data ); } return $data; }
/**
* Checks value to find if it was serialized.
*
* If $data is not a string, then returned value will always be false.
* Serialized data is always a string.
*
* @since 2.0.5
* @since 6.1.0 Added Enum support.
*
* @param string $data Value to check to see if was serialized.
* @param bool $strict Optional. Whether to be strict about the end of the string. Default true.
* @return bool False if not serialized and true if it was.
*/
function is_serialized( $data, $strict = true ) {
// If it isn't a string, it isn't serialized.
if ( ! is_string( $data ) ) {
return false;
}
$data = trim( $data );
if ( 'N;' === $data ) {
return true;
}
if ( strlen( $data ) < 4 ) {
return false;
}
if ( ':' !== $data[1] ) {
return false;
}
if ( $strict ) {
$lastc = substr( $data, -1 );
if ( ';' !== $lastc && '}' !== $lastc ) {
return false;
}
} else {
$semicolon = strpos( $data, ';' );
$brace = strpos( $data, '}' );
// Either ; or } must exist.
if ( false === $semicolon && false === $brace ) {
return false;
}
// But neither must be in the first X characters.
if ( false !== $semicolon && $semicolon < 3 ) {
return false;
}
if ( false !== $brace && $brace < 4 ) {
return false;
}
}
$token = $data[0];
switch ( $token ) {
case 's':
if ( $strict ) {
if ( '"' !== substr( $data, -2, 1 ) ) {
return false;
}
} elseif ( ! str_contains( $data, '"' ) ) {
return false;
}
// Or else fall through.
case 'a':
case 'O':
case 'E':
return (bool) preg_match( "/^{$token}:[0-9]+:/s", $data );
case 'b':
case 'i':
case 'd':
$end = $strict ? '$' : '';
return (bool) preg_match( "/^{$token}:[0-9.E+-]+;$end/", $data );
}
return false;
}
I've also just found a wrong type hint in is_serialized()
, $data
should be mixed
and not string
, I've fixed it in both approaches.
And to anticipate:
Who will use it?
Well... I will, and I will spread the word to other professional teams/developers. So there won't be many from the start, but I have hopes that it will propagate and make people aware of this.
What other alternatives already exist to catch this?
We're talking about all meta tables and the option table.
So I could hook into update_{$meta_type}_metadata
(4x), added_{$meta_type}_meta
(4x), add_option
and update_option
to achieve the same, this will likely work, but it won't be as precise. Though arguable, it should be a dev environment only tool, so if my arguments did not convince you, yes I can work with that if I must.
Added _doing_it_wrong() to maybe_serialize to notify about double serialization.
Trac ticket: [](https://core.trac.wordpress.org/ticket/62483#ticket)