Make WordPress Core

Opened 2 years ago

Closed 15 months ago

Last modified 15 months ago

#55942 closed enhancement (maybelater)

Add a "value type" parameter to get_option() and get_metadata()

Reported by: azaozz's profile azaozz Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Options, Meta APIs Keywords: 2nd-opinion
Focuses: performance Cc:

Description (last modified by azaozz)

Update: This ticket had to be somewhat repurposed as it became clear that with the initial approach there's no good way to ensure full backwards compatibility in get_option() and get_metadata().

The initial idea to support "strict" types of the options and meta values remains the same, but the implementation has changed. The new approach is to add another parameter to the above functions. This would ensure 100% backwards compatibility and at the same time will provide approximately the same benefits:

  • Consistent types of the return values from get_option(), get_metadata().
  • Reduced use of maybe_unserialize() and is_serialized().

The main differences are that the new approach is opt-in and doesn't require new table columns.

Attachments (1)

55942.diff (38.0 KB) - added by azaozz 15 months ago.
Latest patch.

Download all attachments as: .zip

Change History (88)

#1 @azaozz
2 years ago

There are (at least) three ways this could be implemented:

  1. The most basic: 0 would be default and undefined, 1 would mean the data was not serialized when storing it, 2 would mean the data was serialized.
  1. A better way would be to store the primitive (basic) PHP types: 0 would be default and unset, 1 would be boolean, 2 would be int, 3 would be float, 4 would be string, etc. (https://www.php.net/manual/en/language.types.intro.php). Then the data can be type-casted after retrieving it form the DB. This will also improve/fix one of the long-standing problems with saving a specific type to the DB but retrieving it as a different type, like saving bool false but getting string '0'.
  1. If a VARCHAR is used the new field can store the actual name of the type: bool, int, float, etc. and perhaps plugins would be able to add their own custom types. However not so sure that would be useful at such a low level, and it will increase the complexity (and perhaps the need for maintenance), and the overall tables size quite significantly.

Moving this to the 6.1 milestone for consideration.

Last edited 2 years ago by azaozz (previous) (diff)

#2 @azaozz
2 years ago

  • Milestone changed from Future Release to 6.1

#3 follow-up: @barry.hughes
2 years ago

A possible fourth way (something of a hybrid between the options you listed above) would be an enumeration.

`type` ENUM( 'int', 'float', 'string', 'bool', 'php', 'json' )

In the above example, we have four scalar types and two serialization types—PHP (default/what we use today) and JSON. Introducing JSON as a supported serialization method:

  • Could help to mitigate the risk of object injection vulnerabilities
  • In many cases, will consume less storage space (than serialized PHP, because less verbose)
  • In some cases, may even perform a little better than serialized PHP

Some enum pros and cons:

  • Still makes for readable queries
  • Doesn't occupy much space (just 1 byte if we keep the number of enum values fairly small)
  • Less flexible than a varchar field

This ticket was mentioned in Slack in #core by azaozz. View the logs.


2 years ago

#5 @davidbaumwald
2 years ago

I too was thinking of ENUM, if we can swing it, because it indexes much better.

#6 @azaozz
2 years ago

  • Keywords 2nd-opinion added

Yes, ENUM seems best. Just wondering if we should implement the MySQL built-in ENUM() or do it "by hand" by adding a TINYINT and mapping the index numbers with PHP constants.

Advantages of using ENUM:

  • Wouldn't need additional processing, like mapping index numbers to names ('int', 'float', 'string', etc.). No need of PHP constants to help there.
  • Better/readable display when looking directly in the DB (like with phpMyAdmin).
  • Pretty straightforward to add.

Main disadvantages of using ENUM:

  • In addition to the defined values there always are '' (mapped to 0) and NULL. That may be somewhat confusing.
  • A bit tricky ORDER BY. Sorting is by index, not by name. That index depends on the order in which the values were defined. Need to use CAST( ... as CHAR ) to sort by name.
  • Adding more values would require altering the tables.
  • Harder to list all possible values, need to look at INFORMATION_SCHEMA.COLUMNS.
  • Perhaps not particularly relevant but ENUM is a MySQL extension, not a standard ANSI SQL.

Re-reading the above, I'm 50/50 which method would be better. Not sure if the drawbacks of using ENUM are that significant. On the other hand using TINYINT and keeping the mapping outside of the DB gives us full control, even plugins would be able to add to that mapping, eventually.

So, what say you? To ENUM or not to ENUM? :)

#7 follow-up: @barry.hughes
2 years ago

Also a little on the fence, though I think I'm still leaning toward enums (memory efficient, good query readability, stricter than a plain integer).

In addition to the defined values there always are (mapped to 0) and NULL. That may be somewhat confusing.

I think one part of this can be avoided completely, by making the column non-nullable.

On the other hand using TINYINT and keeping the mapping outside of the DB gives us full control, even plugins would be able to add to that mapping, eventually.

This is an interesting question, since if we want plugins to be able to add to the mapping we also need to build up systems around that to help avoid collisions, etc (and that leads me to think that perhaps neither TINYINT nor ENUM would be a good fit).

Though ... I'm a little unsure if we actually *do* want plugins to be able to interact at this level (or, indeed, if there are plugin developers that see potential value in being able to do so) 🤔

This ticket was mentioned in Slack in #core by azaozz. View the logs.


2 years ago

#9 follow-up: @petitphp
2 years ago

Sorry for the silly question, this new field would be used when reading the meta/option value to ensure we return the same type as the one we received when saving the value ?

#10 in reply to: ↑ 9 @azaozz
2 years ago

Replying to petitphp:

Yes, that's the general idea. In addition this will simplify the current code, and remove the need for using maybe_unserialize() and is_serialized() when retrieving newer data from these tables.

Last edited 2 years ago by azaozz (previous) (diff)

#11 in reply to: ↑ 7 @azaozz
2 years ago

Replying to barry.hughes:

Also a little on the fence, though I think I'm still leaning toward enums (memory efficient, good query readability, stricter than a plain integer).

Yep, thinking more about it, lets go with the built-in ENUM(). Nothing beats simplicity :)

In addition to the defined values there always are (mapped to 0) and NULL. That may be somewhat confusing.

I think one part of this can be avoided completely, by making the column non-nullable.

You're right. Thinking it may still be needed though, depends on how the patch is going to handle stuff like add_option( 'my_option', NULL ) and what would be the expected return there considering back-compat.

Though ... I'm a little unsure if we actually *do* want plugins to be able to interact at this level

Yeah, perhaps better to not let plugins interact there. If any interaction is needed it can probably be done at a higher level, i.e. make an associative array or an object, add all info (including any meta that may be needed), then encode and pass to the API as a string.

Last edited 2 years ago by azaozz (previous) (diff)

#12 @azaozz
2 years ago

  • Keywords has-patch needs-unit-tests added; 2nd-opinion removed

#13 @azaozz
2 years ago

  • Keywords needs-patch added; has-patch removed

#14 in reply to: ↑ 3 @azaozz
2 years ago

Replying to barry.hughes:

`type` ENUM( 'int', 'float', 'string', 'bool', 'php', 'json' )

In the above example, we have four scalar types and two serialization types—PHP (default/what we use today) and JSON. Introducing JSON as a supported serialization method:

I'm a bit "uneasy" specifying "json" as type in that way. Thinking the ENUM should stick to the PHP's primitive types, and use PHP's serialize() for encoding (same as currently).

If a plugin wants to use JSON, it would be pretty easy to encode the value before passing it to the API. Then the plugins will "know" to decode it after retrieving it. Support for both encoding methods will be nice, but don't see a good way to add it without adding another field to the table.

Looking at gettype() (https://www.php.net/manual/en/function.gettype.php) there are 10 possible return values:

  • "boolean"
  • "integer"
  • "double" (the old name of "float"; best to use float here imho)
  • "string"
  • "array"
  • "object"
  • "resource"
  • "resource (closed)" as of PHP 7.2.0
  • "NULL"
  • "unknown type"

Thinking that the default value for the ENUM should be "unknown type". This will signify that we will need to run maybe_unserialize() on the data (back-compat). Can probably be named just "unknown". Was thinking it can probably use the empty string there, but that would make it somewhat unclear when looking at the DB.

Wondering if NULL (the PHP type) should map to "NULL" in the ENUM. Seems logical but might run into some problems later on, perhaps?

Thinking this should not support "resource" and "resource (closed)". Currently not supported afaik, and not sure it's a good idea to save a PHP resource in the DB (not even sure how/if that can still be used somehow after retrieving it).

The rest are pretty self-explanatory. Array and Object will be serialized, the others can be typecast when retrieving, or it can use settype() (https://www.php.net/manual/en/function.settype.php).

Last edited 2 years ago by azaozz (previous) (diff)

#15 follow-up: @petitphp
2 years ago

I can work on this ticket if no ones is already on it ?

#16 follow-up: @barry.hughes
2 years ago

I'm a bit "uneasy" specifying "json" as type in that way.

To offer an alternative stance: serialization is pretttty useful, and of course we aren't getting rid of it with this proposal.

Native JSON support simply means there would be support for a safer (but also more limited) form of serialization with different 'performance' characteristics (I'm using that a little loosely to mean not just runtime costs, but space requirements -- JSON tends to be more compact).

The big win here in my opinion though is safety (object injection not being possible with JSON), and I think there is value in WordPress offering that safer path.

That said...

If a plugin wants to use JSON, it would be pretty easy to encode the value before passing it to the API.

Yep, also true, and it doesn't close off a future where some kind of wrapper for JSON-encoded options is supported by WP (out of scope here).

Wondering if NULL (the PHP type) should map to "NULL" in the ENUM. Seems logical but might run into some problems later on, perhaps?

Less sure about this.

We could be visualizing different parts of the problem but, from a back-compat perspective, I'd have thought null (PHP) would map to 'original Options API behaviour', in essence ?

#17 in reply to: ↑ 16 @azaozz
2 years ago

Replying to barry.hughes:

Native JSON support simply means there would be support for a safer (but also more limited) form of serialization with different 'performance' characteristics (I'm using that a little loosely to mean not just runtime costs, but space requirements -- JSON tends to be more compact).

The big win here in my opinion though is safety (object injection not being possible with JSON), and I think there is value in WordPress offering that safer path.

Yep, I agree. I'm only uneasy about adding that support at the same place/same level where PHP types are added/defined.

If a plugin wants to use JSON, it would be pretty easy to encode the value before passing it to the API.

Yep, also true, and it doesn't close off a future where some kind of wrapper for JSON-encoded options is supported by WP (out of scope here).

Actually... Why not add such wrapper/helper functions now.

As far as I imagine it, it would only accept arrays. Then JSON encode them, and save as strings. Then the corresponding method will JSON decode the string from the DB. The values will be PHP arrays after decoding. Think that would work quite well :)

from a back-compat perspective, I'd have thought null (PHP) would map to 'original Options API behaviour'

Yes, implementing this enhancement will change the returned type and in some cases the returned value. For example saving true as option value would currently return string "1", but not if the option is checked immediately after adding. Then it would return the proper type.

Been thinking how to handle the back-compat here, may turn out to be more problematic than expected. Seems in the worst case scenario we can add another param to get_option(), etc. functions to specify whether to use strict types. (I'd hate to do that but seems it may be needed). Lets look in this again after there is a patch and tests :)

#18 in reply to: ↑ 15 @azaozz
2 years ago

Replying to petitphp:

I can work on this ticket...

Sure, feel free to start. Then @barryhughes, I, other contributors can review and/or expand as needed. There are several functions that should be used, like for example when altering the tables. Also several functions would have to be updated, and few more added (see above about using JSON). I'd imagine the new wrappers would follow similar code/solution from elsewhere in WP.

Last edited 2 years ago by azaozz (previous) (diff)

This ticket was mentioned in PR #3124 on WordPress/wordpress-develop by petitphp.


2 years ago
#19

  • Keywords has-patch has-unit-tests added; needs-unit-tests needs-patch removed

#20 @davidbaumwald
2 years ago

  • Focuses performance added

Tagging performance so the Performance Team can hopefully engage here as well.

#21 @desrosj
2 years ago

  • Keywords needs-testing added
  • Milestone changed from 6.1 to Future Release

I'm thinking this will likely need more time for review and testing. With Beta 1 in less than 12 hours, I'm going to punt for now.

If a committer is confident enough in this to own for 6.1 before beta 1, it can be moved back and tackled.

#22 follow-up: @flixos90
2 years ago

I love the idea of this ticket, but I'm wary about how we can solve it in a backward compatible way while also not making the API super clunky. To me, the main concern here is that plugins today may to do something like get_option( 'some_boolean_ish_option' ) === '1', which could break following this change. To be clear, I would discourage anyone from writing code like that, but it's probably out there :)

Similarly, for integer and float comparison, while it's already encouraged to always type-cast get_option() results, there are probably some plugins today that use either loose comparison (discouraged) or explicitly compare with a string (also discouraged), like get_option( 'some_number' ) === '42'.

@azaozz Is your idea that the new type column value would be set based on the data passed to add_option() / update_option(), or based on the registered type of only registered settings?

I wonder whether something like the latter could be used to make this more of an opt-in thing. Or alternatively we can see about a slow rollout, where we post about this in advance, potentially even send something about it in an email to plugin developers, and/or build checks to catch problems like the above into WPCS. Some of this may be a stretch, but let's see :)

#23 in reply to: ↑ 22 @azaozz
2 years ago

@petitphp thanks for the patch! Looks pretty good :) Thinking perhaps there are too many filters, like the one on the default types, but lets talk about that on GH.

Replying to flixos90:

Is your idea that the new type column value would be set based on the data passed to add_option() / update_option(), or based on the registered type of only registered settings?

Yeah, been (slowly) testing this and thinking how to handle back-compat. Still exploring what would be best. You're right, back-compat is pretty hard here.

plugins today may to do something like get_option( 'some_boolean_ish_option' ) === '1', which could break following this change.

Right, and it is even more complicated. Currently if a plugin does add_option( 'my_option', true ) and then a bit later during the same run does get_option( 'my_option' ) the returned value will be true as new and updated options are cached. I.e. get_option() will return the proper type and value as passed to add_option(). However if it does get_option( 'my_option' ) on the next run, the returned value will be a string '1' the way it comes out of the DB.

At the same time the biggest win here is to stop using is_serialized() and maybe_unserialize() for arrays and objects. Thinking when an array or object is passed to add/update_option or add/update_meta WP should automatically set the type. Then on retrieving options and meta it should unserialize the values. There won't be any back-compat problems there as this is how it works now.

However for boolean and numeric values it would most likely have to be opt-in. Even looking at the patch, the places where '1' was replaces with 1, etc. may cause problems in plugins (as many just copy code from core).

In any case, lets try to figure out the back-compat during 6.2 and add this. It would be a pretty good enhancement.

#24 @azaozz
2 years ago

Another option here (which I don't particularly like, but is a possible solution) would be to only add a TINYINT field to these tables indicating whether the data in the value field is serialized or not. That perhaps can be 0 for default/unknown, 1 for serialized, and maybe 2 for JSON encoded. This will reduce complexity and won't have back-compat problems, but will also miss the chance to further improve the options and meta APIs.

Last edited 2 years ago by azaozz (previous) (diff)

#25 follow-up: @barry.hughes
2 years ago

What do we think about the (optional) ability to specify the underlying type, on read as well as upon write?

For instance, to maintain full backwards compatibility, there are surely going to be cases where PHP serialization is still required. However, as a plugin author, if I can also specify (upon read) that I expect the value to be a simple boolean, or to have been written using JSON serialization, that would close off scenarios where third party code replaces an option with a value requiring PHP serialization (because what would be less desirable in such a case is if WP goes ahead and uses unserialize() when reading that data, contrary to my intentions).

Last edited 2 years ago by barry.hughes (previous) (diff)

#26 follow-up: @petitphp
2 years ago

We already have a way to define the type of data when using register_meta/register_term_meta/register_settings. If i'm not mistaken, this is currently only used when preparing output for the REST API. We could leverage this to handle saving/retrieving data from the database ? This would ensure existing metas/options would keep the current behavior unless the dev has specified a type.

#27 follow-up: @rjasdfiii
22 months ago

I have been helping programmers and DBAs with MySQL performance for two decades. A common design pattern is "Entity-Attribute-Value" (EAV), such as what WP uses for its various 'meta' tables. EAV, and the lack of good indexing, is a big performance issue _today_ for WP.

Most designers who discover (or re-invent) EAV come up with something like what WP has, but some carry the design pattern a step further:

  • Normalize the various keys and/or values -- I explain to them how much worse this is because of having to do an extra hop to get to the actual values.
  • Add a datatype to the "value". This helps with numeric comparisons. The basic EAV fails to provide a simple way to compare numeric values, and they end up with "10" < "2". Doing CASTs is a clumsy workaround.
  • Multiple columns -- one each for strings, floats, integers, dates, etc.

For the last two designs, the user has already implemented it and is stumbling badly over performance issues. I must tell them to rip out that embellishment and go to the basic EAV.

So, I am telling you that your design, though inspired, will be a nightmare of coding clumsiness and performance degradation.

_Be aware that almost any embellishment in this area will lead to the inability to use an INDEX_ That, alone, dooms any performance. I claim that postmeta is already in deep-dodo because of performance. This plugin helps by improving the INDEXes: https://wordpress.org/plugins/index-wp-mysql-for-speed/ but leaves intact the inherent design problems of EAV.

#28 in reply to: ↑ 27 @azaozz
20 months ago

Replying to rjasdfiii:

EAV, and the lack of good indexing, is a big performance issue _today_ for WP.

Perhaps you mean the lack of fulltext indexes for meta and option value? This seems to be discussed at length in few places, like: https://github.com/WordPress/performance/issues/132. Don't think this ticket has anything to do with DB indexes.

So, I am telling you that your design, though inspired, will be a nightmare of coding clumsiness and performance degradation.

_Be aware that almost any embellishment in this area will lead to the inability to use an INDEX_ That, alone, dooms any performance.

Perhaps there is a bit of misunderstanding here? This ticket is not about changing any fields in the DB. It is only about adding another column that would hold the PHP type of the data. This will significantly reduce use of functions like maybe_unserialize(). That's all.

Also storing the (PHP) type may make it easier to search for specific data, i.e. can limit the searches by data type.

I claim that postmeta is already in deep-dodo because of performance. This plugin helps by...

Again, nothing to do with this ticket :)

#29 in reply to: ↑ 26 @azaozz
19 months ago

Replying to petitphp:

We already have a way to define the type of data when using register_meta/register_term_meta/register_setting.

Right, although it's a bit "weird" :)
Apparently the data types in the REST API are defined as 'string', 'boolean', 'integer', 'number', 'array', and 'object', see https://core.trac.wordpress.org/changeset/39222#file1, [40603], and [46186]. Then the data types for register_meta() were copied, and ended up being a bit incompatible with PHP.

A number is not a valid data type in PHP, seems it corresponds to float, a.k.a. double.

We could leverage this to handle saving/retrieving data from the database

Yep, should be able to. Just need to add some code to "juggle" number <=> float for the REST API.

Version 0, edited 19 months ago by azaozz (next)

#30 in reply to: ↑ 25 @azaozz
19 months ago

Replying to barry.hughes:

What do we think about the (optional) ability to specify the underlying type, on read as well as upon write?

Thinking the simpler/more straightforward solution would be to set the type automatically depending on the data. For example doing update_option( 'my_option', 123 ) would insert the 123 in the DB and set the type to integer. Then doing get_option( 'my_option' ) would return the DB value unchanged, i.e. a string '123' for back-compat, and doing get_option( 'my_option', $default_value, 'strict' ) would return int 123.

that would close off scenarios where third party code replaces an option with a value requiring PHP serialization...

Serialization is (just a) DB storage requirement/method. It should happen automatically when needed. When the changes here are implemented all new and updated options and meta values will automatically have data types. If the data type is array or object, the DB output will be unserialized and returned. As data types will be added automatically on saving to the DB, there will be no errors (unless someone "hacks" the values directly in the database).

Last edited 19 months ago by azaozz (previous) (diff)

#31 @barry.hughes
19 months ago

Unless someone "hacks" the values directly in the database.

This implies (to me) that the only way this system can be compromised is when a bad actor anyway has unfettered access to the server, or to the database, and if that's the case all hope is lost ... but I don't think that is necessarily true. It's not inconceivable that a plugin could be 'tricked' into serializing a complex type, and attacks may be chained across multiple plugins. However, I understand this could be too theoretical or simply out of scope.

Thinking the simpler/more straightforward solution would be to set the type automatically depending on the data.

💯

Yes, that is a straightforward approach and would still represent a nice iteration on the Options API.

#32 @azaozz
18 months ago

(Finally) was able to spend more time on this. Went through all needed changed back and forth couple of times. Backwards compatibility is really "tough" here, not just for the code but for the current vs. updated behavior too.

Unfortunately seems there's no good, straightforward way to implement this as a "progressive enhancement". The only possibility seems to be to add another param to update_option() and update_metadata() to indicate strict value types. This will still require changes to the caching which might bring problems if a plugin tries to access the cache buckets directly.

The worse part is that it would "split" get_option() and get_metadata() return values in two groups:

  1. "Old" values that don't have proper types that were added or updated before this was implemented or saved without strict types.
  2. "New" values that have correct types that were added or updated with strict types.

That seems confusing.

Also tested with adding "strict types" param to get_option() and get_metadata(), however that makes them even more unpredictable. Old values cannot be returned with proper types as that data will not be in the DB.

So thinking the best way to implement this would be to add another optional param to get_option() and get_metadata() (and similar) with the expected value type. That should be 100% backwards compatible and will still ensure values with the proper types are returned (when expected) and reduce calls to maybe_unserialize(), etc. This also means no need for new columns in the meta and options tables.

Patch coming up.

#33 follow-up: @rjasdfiii
18 months ago

What happens when a user fetches posts based on a postmeta value that needs to be numeric (int or float)? This is one of the more clumsy and inefficient tasks today; what will the SQL code look like with your changes?

#35 in reply to: ↑ 33 @azaozz
18 months ago

Replying to rjasdfiii:

What happens when a user fetches posts based on a postmeta value that needs to be numeric

Assuming you're first getting the meta from PHP, see the change to get_metadata() . Can use the new $value_type param to specify the returned value's type.

#36 @azaozz
18 months ago

  • Milestone changed from Future Release to 6.3

#37 follow-up: @rjasdfiii
18 months ago

@azaozz - Does wp_decode_value_from_db() fetch all the meta data for a single post_id and store it somewhere, then pick it apart, testing/converting according to datatype?

What would the generated SQL be for "Find all posts where the meta value 'price' is between 50 and 200"? I am worried that it needs to fetch at least all the key=price values from postmeta, then convert to float, and only then do the comparisons. Performance-wise this is equivalent to the current kludge involving CAST..TO FLOAT built into the SELECT's WHERE clause. If there are a million posts, there will be on the order of a million fetches from postmeta. And _that_ is the most costly part of the query.

#38 in reply to: ↑ 37 @azaozz
18 months ago

Replying to rjasdfiii:

Does wp_decode_value_from_db() fetch all the meta data for a single post_id and store it somewhere

No, sorry, should have explained better. The second PR drops the changes to the DB completely as it seems there's no good way to ensure full backwards compatibility in get_option() and get_metadata(). There are no changes to any DB queries, etc.

wp_decode_value_from_db() is a simple function intended for use instead of maybe_unserialize() on option and meta values retrieved from the database. It takes into account the $value_type parameter (if set).

@azaozz commented on PR #4324:


18 months ago
#39

This is a more simpler and straightforward approach

Yea, unfortunately it seems very complex to implement the original idea and keep 100% back-compat. So this PR is a "cut-down" version without any DB changes that still brings some of the original benefits.

One thing I've noted, the get_{post|term|user}_meta functions have not been updated

Updated them. Next would be to add more tests and also some existing get_{post|term|user|comment}_meta tests to include the new param.

#40 follow-up: @OllieJones
18 months ago

A couple of points to consider:

  1. Avoid ENUMs. Altering tables to add values to them is unreasonably expensive.
  1. Don't forget about the availability of [igbinary_serialize()](https://www.php.net/manual/en/function.igbinary-serialize.php) and its unserialize counterpart. It's not on all php systems. But when it is, it saves lots of space. Of course using it makes it impossible to use LIKE to search metavalues, like the All Users panel does. https://github.com/WordPress/wordpress-develop/blob/6.2/src/wp-includes/class-wp-user-query.php#L517

@azaozz commented on PR #4324:


18 months ago
#41

Been wondering if it is a good idea to enforce the $value_type to the values returned from the "pre" filters: pre_option_{$option}, pre_option, get_{$meta_type}_metadata.

Seems the type should be enforced when used as the function that is getting the option or the meta is specifying the value type. Perhaps that may bring some unexpected results sometimes. However not enforcing the type also may bring unexpected results.

Any ideas or thoughts about this? :)

@petitphp commented on PR #4324:


18 months ago
#42

Been wondering if it is a good idea to enforce the $value_type to the values returned from the "pre" filters: pre_option_{$option}, pre_option, get_{$meta_type}_metadata.

Seems the type should be enforced when used as the function that is getting the option or the meta is specifying the >value type. Perhaps that may bring some unexpected results sometimes. However not enforcing the type also may bring >unexpected results.

I agree with this. When I specify a value type the function should try to always return the expected type even if the value come from a filter.

#43 in reply to: ↑ 40 @azaozz
18 months ago

Replying to OllieJones:

  1. Avoid ENUMs. Altering tables to add values to them is unreasonably expensive.

Yea, ENUMs are very expensive to update. It made sense to try using ENUM here as the list of values (PHP data types) is very unlikely to ever change.

  1. Don't forget about the availability of [igbinary_serialize()](https://www.php.net/manual/en/function.igbinary-serialize.php)

Not sure if WP can use igbinary_serialize() as meta and option values are LONGTEXT. The PHP manual specifically mentions this.

For example, igbinary_serialize() output should generally be stored in a BLOB field in a database, rather than a CHAR or TEXT field.

Having said that, the scope of this ticket had to be reduced. The latest patch doesn't add new columns to the DB as there's no good way to ensure full backwards compatibility in get_option() and get_metadata(). I'll change the ticket's title and description accordingly.

#44 @azaozz
18 months ago

  • Component changed from Database to Options, Meta APIs
  • Description modified (diff)
  • Summary changed from Add a "type" field to the meta and options tables to Add a "value type" parameter to get_option() and get_metadata()

#45 follow-up: @spacedmonkey
18 months ago

  • Component changed from Options, Meta APIs to Database
  • Description modified (diff)
  • Summary changed from Add a "value type" parameter to get_option() and get_metadata() to Add a "type" field to the meta and options tables

I would strongly recommend using register_meta/register_setting here. This requires developers to opt-in to this functionality and we can even add a new parameter to other, like strict typing, that forces typing.

Changing the database schema for post meta, user meta, term meta, comment meta, site meta, network options and options, is a large database change. I will likely break lots of migration scripts and even plugins like woocommence do raw database queries on the meta tables.

I do like the idea of the ticket, skipping serialize could be a massive win.

#46 @azaozz
18 months ago

  • Component changed from Database to Options, Meta APIs
  • Description modified (diff)
  • Summary changed from Add a "type" field to the meta and options tables to Add a "value type" parameter to get_option() and get_metadata()

#47 in reply to: ↑ 45 @azaozz
18 months ago

Replying to spacedmonkey:

Ummm, did you revert the updated title and description accidentally?

I would strongly recommend using register_meta/register_setting here.

Unfortunately both register_meta() and register_setting() are targeting only the REST API and aren't particularly useful outside of it. Both require a lot of data/code/settings that don't make sense in many cases.

Changing the database schema for...

The database is not changed. Please see the updated title and description.

Last edited 18 months ago by azaozz (previous) (diff)

#48 @azaozz
18 months ago

  • Description modified (diff)

#49 follow-up: @knutsp
18 months ago

I suggest adding 'bool' as alias of 'boolean' and 'int' as alias of 'integer' for $value_type just before calling wp_get_db_value_types(), by something as simple as:

$value_type = $value_type === 'bool' ? 'boolean' : $value_type;
$value_type = $value_type === 'int'  ? 'integer' : $value_type;

because this gives compliance with the types used in PHP for type hints and checking of function parametres, return values and properties, and in docblocks. Other reasons:

  1. PHP developers tends to intuitively expect those shorter type aliases to work
  2. Not having to check the documentation again just because one may have forgot which "version" of types are expected by WP
  3. In line with WP_Query, get_user_by() and get_term_by() in respect to 'ID' versus 'id' or the exact column name
  4. In line with Robustness principle https://en.wikipedia.org/wiki/Robustness_principle "be conservative in what you send, be liberal in what you accept"

#50 follow-up: @spacedmonkey
18 months ago

Unfortunately both register_meta() and register_setting() are targeting only the REST API and aren't particularly useful outside of it. Both require a lot of data/code/settings that don't make sense in many cases.

Register meta existing before the REST API and adding support for it was only added in 4.9.8. I think these functions were designed to do exactly this. They are currently used only for the REST API, but this another perfect use for them.

The fact you can define a type for settings and meta and that currently is not respected outside of REST API, is a bug, IMO.

#51 in reply to: ↑ 49 @azaozz
18 months ago

Replying to knutsp:

I suggest adding 'bool' as alias of 'boolean' and 'int' as alias of 'integer' for $value_type

Yea, was wondering if it's worth supporting these shorthand "type names". Generally wp_get_db_value_types() is modeled after PHP's settype(), so maybe it makes sense to allow the short names and not be too "strict". We don't even have to add "translations" as settype() accepts bool and int, see https://www.php.net/manual/en/function.settype.php.

Not so sure about "double" (the old name for a decimal number). Also note that the REST API doesn't understand "float" and uses "number" instead. Unfortunately that is also the case for register_meta() and register_setting() :(

#52 in reply to: ↑ 50 ; follow-up: @azaozz
18 months ago

Replying to spacedmonkey:

Register meta existing before the REST API

Right, but there was a pretty major upgrade for it in WP 5.5 that changed a lot of stuff (and made some weird assumptions), see #43941.

I think these functions were designed to do exactly this. They are currently used only for the REST API, but this another perfect use for them.

Yep, and with updating the underlying (lower level) meta and options APIs, this becomes even easier.

Unfortunately when register_meta() and register_setting() were added or updated to support the REST API, they were made slightly incompatible with the PHP data types. There is no number type in PHP. There will have to be some "type name translation".

Ideally that translation should have happened in the REST API, however the register_meta() and register_setting() also accept that type, so... Not sure what would be good here.

#53 follow-up: @knutsp
18 months ago

'number' should be treated as union type int|float (PHP 8.0) and pass the is_numeric() test. Both register_[setting|meta]() and typed [get|update]_[option|meta]() could be enhanced support both 'number', 'int[eger]' and 'float'.

We do PHP, HTML and JavaScript. HTML has types 'number', 'text', 'url, 'email' and more. Some are kind of subtypes, and 'number' can be seen as a union type. Sometimes we need to "translate" between different languages/systems and contexts. We should try do that in a logical way, building the bridges.

The register functions is for full and flexible sanitation and validation, including deep validation of arrays and objects, even pattern matching. This typed meta proposal, on the other hand, is far much simpler and focuses on PHP typing only, for performance and to avoid simple type errors, and paves the way to more or full strict typing overall. The two methods could live side by side, but we should strive to make them coherent, so that when both is active, no contradictions occur.

In all languages, and many systems, there are different ways of doing (almost) the same thing. Hard to avoid and not necesarily so bad.

#54 follow-up: @rjasdfiii
18 months ago

Be aware that MySQL has a "DECIMAL(m,n)" format for things such as currency, but PHP simply throws them into "DOUBLE". This can lead to roundoff errors and other rare monetary errors.

MySQL implements DECIMAL using a general decimal package that handles up to ~65 decimal digits.

About 40 years ago, IEEE-754 standardized binary FLOAT and DOUBLE. Nearly every computer hardware and software manufacturer since then has adopted these (in addition to other numeric types). Deviating from that is likely only to confuse.

"1.23" can be represented exactly in MySQL's DECIMAL(m,2). With anyone's FLOAT, it can be represented only to 24 significant bits. Or 53 for DOUBLE. Those map to approximately 7 and 16 significant digits.

"Serialization", if done "correctly", converts the binary numbers into enough decimal digits so that the subsequent unserialization will produce exactly the orignal binary value.

Sorry, I don't have a clean solution of how to blend IEEE/PHP/MySQL/WP when it comes to representing exact monetary value.

#55 in reply to: ↑ 54 @Otto42
18 months ago

Replying to rjasdfiii:

Be aware that MySQL has a "DECIMAL(m,n)" format for things such as currency, but PHP simply throws them into "DOUBLE". This can lead to roundoff errors and other rare monetary errors.

I don't believe we are using a decimal column anywhere in WordPress. So such a thing would likely be for completionist only.

However, the solutions I've seen to this, that do not involve weird or extra type systems, have usually been to represent the value as a string. This generally avoids people doing math with them, essentially. Which is where the problem would come in to messing with currency in such a manner.

#56 follow-up: @rjasdfiii
18 months ago

This generally avoids people doing math with them, essentially.

Storing numbers in strings makes them difficult to compare with '>' and '<'. This leads to cumbersome and slow use of CAST in comparisons.

#57 in reply to: ↑ 56 @Otto42
18 months ago

Replying to rjasdfiii:

Storing numbers in strings makes them difficult to compare with '>' and '<'. This leads to cumbersome and slow use of CAST in comparisons.

PHP strings, decimal columns. And again this probably is not going to matter unless we are trying to specifically handle a column type we don't use.

#58 in reply to: ↑ 53 @azaozz
18 months ago

Replying to knutsp:

'number' should be treated as union type int|float (PHP 8.0) and pass the is_numeric() test.

The problem here is how to match PHP data types to REST data types (as implemented in WP). Seems REST's number should match PHP's float. So when retrieving registered meta of type number, it will have to do get_metadata( $meta_type, $object_id, $meta_key, $single, 'float' ); at the lower level.

(It's a bit confusing that the REST data types (as used in WP) seem to come from JSON data types, however JSON doesn't have integer type according to RFC 8259.)

Both register_[setting|meta]() and typed [get|update]_[option|meta]() could be enhanced support both 'number', 'int[eger]' and 'float'.

Thinking it would perhaps be better to keep it simple and limit the "data type translation" to one place. I.e. replace number with float in registered settings and registered meta when retrieving the values.

The register functions are for full and flexible sanitation and validation...

Yes, and the possibility to retrieve strict data types from get_option() and get_metadata() would compliment that.

Last edited 18 months ago by azaozz (previous) (diff)

#59 @azaozz
18 months ago

@rjasdfiii @Otto42 Thanks for the explanation of the MySQL DECIMAL format. As @Otto42 mentions I don't think there is a column in any of the default WP tables that is using or may benefit from it.

There are several that are LONGTEXT, like option and meta values, but they are designed to hold different types of data. Hence the use of PHP's serialize() and the need of this ticket to make it possible to retrieve strict data types for these values.

Last edited 18 months ago by azaozz (previous) (diff)

#60 follow-up: @rjasdfiii
18 months ago

Much of this discussion has been techy. Please back up for a minute and provide a couple of real-life examples of an applications and how an end user would code this both before and after "value type" was added. Think ahead to examples that might goe into the WP (or Woo, etc) reference manual.

#61 in reply to: ↑ 60 ; follow-up: @azaozz
18 months ago

Replying to rjasdfiii:

provide a couple of real-life examples

Yep, good idea. There should be a dev. note around beta1 time too but couple of examples here would be nice.

The (new) $value_type parameter in get_option(), get_metadata(), and related functions can be used to get the value with a specified type. Generally that type should match the type of the data when the option or meta was saved, however that is not compulsory. For example:

  1. add_option( 'my_option', true ); then:
    • get_option( 'my_option' ); will return string '1'. This is the current behavior.
    • get_option( 'my_option', false, 'bool' ); will return boolean true.
    • get_option( 'my_option', false, 'int' ); will return integer 1.
  2. add_option( 'my_option', 0 ); then:
    • get_option( 'my_option' ); will return string '0'. This is the current behavior.
    • get_option( 'my_option', false, 'bool' ); will return boolean false.
    • get_option( 'my_option', false, 'int' ); will return integer 0.
  3. add_option( 'my_option', 'some text' ); then:
    • get_option( 'my_option' ); will return string 'some text'. This is the current behavior.
    • get_option( 'my_option', false, 'bool' ); will return boolean true.
    • get_option( 'my_option', false, 'int' ); will return integer 0.

In practice using $value_type would make it possible to do something like:

add_option( 'my_option', true );
...
if ( true === get_option( 'my_option', false, 'bool' ) ) { 
   ...
}
add_option( 'my_option', 4.7 );
...
if ( get_option( 'my_option', false, 'int' ) > 5 ) {
    ...
}

if ( get_option( 'my_option', false, 'float' ) > 4.5 ) {
    ...
}
Last edited 18 months ago by azaozz (previous) (diff)

#62 in reply to: ↑ 61 ; follow-up: @knutsp
18 months ago

Replying to azaozz:

The (new) $value_type parameter in get_option(), get_metadata(), and related functions can be used to get the value with a specified type. Generally that type should match the type of the data when the option or meta was saved, however that is not compulsory. For example:

  1. add_option( 'my_option', true ); then:
    • get_option( 'my_option' ); will return string '1'. This is the current behavior.
    • get_option( 'my_option', false, 'bool' ); will return boolean true.
    • get_option( 'my_option', false, 'int' ); will return integer 1.
  2. add_option( 'my_option', 0 ); then:
    • get_option( 'my_option' ); will return string '0'. This is the current behavior.
    • get_option( 'my_option', false, 'bool' ); will return boolean false.
    • get_option( 'my_option', false, 'int' ); will return integer 0.
  3. add_option( 'my_option', 'some text' ); then:
    • get_option( 'my_option' ); will return string 'some text'. This is the current behavior.
    • get_option( 'my_option', false, 'bool' ); will return boolean true.
    • get_option( 'my_option', false, 'int' ); will return integer 0.

What will be the difference to writing
(string) get_option( 'my_option' );
(bool) get_option( 'my_option' );
(int) get_option( 'my_option' );
as we do now?

In practice using $value_type would make it possible to do something like:

add_option( 'my_option', true );
...
if ( true === get_option( 'my_option', false, 'bool' ) ) { 
   ...
}

When the type of an expression to test in if is known to be bool, the equation with true is redundant and the thing for me wil be not having to test for equality with neither true nor false.

From sligthly positive to this, I now in doubt. If the saved type is still to be guessed, is this PR then only to have a different way of writing code to get a typed value?

The main problem I have encountered is mostly determining the difference between

  • 'my option' not set/exists
  • add_option( 'my_option', '' );
  • add_option( 'my_option', false );

What now, if the, presumed bool, option does not exist? Do I still have to get it without conversion (string) and then for the casted bool value in case it exists?

#63 in reply to: ↑ 62 ; follow-up: @azaozz
18 months ago

Replying to knutsp:

What will be the difference to writing...

No much difference, just a bit cleaner way to write it.

If the saved type is still to be guessed...

It's not "to be guessed", rather it is to get a value with an expected (strict) type.

The main problem I have encountered is mostly determining the difference between

  • 'my option' not set/exists
  • add_option( 'my_option', '' );
  • add_option( 'my_option', false );

This touches on "best practices" when using the options API: "All options should be set (and preferably autoloaded)". Currently when you do add_option( 'my_option', false ); and then get_option( 'my_option' );, you get an empty string. That makes it possible to see if the option was set to false, but I don't think this is a "good behavior". Ideally whatever you save in the DB should come back the same. Returning strings for boolean and integer values is a long-standing inconsistency. Adding this param would make it easier to fix that, and it also reduces calls to maybe_unserialize() a little.

Last edited 18 months ago by azaozz (previous) (diff)

#64 @rjasdfiii
18 months ago

Those examples help.

Is your intent to apply the feature only to "Options"? Or maybe to wp_postmeta, too?

How about an application example, where the value is a price, or size, or number of bedrooms, or calendar date, or ...

And the value is used in an expression such as filtering with "where price between 200 and 2000".

#65 in reply to: ↑ 52 ; follow-up: @spacedmonkey
18 months ago

Replying to azaozz:

Replying to spacedmonkey:

Register meta existing before the REST API

Right, but there was a pretty major upgrade for it in WP 5.5 that changed a lot of stuff (and made some weird assumptions), see #43941.

That was my ticket, I worked on it months. What assumptions were weird there? Can you provide any detail?

The types supported are the following

'string' - string
'boolean' - boolean
'integer' - integer
'number' - float / double
'array' - array
'object' - object

Barring number, which IMO, is just a float value, all those types make sense to me. There is even help functions like rest_sanitize_value_from_schema and rest_validate_value_from_schema that will happen the validation and schema handling for you.

I am really against adding new parameter to the signatures of meta and options functions and believe we should explore register_meta and register_settings.

#66 in reply to: ↑ 65 @azaozz
18 months ago

Replying to spacedmonkey:

What assumptions were weird there?

Well, this is completely off-topic here but to me it seems pretty weird the way default meta values were implemented (sorry I was away at the time and couldn't review it). Pretty complex and soo much code, and so many repeated filters for a relatively simple concept. Because of that this functionality seems harder to use.

Also the assumptions about $single after the filters seem to be able to introduce edge cases/inconsistencies. The same place is buggy/inconsistent after the "pre" filters in get_metadata() but the behavior is slightly different.

Barring number, which IMO, is just a float value,

Yea, these look good although it seems it may have been better to use the PHP types in there and "convert" float to number in the REST API. Also a bit more docs about why one was chosen over the other and how to use/what to expect :)

I am really against adding new parameter to the signatures of meta and options functions and believe we should explore register_meta and register_settings.

As I mentioned above register_meta() and register_setting() are not suitable for the lower level meta and options APIs. They add specific requirements and expectations that do not make sense in many cases when core or a plugin wants to save and retrieve an option or some meta.

For example register_setting() is targeted at adding a setting or an "option" (checkbox, text field, etc.) to the wp-admin UI, not at saving an option in the options table. It was later adapted for use in the REST API by requiring a schema, etc. which doesn't make sense when adding a simple option to the DB.

#67 follow-ups: @flixos90
18 months ago

Chiming in here again after a long while, I am concerned with the current direction. It seems we are discussing whether to use get_option() or register_setting() to define the type for an option - FWIW, both of those solutions have problems:

  • If we put a new parameter on get_option() for the type, that's a clunky API: A single option may be retrieved in several places. Having to remember in every single function call to put the right $type value will inevitably lead to errors based on developers making mistakes.
  • Using register_setting() is more appropriate from that concern's perspective, however it also has several other side effects. Certainly, we should not use the existing type field of that function since that is heavily used already, and if that suddenly would define what get_option() returns, that would be a backward compatibility break.

In other words, neither of these approaches are suitable. I think we need a central function to control this (like register_setting()), but it also needs to be decoupled from any existing fields/information that already exists in core as it won't be the same.

Thinking out loud, that could either be a separate new function (something like set_option_database_type( $option, $type )), or we use a new database_type argument on register_setting(). Or we do both, with the function being the source of truth, and register_setting()'s new argument could optionally be used, passing the value through to that function.

#68 in reply to: ↑ 67 @azaozz
18 months ago

Replying to flixos90:

  • If we put a new parameter on get_option() for the type, that's a clunky API: A single option may be retrieved in several places. Having to remember in every single function call to put the right $type...

I tend to disagree. This doesn't sound like a problem. If you can remember (copy) the option name, you certainly can remember (copy) the type and eventually the default value too :)

  • Using register_setting() is more appropriate

Not sure about that. register_setting() is (was?) intended for use in the UI. Not for the low level options and meta.

I think we need a central function to control this

Hmmm, thinking that having to use two different pieces of code just to set an option or meta would probably be quite "clunky". I understand that now the REST API requires this, but not sure if that's the best way to do it especially for the low level data/interactions with the DB.

Last edited 18 months ago by azaozz (previous) (diff)

#69 in reply to: ↑ 67 ; follow-up: @spacedmonkey
18 months ago

Replying to flixos90:

  • Using register_setting() is more appropriate from that concern's perspective, however it also has several other side effects. Certainly, we should not use the existing type field of that function since that is heavily used already, and if that suddenly would define what get_option() returns, that would be a backward compatibility break.

If we were to use register_setting() and reuse the type, there would need to be an opt-in. Like a new parameter, like strict_typing or similar. If a developer opt-in, only then does it force the fixed typing.

I tend to disagree. This doesn't sound like a problem. If you can remember (copy) the option name, you certainly can remember (copy) the type and eventually the default value too :)

This imply that you are using your own option doesn't it. What if you are using a option from a plugin written by another developer. What if that developer changes the value from a string to a boolean when updating the plugin. No, if we are going to do this, types need to be defined in one place. Register settings is the logical place for that.

#70 in reply to: ↑ 69 ; follow-up: @azaozz
18 months ago

Replying to spacedmonkey:

If we were to use register_setting()

As I mentioned three times already, register_setting() is intended for use in the UI. That means it is for "settings" from HTML form elements. It doesn't make sense to use it for lower level APIs like options and meta.

This imply that you are using your own option doesn't it.

Not only. It is intended for use with code you trust, like your own code, core code, "respected" plugins that are known to follow WP best practices, etc. Would you even consider relaying on/reusing code from a plugin that is known to break back-compat?

What if you are using a option from a plugin written by another developer. What if that developer changes the value from a string to a boolean when updating the plugin.

Sure, lets explore this case:

  1. If you use a DB option added by another plugin, and the other plugin's author decides to change a DB option value type from string to boolean, and the type is "globally registered", your code may throw a fatal error. This is the same as changing the type of the return value from a function. Expecting one type of data by a third-party code and getting another signifies a badly written code. That's all.

However if the type of the option value is not "globally registered" but instead is passed when retrieving the value (like the current patch), your code will not throw a fatal error. You will get an empty string or a '1' if the third-party plugin author suddenly switched the option to a boolean.

  1. Then there is the case of the third-party plugin being deactivated, so the DB option is no longer "globally registered". What then? get_option() just reverts to the old behavior of returning string equivalents? More chances for fatal errors...

No, if we are going to do this, types need to be defined in one place. Register settings is the logical place for that.

No. That's wrong on three counts. See above :)

Last edited 18 months ago by azaozz (previous) (diff)

#71 in reply to: ↑ 70 ; follow-up: @flixos90
18 months ago

Replying to azaozz:

As I mentioned three times already, register_setting() is intended for use in the UI. That means it is for "settings" from HTML form elements.

I'm sorry but this is not correct. You're referring to what register_setting() was intended for use originally, but that hasn't been the case since 4.7, so at this point for longer than 6 years. The function is also responsible for various low-level aspects around options, like default values, or the type e.g. to describe the parameter in the REST API, neither of which is directly UI-related.

Again, I'm personally not saying we have to use register_setting() but it is still a valid consideration. But that's only the second step to decide.

Regardless of whether we use that function or another new function, what is your concern with handling the database type for an option centrally? I haven't seen any argument against that so far in this ticket.

#72 in reply to: ↑ 63 @knutsp
18 months ago

Replying to azaozz:

Replying to knutsp:

What will be the difference to writing...
What will be the difference to writing
(string) get_option( 'my_option' );
(bool) get_option( 'my_option' );
(int) get_option( 'my_option' );
as we do now?

No much difference, just a bit cleaner way to write it.

Cleaner that common, plain PHP, by adding an extra argument to make the function convert types?

If the saved type is still to be guessed...

It's not "to be guessed", rather it is to get a value with an expected (strict) type.

If not centrally registered, not in the database, it's guessing what the type was (intended to be) before saving.

Adding this param would make it easier to fix that, and it also reduces calls to maybe_unserialize() a little.

To convert an unkown type of data to fit in a (strictly) typed property is trivial, at least when working with supported PHP versions, as demonstrated above. To do it correctly, as the entity that saved the data in the first place, need the type to be centrally registered or saved along with the data in the database. To handle maybe serialized array/object without sniffing will require data with structural types that need serialization to be registered centrally or stored in database, as far as I can see.

A refined register_setting/register_meta or two new, simpler register_setting_type/register_meta_type that work in some logical connection with each other, could be a way forward.

#73 in reply to: ↑ 71 @azaozz
18 months ago

Replying to flixos90:

I'm sorry but this is not correct. You're referring to what register_setting() was intended for use originally, but that hasn't been the case since 4.7, so at this point for longer than 6 years. The function is also responsible for...

Yea, you're right. I think I understand where you're coming from. Unfortunately the "expansion/refactoring" in 4.7 made this a "try to do everything" type of function which is pretty bad design from software architecture point of view. So now it has some flaws and mixes up settings for HTML form elements with settings for the REST API and with settings for the retrieval of data from the database. Thinking it is a bad idea to continue to expand its scope.

what is your concern with handling the database type for an option centrally?

Because it is mostly useless to try to "handle" it outside of the database. The initial idea of this ticket was to store the type of the option/meta value in the database. I.e. it would have been impossible to change (unless hacking directly in the DB). The type would have been set by the data type of the last update_option(). This would have fixed some of the "weirdness" with the options and meta APIs.

Unfortunately it became clear that this is pretty much impossible to implement without breaking back-compat, and would introduce edge cases with caching. Trying to use external "register" comes with pretty much the same strings attached, and in addition there is a chance for it to become out of sync with the values in the DB. That would be a disaster :)

@knutsp

No much difference, just a bit cleaner way to write it.

Cleaner that common, plain PHP, by adding an extra argument to make the function convert types?

Yea, imho it is a bit cleaner. Also eliminates calls to maybe_unserialize() when the expected type is not array or object.

If not centrally registered, not in the database, it's guessing what the type was (intended to be) before saving.
...
To do it correctly, as the entity that saved the data in the first place,

Hmm, this is a good question. What is the expected functionality here, what do you call "correct"? Do you think it is a good idea for plugins to continue to type-cast the get_option() calls? From a plugin's point of view it doesn't matter what was the type when the value was saved in the DB, it matters what the type is when the plugin is going to use that value. So type-casting remains a quite viable solution regardless of any DB changes or registrations.

To handle maybe serialized array/object without sniffing will require data with structural types that need serialization to be registered centrally...

Nah, this doesn't seem like a good idea. Any separate or "central" registry (where the type is not saved in the same table row as the value) can get out of sync. For example a plugin that registers something gets deactivated.

A refined register_setting/register_meta or two new, simpler register_setting_type/register_meta_type that work in some logical connection with each other, could be a way forward.

Not sure about that. In addition to not being backwards compatible a "central registry" would not add any value for core or plugins that just need to save some data to the DB (option or meta). On the opposite, it may be a hindrance as once registered they will not be able to change the data type as that may break other plugins' expectations. In addition other plugins may break when the original plugin is deactivated as the "registered type" would go away.

Last edited 18 months ago by azaozz (previous) (diff)

#74 @azaozz
18 months ago

Thinking more about this. Part of the original idea here was to have an indicator in the DB of whether the data was serialized:

The most basic: 0 would be default and undefined, 1 would mean the data was not serialized when storing it, 2 would mean the data was serialized.

This, of course, will not fix the current "weirdness" with the options and meta APIs where you save true but get back (string) 1 (which is the same as when you save (int) 1). However it will eliminate calls to maybe_unserialize() for updated and newly added options and meta values. I'm personally not a big fan of this, but it is a viable solution too.

This would also open the door to consider JSON encoding all values except strings and class instances when storing them in the DB. That would "enforce" strict types because JSON encoding preserves the data type ($data = json_encode( true ) then json_decode( $data ) would return true). This will fix the above mentioned "weirdness".

There would be a tiny bit of increase in resources used, but it would probably be negligible as json_decode() for arrays seems faster than unserialize() and for int and bool is super fast. Haven't tested with production databases but expecting the faster decoding of arrays will compensate the need to decode boolean, integer and float values.

Last edited 18 months ago by azaozz (previous) (diff)

#75 @petitphp
18 months ago

This would also open the door to consider JSON encoding all values except strings and class >instances when storing them in the DB. That would "enforce" strict types because JSON encoding >preserves the data type ($data = json_encode( true ) then json_decode( $data ) would return >true). This will fix the above mentioned "weirdness".

I'm concerned encoding data in DB can limit how we can use it for search or in SQL queries (even if it's not a best practice).

#76 follow-up: @flixos90
18 months ago

@azaozz

Unfortunately the "expansion/refactoring" in 4.7 made this a "try to do everything" type of function which is pretty bad design from software architecture point of view. So now it has some flaws and mixes up settings for HTML form elements with settings for the REST API and with settings for the retrieval of data from the database. Thinking it is a bad idea to continue to expand its scope.

I see your point about mixing up concerns in that function not being optimal. Looking back, I would have preferred a separate function (e.g. register_option()?) to keep low-level option registration logic separate from UI registration logic in register_setting(). That said, the underlying functionality itself is crucial to have, and we cannot just stop working on it because we disagree with a decision made years ago. Either we have to do something about it (move to another new function and deprecate the changes introduced in 4.7?) or we have to accept where we are and move on. Simply not relying on the underlying functionality anymore because where it was implemented mixes option concerns with UI concerns is going to result in even more clunky APIs in the future.

Because it is mostly useless to try to "handle" it outside of the database. The initial idea of this ticket was to store the type of the option/meta value in the database. I.e. it would have been impossible to change (unless hacking directly in the DB). The type would have been set by the data type of the last update_option(). This would have fixed some of the "weirdness" with the options and meta APIs.

Unfortunately it became clear that this is pretty much impossible to implement without breaking back-compat, and would introduce edge cases with caching. Trying to use external "register" comes with pretty much the same strings attached, and in addition there is a chance for it to become out of sync with the values in the DB. That would be a disaster :)

The backward compatibility implication you mention makes sense, thanks for clarifying. Having a function return e.g. a (type-cast) int where it would have previously returned a raw string from the database is indeed a backward compatibility break, even though it would be nice to have. That makes me question the whole point of the current discussion, which is about database types. Originally, the focus was on avoiding maybe_unserialize(), and FWIW that is also the only thing here that has to do with performance.

I think we need to take a step back here. The approach you outlined in 1 makes a lot more sense conceptually than coming up with a full "type casting system" - that is tricky because of backward compatibility implications, and the approach of having to pass that in a parameter on every get_option() call is just as clunky of an API as plugins having to manually handle it. For example, get_option( $option, $default, 'int' ) is not really better or worse than (int) get_option( $option, $default ).

For purely serialization though, I think the backward compatibility implications don't apply. Data is typically serialized when it's an array or an object. Centrally registering for an option whether it uses serialization or not should not be the same concern regarding backward compatibility. For example, if a plugin today uses an option that conceptually is always a number, today the return value from get_option() may be an integer (if the default is returned or somehow filtered) or a string (if coming directly from the database). Telling WordPress explicitly that this option is not serialized will avoid the call to maybe_unserialize(), which addresses the original problem, and the return value will be maintained just like before.

I'm basically proposing that options can centrally opt out of being attempted to be unserialized when it's clear they'll never use serialized data in the first place.

#77 in reply to: ↑ 76 ; follow-up: @azaozz
18 months ago

Replying to flixos90:

Looking back, I would have preferred a separate function (e.g. register_option()?) to keep low-level option registration logic separate

Yes, same here.

That said, the underlying functionality itself is crucial to have, and we cannot just stop working on it

Yea, of course. As the options and meta APIs are some of the most used code in WP, thinking it'd be very good to revisit and improve these functions. New register_option() or maybe even register_option_type() or register_option_value_type() function makes sense imho.

Then the current problem with register_setting() where the type is filtered/changeable by third-party code can be fixed too. Also the possibility to unregister and re-register option and meta value types doesn't sound like a good idea. The registered types should work as if they were set in the DB imho, third-party code should not be able to interfere with them. (Sorry, getting a bit OT here.)

Originally, the focus was on avoiding maybe_unserialize(), and FWIW that is also the only thing here that has to do with performance.

The original idea was two-fold:

  • Add a way to get "strict" option and meta values. I.e. preserve the PHP type of the data when it was saved in the DB.
  • To reduce number of calls to maybe_unserialize() (which would have happened as a result of implementing strict types).

For example, get_option( $option, $default, 'int' ) is not really better or worse than (int) get_option( $option, $default ).

Yep, it's about the same. The only difference is that the first would skip the call to maybe_unserialize() as the return value will be an integer.

today the return value from get_option() may be an integer (if the default is returned or somehow filtered) or a string (if coming directly from the database). Telling WordPress explicitly that this option is not serialized will avoid the call to maybe_unserialize()...

I'm basically proposing that options can centrally opt out of being attempted to be unserialized when it's clear they'll never use serialized data in the first place.

Right, this will work too, perhaps with a minimal possibility of sync errors. However ideally the "serialized/not-serialized" bit of info should be stored on the same row as the option or meta value in the DB. In addition that would happen automatically when adding or updating options and meta, and will be completely transparent to the code that uses these APIs.

So thinking perhaps this can be split into two:

  • New/updated/improved way to register option and meta value types. Seems best to have new functions (that can be called from the existing) which would fix and improve the current behavior, will not use PHP globals, etc.
  • Perhaps reconsider the direction of this ticket (again) and repurpose it to add a DB table column that would only hold data indicating whether the option or meta value is serialized (i.e. option 1 in the first comment).
Last edited 18 months ago by azaozz (previous) (diff)

#78 in reply to: ↑ 77 ; follow-up: @knutsp
17 months ago

Replying to azaozz:

So thinking perhaps this can be split into two:

  • New/updated/improved way to register option and meta value types. Seems best to have new functions (that can be called from the existing) which would fix and improve the current behavior, will not use PHP globals, etc.
  • Perhaps reconsider the direction of this ticket (again) and repurpose it to add a DB table column that would only hold data indicating whether the option or meta value is serialized (i.e. option 1 in the first comment).

Interesting. Thinking this can be made the opposite way, and still avoid changing the table structure of options and all meta tables. When expiring transients are saved, an extra row containing the timeout is saved. Could we insert row '_option_type_{$option_name}' as the PHP type (repeated options has same type)?

The default, at least for now, is to set the actual type of $value, but if $type is illegal ('int' allowed) then do not save the type at all, alowing for a mixed $value (explicitly unknown type). Change
update_option( string $option, mixed $value, string|bool $autoload = null ): bool
to
update_option( string $option, mixed $value, string|bool|null $autoload = null, string|null $type = null ): bool

Then, if register_setting() gets an overhaul to accept PHP type, this will be the default when saving, but overrides the db value type for get_option(). Same logic for meta.
Then, only if db value type is not set/found (or unknown/illegal type), run maybe_unserialize(), elseif set to not scalar assume serialized and just unserialize(). This opens for (later) an explicit pseudo type 'json', even also 'serialized'. When scalar, convert to type (alias 'int' as 'integer').

  1. Reduce calls to maybe_unserialize() when type is explicitly a scalar type.
  2. No costly database table restructuring
  3. Saved typing by default, first phase PHP types only, second phase maybe allow/respect [ 'scalar', 'numeric', 'number', 'serialized', 'json', ...]
  4. Saved typing can be opted out for special cases or som kind of extended backwards compatibility.
  5. Should work with centralized registering of sanitizing, validation and UI/HTML handling, as saving with type this takes precedence on save, registered handling takes precedence on read/returning values.
  6. No new/alternative functions for centralized registering
  7. Future scope as 3. above, plus opens for even other serialization methods (CSV maybe?) and even compression.

#79 @flixos90
17 months ago

  • Keywords has-patch has-unit-tests removed

Removing these two keywords for now since any patch attached is still preliminary based on all the still ongoing discussion on the high-level approach.

Again, rather than continuing to discuss the low-level technical implementation, I suggest we take a step back here and think about what we want to achieve as part of this ticket. There are two concerns being mixed up which have different implications, including on backward compatibility.

  1. Avoiding calls to maybe_unserialize() is a performance related effort.
  2. Ensuring specific data types are returned from get_option() calls is not performance related and more about developer experience, e.g. to avoid having to type-cast every return value from that function.

I think those should be two different tickets which maybe are related, but I think the discussion here is unnecessarily complicating the 1. goal by combining it with the 2. goal, which wasn't the original point of this ticket as far as I can tell.

#80 @flixos90
17 months ago

  • Keywords 2nd-opinion added; needs-testing removed

#81 in reply to: ↑ 78 ; follow-up: @knutsp
17 months ago

Replying to knutsp:

  1. Reduce calls to maybe_unserialize() when type is explicitly a scalar type.
  2. No costly database table restructuring
  3. Saved typing by default, first phase PHP types only, second phase maybe allow/respect [ 'scalar', 'numeric', 'number', 'serialized', 'json', ...]
  4. Saved typing can be opted out for special cases or som kind of extended backwards compatibility.
  5. Should work with centralized registering of sanitizing, validation and UI/HTML handling, as saving with type this takes precedence on save, registered handling takes precedence on read/returning values.
  6. No new/alternative functions for centralized registering
  7. Future scope as 3. above, plus opens for even other serialization methods (CSV maybe?) and even compression.

I made a plugin to demonstrate. Se file functions.php for mye idea for the typing logic and saving. I also propose to base canonical type naming on get_debug_type() in PHP 8. This can be polyfilled.
https://github.com/knutsp/core-ticket-55942

#82 in reply to: ↑ 81 ; follow-up: @knutsp
17 months ago

Replying to knutsp:

I made a plugin to demonstrate. See file functions.php for mye idea for the typing logic and saving. I also propose to base canonical type naming on get_debug_type() in PHP 8. This can be polyfilled.
https://github.com/knutsp/core-ticket-55942

From the readme:
# add_option

Current signature: add_option( string $option, mixed $value = '', string $deprecated = '', string|bool $autoload = 'yes' ): bool

Proposed signature: add_option( string $option, mixed $value = '', string $deprecated = '', string|bool $autoload = 'yes', ?string $type = null ): bool

Alternatively: No change, use update_option() to save type?

# update_option

Current signature: update_option( string $option, mixed $value, string|bool $autoload = null ): bool

Proposed signature: update_option( string $option, mixed $value, string|bool $autoload = null, ?string $type = null ): bool

## Logic
If $type is given and not null, use as explicit $type.
else detect the type of $value, use as implicit $type.

Add or update the option value in db
If successful, add or update an extra option, with a prefixed $option as "_option_type_$option" with the value of $type.

# get_option
Current siganture unchanged: get_option( string $option, mixed $default_value = false ): mixed

## Logic
$type is fetched from a prefixed $option as "_option_type_$option"
After the $value is fetched, and the $type is found and valid, perform settype( $value, $type )before returning it

# add_metadata
(todo)

# add_meta
(todo)

# update_meta
(todo)

# update_{$object_type}_meta

Current signatures: update_{$object_type}_meta( int $term_id, string $meta_key, mixed $meta_value, mixed $prev_value = '' ): int|bool|WP_Error

Proposed sigantures: update_{$object_type}_meta( int $term_id, string $meta_key, mixed $meta_value, mixed $prev_value = '', ?string $type = null ): int|bool|WP_Error

## Logic
If $type is given and not null, use as explicit $type.
else detect the type of $value, use as implicit $type.

Update the meta value in db
If successful, add or update an extra meta_key, with a prefixed $meta_key as "_meta_type_$meta_key" with the value of $type.

# add_{$object_type}_meta

Current signatures: add_{$object_type}_meta( int $term_id, string $meta_key, mixed $meta_value, bool $unique = false ): int|false|WP_Error

Proposed signatures: add_{$object_type}_meta( int $term_id, string $meta_key, mixed $meta_value, bool $unique = false, ?string $type = null ): int|false|WP_Error

## Logic
If $type is given and not null, use as explicit $type.
else check if the $type already exists
else detect the type of $value, use as implicit $type.

Add the meta value in db
If successful and ( $type did not already exist or have expplicit $type given),
add or update an extra meta_key, with a prefixed $meta_key as "_meta_type_$meta_key" with the value of $type.

In other words: To change the stored type of a meta_key that already exists, explcit type must be used, implicit type will only be added if type did not exist


Todo:

  1. Detect where in current functions we can avoid maybe_[un_]serialize(), for performance
  2. Check the implications on db storage size and performace to update and get an extra option (when not cached)

If my proposed solution is not ruled out by the above, and the principle is interesting or acceptable:

  1. Do we need a way to have a way not to save type, like when installing default options?
  2. Keep add_option() as is, or change accordingly?
  3. Or should new, prefixed functions that calls the current ones, be created, like wp_update_option()?

#83 in reply to: ↑ 82 @knutsp
17 months ago

Replying to knutsp:

Todo:

  1. Detect where in current functions we can avoid maybe_[un_]serialize(), for performance
  2. Check the implications on db storage size and performace to update and get an extra option (when not cached)
  1. Serializing:
    • Serialize (on add/update) should happen for 'array' or 'object', when $type os not set, I guess as now.
    • Serialize (on add/update) should also happen is $type is set to 'arr[ay]' or 'obj[ect]', accoring to principle of implicit and explicit type.
    • All existing options/metadata, already serialized, should get 'array' or 'object' added as their type, accordingly, on this upgrade.
    • Unserialize (on get) should then only happen when saved type is 'array' or 'object'.
    • Before returning, and with a found type row, settype is performed.
  1. By treating 'string' as the default and only saving the (implicit/explicit) type when it's not the default, the number of extra table rows may be acceptable. This is now in the my concept plugin v2.0.

I will probably not be landing anything like this with a patch and tests for a long time. I've just started looking at the inner workings of the basic option and metadata functions to spot where this new logic may fit in. So if the principle gets accepted, help needed.

I observe that when saved options/meta are cached, the PHP type of the returned value is intact. Since this type confusion is already here, I guess backwards compatibility concerns may be small enough. WP_REST_Meta_Fields must not be affected.

#84 follow-up: @oglekler
16 months ago

  • Milestone changed from 6.3 to 6.4

This is an enhancement, and we are in 12 days until Beta 1 after which we will not add new enhancement to the release, so, because there is no patch, I am moving this ticket to 6.4.

#85 in reply to: ↑ 84 @azaozz
15 months ago

  • Milestone 6.4 deleted
  • Resolution set to maybelater
  • Status changed from new to closed

Replying to oglekler:

Thinking this can be revisited at a later time. The initial idea was a great enhancement but unfortunately it cannot be implemented with full back-compat. The "cut down" solution is not bad but is not a significant enhancement.

Thanks everybody for your work on this so far!

#86 @barry.hughes
15 months ago

Sounds like the right call. Thanks for steering the discussion so far, @azaozz!

@azaozz
15 months ago

Latest patch.

@azaozz commented on PR #4324:


15 months ago
#87

The trac ticket was closed.

Note: See TracTickets for help on using tickets.