WordPress.org

Make WordPress Core

Opened 2 years ago

Last modified 2 years ago

#40825 new enhancement

Re-addressing validation/sanitization of IDs to allow filtering before WP_Post (and others) database query

Reported by: LindsayBSC Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.7.5
Component: Posts, Post Types Keywords: dev-feedback
Focuses: template, performance Cc:
PR Number:

Description

The following ticket spawned from a desire to import content from outside of the WordPress database but have it treated as a native content type. A concept that has a clear audience that desires this functionality (see ticket: #12955) and has been addressed in a number of different ways. I believe the least impactful way to address this without removing the 'final' keyword from the WP_Post class that also improves standards for validation and sanitization of the ID value typically passed to get_posts() is to use the ID as a sort-of 'decorator' unto itself.

When merging the content from multiple sources to be displayed in a theme the biggest conflict to arise is duplication of IDs. Since the remote source is ignorant to the ID numbers already in use in the wp_posts table, a requirement for a "decorated" ID determined.

WordPress currently does not have a standard method for validating the format of the variable that will ultimately be passed to get_post() to create a new WP_Post object. The following methods are implemented in core files to attempt to sanitize or validate the value passed as an ID (*see links for pro/cons of usage in WP):

My Suggestion

I recommend creating a new function that will standardize the validation and sanitization of ID numbers that are being passed to a database query. All instances of is_numeric, (int), intval(), and absint() that are used as a way to validate or sanitize (or both validate and sanitize) an ID number that is passed to a query from an external function should be replaces with a new function that will serve both purposes. The new function will return a falsey response OR throw an exception when validation fails or if validation passes, will sanitize the value to a format compatible with the typical MYSQL type for the ID column (bigint).

Inside of this new function we can include a filter that will allow developers to override certain restrictions, specifically for allowing external content to be treated as a WP_Post object or some other native content that commonly would exist in WordPress’ database.

Since WP_Post will always look for a cached version of the object before querying the database, we make sure to store all necessary values in the cache before the template is loaded after we run our remote_get. We utilize the concatenated ID which is formatted like 12345-REMOTE as the ID in the cache so as to avoid conflicts w/ existing post IDs that are also stored in the memory cache. The only hurdle to this was the fact that core files were forcibly casting IDs as integers long before a query of any sort were to be made.

The argument for sanitizing early was to catch malformations early, but all it seemed to do was force the type early and never truly "caught" a bad value passed as an ID. A true “early catch” would either sanitize early w/ a falsey response or Exception and/or find the cached version as early as the sanitization so as to avoid the rest of the process of getting the WP_Post instance anyways.

Available in the following gist is my suggestion for a better validation function I called is_valid_id(), an example of how filters can be used on this sanitization function to allow external content to be treated as if it was a WP_Post object, and it's usage within a core file that previously used one of the subpar validation functions ( in this case meta.php using is_numeric() )

*comments in the file are just opinions and alternative thoughts I had while crafting this

https://gist.github.com/LinzardMac/38bbe22feb0b0a3fbabfcf64d797cd80

** It could be worthy of note that I have been using some version of this code in a live production site for the last 4 months without any changes needed to plugins or template files to account for this "non native" content.

Change History (6)

#1 @LindsayBSC
2 years ago

  • Keywords dev-feedback added

#2 @LindsayBSC
2 years ago

Wanted to drop a reference/link to this related ticket: #39053

A point was made that filter_var() is not required in a PHP build for WordPress to compile therefore I added a fallback for that case that uses preg_match() to look for a collection of digits that is not 0 and does not start with 0 (since an ID will/should never meet those conditions)

#3 follow-up: @peterwilsoncc
2 years ago

I've chatted a bit about this to @LindsayBSC in Slack.

A unified approach to object ID validation appeals to me a great deal. There has been a lot of discussion about integers, BIG INTs and PHP_INT_MAX recently that highlights the need for something consistent that works across the board.

I have hit situation where certain post types are hosted remotely with non-numeric IDs, so can see a use case for the filter. I have significant concerns about the security implications here as it would only take a small typo to create big problems. Core would need to defend against this. I'd like to see some thought into this defence.

tl;dr: +1 on a validation function, in two minds about including the filter.

#4 in reply to: ↑ 3 @LindsayBSC
2 years ago

I have certainly thought about this and here are my thoughts on the security issue w/ regards to the filter:

Since the purpose of filtering the ID number to allow for non-numeric ID numbers, we could actually create a wrapper function for the is_valid_id() function that is get_is_valid_id() which would be used only on 'read' requests for info and not any write requests. This means there would be no way to filter the ID value when using wp_insert_post() or update_post_meta() but for the purpose of displaying non-native content as if it were native content (ie: a post, user, term) we would still be able to filter the ID, making the filter much less dangerous and more like the_content() filter which is for output/display only.

See an example here:
You can see the wrapper function in get_is_valid_id.php and on line 7 and line 55 has the get_is_valid_id() and is_valid_id() respectively because line 7 happens on a 'read' function (get_post_meta()) and line 55 happens on a 'write' function (update_post_meta()).

https://gist.github.com/LinzardMac/ab4012d413b7f656c3dc7c80fc2f501f

I know this is a bit rough, but it illustrates the concept. It may be able to be made even more efficient.

*edited to update sample code & update line numbers referenced.

Replying to peterwilsoncc:

I've chatted a bit about this to @LindsayBSC in Slack.

A unified approach to object ID validation appeals to me a great deal. There has been a lot of discussion about integers, BIG INTs and PHP_INT_MAX recently that highlights the need for something consistent that works across the board.

I have hit situation where certain post types are hosted remotely with non-numeric IDs, so can see a use case for the filter. I have significant concerns about the security implications here as it would only take a small typo to create big problems. Core would need to defend against this. I'd like to see some thought into this defence.

tl;dr: +1 on a validation function, in two minds about including the filter.

Last edited 2 years ago by LindsayBSC (previous) (diff)

#5 @jbotte
2 years ago

Got my vote.

Love the idea of returning a falsey while adding consistency to ID format validation. Also, love, love, love the filter incorporation.

Last edited 2 years ago by jbotte (previous) (diff)

#6 @diddledan
2 years ago

I've been looking over this lately and it does a great job of allowing remote content to be used as if it were local. Having the overriding filter only operate when getting data and not on writing data looks to satisfy the security concerns of allowing arbitrary IDs to be sent to the database in my opinion.

+1 from me :-)

Note: See TracTickets for help on using tickets.